When to Switch: Planning and Learning for Partially Observable Multi-Agent Pathfinding

Authors

Panov A. I., Yakovlev K. S., Skrynnik A. A.

Abstract

Multi-agent pathfinding (MAPF) is the problem of finding a set of non-conflicting paths for a set of agents confined to a graph. In this work, we study a MAPF setting where the environment is only partially observable for each agent, i.e., an agent observes the obstacles and other agents only within a limited field of view. Moreover, we assume that the agents do not communicate and do not share knowledge of their goals, intended actions, etc. The task is to construct a policy that maps the agent’s observations to actions. Our contribution is multifold. First, we propose two novel policies for solving partially observable MAPF (PO-MAPF): one based on heuristic search and another based on reinforcement learning (RL). Next, we introduce a mixed policy that is based on switching between the two. We suggest three different switch scenarios: the heuristic, the deterministic, and the learnable one. A thorough empirical evaluation of all the proposed policies in a variety of setups shows that the mixed policy demonstrates the best performance, generalizes well to unseen maps and problem instances, and, additionally, outperforms the state-of-the-art counterparts (Primal2 and PICO). The source code is available at https://github.com/AIRI-Institute/when-to-switch.

External links

DOI: 10.1109/TNNLS.2023.3303502

Download the PDF from the IEEE Xplore library (in English, registration required): https://ieeexplore.ieee.org/document/10236574/

ResearchGate: https://www.researchgate.net/publication/373553891_When_to_Switch_Planning_and_Learning_for_Partially_Observable_Multi-Agent_Pathfinding
