In this work, we explore the application of the Self-Other Modelling (SOM) algorithm to several agent architectures in a collaborative grid-based environment. We compare the Asynchronous Advantage Actor-Critic (A3C) algorithm with the OpenAI Hide-and-Seek (HNS) agent and extend both implementations by adding the SOM algorithm. As an extension of the original environment, we add a version with stochastic initialization. Because all versions of the agents perform poorly in this setting, we further improve on the A3C and HNS agents by adding a module dedicated to the SOM algorithm. The resulting agent efficiently solves the stochastically initialized version of the environment, showing the potential benefits of such an approach.
On the BICA*AI 2020 website: https://bica2020.org/speakers/
Davydov V., Liusko T., Panov A. I. Self and Other Modelling in Cooperative Resource Gathering with Multi-Agent Reinforcement Learning // Brain-Inspired Cognitive Architectures for Artificial Intelligence: BICA*AI 2020. Advances in Intelligent Systems and Computing, Vol. 1310. P. 69–77.