Deep reinforcement learning (RL) shows impressive results in complex gaming and robotic environments. These results commonly come at a huge computational cost and require a very large number of episodes of interaction between the agent and the environment. Hierarchical methods and expert demonstrations are among the most promising approaches to improving the sample efficiency of reinforcement learning methods. In this paper, we propose a combination of methods that allow the agent to use low-quality demonstrations in complex vision-based environments with multiple related goals. Our Forgetful Experience Replay (ForgER) algorithm effectively handles errors in expert data and reduces quality losses when adapting the action space and state representation to the agent’s capabilities. The proposed goal-oriented replay buffer structure allows the agent to automatically highlight sub-goals in demonstrations for solving complex hierarchical tasks. Our method is highly versatile and can be integrated into various off-policy methods. ForgER surpasses existing state-of-the-art RL methods that use expert demonstrations in complex environments. The solution based on our algorithm beats the other solutions in the MineRL competition and allows the agent to demonstrate expert-level behavior.
Read on ScienceDirect (in English): https://www.sciencedirect.com/science/article/abs/pii/S0950705121001076
Download the PDF or read online on ResearchGate (in English): https://www.researchgate.net/publication/349327843_Forgetful_experience_replay_in_hierarchical_reinforcement_learning_from_expert_demonstrations
Download the source code on GitHub: https://github.com/cog-isa/forger
Skrynnik A. A., Staroverov A. V., Aitygulov E. E., Aksenov K. A., Davydov V. D., Panov A. I. Forgetful experience replay in hierarchical reinforcement learning from expert demonstrations // Knowledge-Based Systems, Vol. 218, 106844.