Dynamical Distance Adaptation in Goal-Conditioned Model-Based Reinforcement Learning

Authors

Panov A. I., Vasilev D. V., Kuderov P. V.

Abstract

Goal-conditioned reinforcement learning aims to develop agents capable of reaching any state within a defined environment. Given the diversity of potential goals, reward engineering can become cumbersome. Therefore, designing algorithms that can train without external rewards is beneficial. This approach is formalized as unsupervised goal-conditioned reinforcement learning (UGCRL), wherein the goal space is a subset of the environment's states. To achieve this objective, it is necessary to engineer goal-conditioned rewards. In this work, we analyze goal-conditioned rewards based on distances between states in a model-based setting and examine how distance functions behave depending on the representations used to train them. We conducted experiments in continuous maze environments. The PointMaze environment is a maze with complex topology but simple control, whereas AntMaze has simple topology but complex control. Our method showed some improvement on distant goals in PointMaze. In AntMaze, it performed comparably to the baseline.
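As a generic illustration only (not the authors' method), a distance-based goal-conditioned reward can be sketched as follows, assuming states are already encoded as vectors: a regressor is fit to predict the number of steps separating two states of the same trajectory, and the reward for state `s` and goal `g` is the negated prediction. The pair features, the linear least-squares model, and all function names (`sample_pairs`, `fit_distance`, `distance`, `goal_reward`) are hypothetical choices for this sketch.

```python
import numpy as np

def sample_pairs(traj, num_pairs, rng):
    """Sample (state, later state, temporal distance) triples from one trajectory."""
    T = len(traj)
    i = rng.integers(0, T, size=num_pairs)
    j = rng.integers(0, T, size=num_pairs)
    i, j = np.minimum(i, j), np.maximum(i, j)  # ensure the second state is not earlier
    return traj[i], traj[j], (j - i).astype(float)

def features(s, g):
    # Hypothetical pair features: elementwise absolute difference plus a bias column.
    diff = np.abs(np.atleast_2d(s) - np.atleast_2d(g))
    return np.hstack([diff, np.ones((diff.shape[0], 1))])

def fit_distance(traj, num_pairs=1000, seed=0):
    # Least-squares fit of the predicted step count to the observed step count.
    rng = np.random.default_rng(seed)
    s, g, target = sample_pairs(traj, num_pairs, rng)
    w, *_ = np.linalg.lstsq(features(s, g), target, rcond=None)
    return w

def distance(w, s, g):
    # Predicted number of steps needed to reach g from s.
    return features(s, g) @ w

def goal_reward(w, s, g):
    # Goal-conditioned reward: closer goals (fewer predicted steps) give higher reward.
    return -distance(w, s, g)

# Usage: a straight-line trajectory in a 2-D state space.
traj = np.arange(50)[:, None] * np.array([[0.5, -0.2]])
w = fit_distance(traj)
print(distance(w, traj[0], traj[10]))
```

On this toy trajectory the predicted distance between `traj[0]` and `traj[10]` is close to 10. A linear model on absolute differences only works because the trajectory is a straight line; in mazes with walls, learned distances generally need a nonlinear model and a suitable state representation.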

External links

DOI: 10.1007/978-3-031-73691-9_17

ResearchGate: https://www.researchgate.net/publication/385076182_Dynamical_Distance_Adaptation_in_Goal-Conditioned_Model-Based_Reinforcement_Learning

Watch Denis Vasilev's presentation on the MIPT Center for Cognitive Modeling channel (from 2:12:50):

How to cite

Vasilev, D. V., Latyshev, A., Kuderov, P., Shiman, N., Panov, A. I. (2024). Dynamical Distance Adaptation in Goal-Conditioned Model-Based Reinforcement Learning // In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y., Yudin, D. (eds) Advances in Neural Computation, Machine Learning, and Cognitive Research VIII. Selected Papers from the XXVI International Conference on Neuroinformatics, October 21–25, 2024, Moscow, Russia. Pages 185–198.