Exploring Ensemble Error Exploration for Unsupervised Reinforcement Learning

Authors

Panov A. I., Kuderov P. V.

Abstract

Environment exploration is essential in reinforcement learning (RL), and especially in unsupervised reinforcement learning (URL). A state-of-the-art method for exploration trains an ensemble of forward dynamics prediction networks and uses the disagreement (variance) of their predictions as an intrinsic reward. This method is also used for exploration in a state-of-the-art URL algorithm. In this work, we study the performance of ensemble disagreement across various environments in the URL setup on top of different world model architectures, specifically Gaussian and categorical world model latent states. We find that, despite being used with a categorical state, it performs worse than with a Gaussian state in most environments and fails entirely in some. Additionally, we propose a novel, more time- and space-efficient method for intrinsic reward computation that is competitive with the baselines.
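For concreteness, below is a minimal sketch of the disagreement-based intrinsic reward the abstract refers to: an ensemble of forward dynamics models is trained on the same transitions (with different initializations), and the variance of their predictions serves as the intrinsic reward. This is an illustration under assumed names and dimensions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the next latent state from (state, action)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def disagreement_reward(ensemble, state, action):
    """Intrinsic reward = per-sample variance of ensemble predictions,
    averaged over latent dimensions."""
    preds = torch.stack([m(state, action) for m in ensemble])  # (K, B, D)
    return preds.var(dim=0).mean(dim=-1)                       # (B,)

# Usage: K models, each trained on the same replayed transitions
# but initialized with a different random seed.
ensemble = [ForwardModel(state_dim=32, action_dim=4) for _ in range(5)]
s = torch.randn(8, 32)  # batch of latent states (hypothetical sizes)
a = torch.randn(8, 4)   # batch of actions
r_int = disagreement_reward(ensemble, s, a)  # shape (8,)
```

High variance marks regions where the models have not yet converged to a common prediction, i.e. poorly explored states; as the ensemble agrees, the intrinsic reward decays.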

External links

DOI: 10.1007/978-3-031-73691-9_18

ResearchGate: https://www.researchgate.net/publication/385077324_Exploring_Ensemble_Error_Exploration_for_Unsupervised_Reinforcement_Learning

How to cite

Shiman, N., Latyshev, A., Kuderov, P., Panov, A. (2024). Exploring Ensemble Error Exploration for Unsupervised Reinforcement Learning. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y., Yudin, D. (eds) Advances in Neural Computation, Machine Learning, and Cognitive Research VIII. Selected Papers from the XXVI International Conference on Neuroinformatics, October 21–25, 2024, Moscow, Russia, pp. 199–209.