Model-based reinforcement learning is a hybrid approach that combines planning and model-free learning and has the strong advantage of sample efficiency. Model-based methods simulate trajectories by interacting with the model, either to train the policy directly or to approximate the state value function and use it to estimate state-action values along multi-step simulated trajectories. In this paper we propose a novel approach to integrating an environment model with actor-critic methods that employs the model as a critic to estimate the state-action value function. Experiments with hybrid actor-critic algorithms that use a lookahead tree-structured model as a critic, in environments with a complex set of subgoals, show that the proposed integration can speed up learning under certain conditions.
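To make the idea concrete, below is a minimal sketch (not the authors' implementation) of using a learned environment model as a critic: Q(s, a) is estimated by unrolling the model for several steps under the current policy and bootstrapping with a value network at the horizon. The paper uses a lookahead tree-structured model; the sketch shows the simpler single-trajectory variant of the same idea. All network architectures, dimensions, and names here are illustrative assumptions.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HORIZON, GAMMA = 8, 2, 5, 0.99

# Learned dynamics model: predicts next state and reward (assumed pre-trained).
dynamics = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                         nn.Linear(64, STATE_DIM + 1))
policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                       nn.Linear(64, ACTION_DIM))
value_fn = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, 1))

def model_based_q(state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    """Estimate Q(s, a) from an n-step trajectory simulated by the model."""
    q, discount = torch.zeros(state.shape[0]), 1.0
    for _ in range(HORIZON):
        out = dynamics(torch.cat([state, action], dim=-1))
        state, reward = out[..., :STATE_DIM], out[..., -1]
        q = q + discount * reward
        discount *= GAMMA
        action = policy(state)  # follow the current policy in imagination
    # Bootstrap with the state value function at the end of the rollout.
    return q + discount * value_fn(state).squeeze(-1)

# The model-as-critic estimate is differentiable, so the actor can be
# trained by backpropagating through the simulated rollout:
s = torch.randn(4, STATE_DIM)
a = policy(s)
actor_loss = -model_based_q(s, a).mean()
actor_loss.backward()
```

Because the rollout is fully differentiable, this critic provides policy gradients directly through the model rather than through a separately learned Q-network; a tree-structured lookahead would instead expand several candidate actions at each simulated step and back up values over the tree.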
Download the conference proceedings (PDF) from eLibrary (in Russian; registration required): https://www.elibrary.ru/item.asp?id=50346284
Panov A. I., Ugadyarov L. A. World model for actor and critic in reinforcement learning // XXI Russian Conference on Artificial Intelligence, RCAI-2023 (Moscow, 21–23 December 2022). Proceedings in 2 volumes. Vol. 2. Moscow: MPEI Publishing, 2022. Pp. 39–54.