Model-based reinforcement learning is a hybrid approach that combines planning with a world model and model-free policy learning. Its major advantage is high sample efficiency. Existing approaches use the world model either to provide additional policy training on model-generated trajectories or to obtain a more accurate approximation of the state-value function, which is then used to estimate action values along multistep model trajectories. This paper proposes a new approach that integrates the world model as a critic into architectures of the actor–critic family, where it estimates the action values. Experiments with hybrid algorithms that use a world model with look-ahead tree search as the critic, run on environments with a complex set of subgoals, show that the proposed integration can accelerate policy learning under certain conditions.
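As an illustration only, the idea of a world model with look-ahead tree search acting as a critic can be sketched in a few lines of Python. This is not the authors' algorithm: it assumes a deterministic, tabular world model and an exhaustive depth-limited search, and the names `lookahead_q`, `advantages`, and `chain_model` are hypothetical. The search unrolls the model for a fixed number of steps, bootstraps from a state-value estimate at the leaves, and returns action values that an actor could use as advantages.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (assumptions, not taken from the paper):
# a world model maps (state, action) -> (next_state, reward, done),
# and a value function maps state -> estimated state value.
Model = Callable[[int, int], Tuple[int, float, bool]]
ValueFn = Callable[[int], float]


def lookahead_q(model: Model, value_fn: ValueFn, state: int, action: int,
                actions: Sequence[int], depth: int, gamma: float = 0.99) -> float:
    """Estimate Q(state, action) by exhaustive depth-limited tree search in the model."""
    next_state, reward, done = model(state, action)
    if done:
        return reward
    if depth <= 1:
        # Leaf of the search tree: bootstrap from the state-value estimate.
        return reward + gamma * value_fn(next_state)
    best = max(
        lookahead_q(model, value_fn, next_state, a, actions, depth - 1, gamma)
        for a in actions
    )
    return reward + gamma * best


def advantages(model: Model, value_fn: ValueFn, policy_probs: Sequence[float],
               state: int, actions: Sequence[int], depth: int,
               gamma: float = 0.99) -> List[float]:
    """Advantage of each action: its model-based Q minus the policy-weighted mean Q."""
    q = [lookahead_q(model, value_fn, state, a, actions, depth, gamma) for a in actions]
    baseline = sum(p * qa for p, qa in zip(policy_probs, q))
    return [qa - baseline for qa in q]


def chain_model(state: int, action: int) -> Tuple[int, float, bool]:
    """Toy model: states 0..3, action 1 moves right, action 0 stays; state 3 is the goal."""
    next_state = min(state + action, 3)
    done = next_state == 3
    return next_state, (1.0 if done else 0.0), done


if __name__ == "__main__":
    zero_value = lambda s: 0.0  # untrained value function
    print(advantages(chain_model, zero_value, [0.5, 0.5], state=1,
                     actions=[0, 1], depth=2, gamma=0.5))
```

In this toy setting the search sees the goal two steps ahead even though the value function is still zero everywhere, which is the kind of situation where a model-based critic might give the actor a useful learning signal earlier than a learned critic would.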
Panov, A. I., Ugadiarov, L. A. A World Model for Actor–Critic in Reinforcement Learning // Pattern Recognition and Image Analysis, 33, 467–477 (2023).