Interpreting Decision Process in Offline Reinforcement Learning for Interactive Recommendation Systems


Panov A., Kuderov P.


Recommendation systems, which predict relevant and appealing items for users on web platforms, often rely on static user interests, resulting in limited interactivity and adaptability. Reinforcement Learning (RL), while providing a dynamic and adaptive approach, brings its unique challenges in this context. Interpreting the behavior of an RL agent within recommendation systems is complex due to factors such as the vast and continuously evolving state and action spaces, non-stationary user preferences, and implicit, delayed rewards often associated with long-term user satisfaction. Addressing the inherent complexities of applying RL in recommendation systems, we propose a framework that includes innovative metrics and a synthetic environment. The metrics aim to assess the real-time adaptability of an RL agent to dynamic user preferences. We apply this framework to LastFM datasets to interpret metric outcomes and test hypotheses regarding MDP setups and algorithm choices by adjusting dataset parameters within the synthetic environment. This approach illustrates potential applications of our framework, while highlighting the necessity for further research in this area.

External links

DOI: 10.1007/978-981-99-8138-0_22

Reference link

Volovikova, Z., Kuderov, P., Panov, A. I. (2024). Interpreting Decision Process in Offline Reinforcement Learning for Interactive Recommendation Systems. In: Luo, B., Cheng, L., Wu, Z.-G., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1963. Springer, Singapore, pp. 270–286.