Evaluation of Pretrained Large Language Models in Embodied Planning Tasks


Panov A. I., Kovalev A. K.


Modern pretrained large language models (LLMs) are increasingly used in zero-shot or few-shot learning modes. Recent years have seen growing interest in applying such models to embodied artificial intelligence and robotics tasks. Given a task described in natural language, the agent needs to build a plan based on this prompt. The best solutions use LLMs through APIs or models that are not publicly available, making it difficult to reproduce the results. In this paper, we use publicly available LLMs to build a plan for an embodied agent and evaluate them in three modes of operation: 1) the subtask evaluation mode, 2) the full autoregressive plan generation, and 3) the step-by-step autoregressive plan generation. We used two prompt settings: a prompt containing examples of one given task and a mixed prompt with examples of different tasks. Through extensive experiments, we have shown that the subtask evaluation mode, in most cases, outperforms the others with a task-specific prompt, whereas the step-by-step autoregressive plan generation performs better in the mixed prompt setting.

External links

DOI: 10.1007/978-3-031-33469-6_23

ResearchGate: https://www.researchgate.net/publication/370993484_Evaluation_of_Pretrained_Large_Language_Models_in_Embodied_Planning_Tasks

Citation

Sarkisyan, C., Korchemnyi, A., Kovalev, A. K., Panov, A. I. (2023). Evaluation of Pretrained Large Language Models in Embodied Planning Tasks // In: Hammer, P., Alirezaie, M., Strannegård, C. (eds) Artificial General Intelligence. AGI 2023. Lecture Notes in Computer Science, vol 13921. Springer, Cham.