Graph Strategy for Interpretable Visual Question Answering

Authors

Panov A. I., Kovalev A. K.

Abstract

In this paper, we consider the task of Visual Question Answering (VQA), an important task for building artificial general intelligence (AGI) systems. We propose an interpretable model called GS-VQA. Its main idea is that a complex compositional question can be decomposed into a sequence of simple questions about the properties of objects and the relations between them. We use the Unified estimator to answer the questions in this sequence, and we test the proposed model on the CLEVR and THOR-VQA datasets. GS-VQA achieves results comparable to the state of the art while keeping the answer-generation process transparent and interpretable.
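As a rough illustration of the decomposition idea in the abstract, the Python sketch below answers a compositional question ("How many spheres are to the left of the red sphere?") by running a sequence of simple property and relation questions over a toy scene. Everything here is hypothetical: the scene, the step format, and the helpers `ask_property` and `ask_relation` are assumptions. The two helpers are rule-based stand-ins for a learned answerer such as the Unified estimator. This is not the GS-VQA graph strategy itself, only a minimal sketch of the general idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    name: str
    color: str
    shape: str
    x: float  # horizontal position, used by the hypothetical "left_of" relation


# Hypothetical toy scene (not taken from CLEVR or THOR-VQA).
SCENE = [
    Obj("o1", "red", "cube", 0.0),
    Obj("o2", "blue", "sphere", 1.0),
    Obj("o3", "red", "sphere", 2.0),
]

# "How many spheres are to the left of the red sphere?" as simple steps.
STEPS = [
    ("filter", "color", "red"),
    ("filter", "shape", "sphere"),
    ("relate", "left_of"),
    ("filter", "shape", "sphere"),
    ("count",),
]


def ask_property(obj, attr, value):
    """Simple question 'Is obj <value>?'. Hypothetical rule-based stand-in for a learned estimator."""
    return getattr(obj, attr) == value


def ask_relation(a, b, rel):
    """Simple question 'Is a <rel> b?'. Hypothetical rule-based stand-in for a learned estimator."""
    if rel == "left_of":
        return a.x < b.x
    raise ValueError(f"unknown relation: {rel}")


def answer(steps, scene):
    """Execute the decomposed question step by step, recording every intermediate result."""
    current = list(scene)
    trace = []
    for step in steps:
        kind = step[0]
        if kind == "filter":
            _, attr, value = step
            current = [o for o in current if ask_property(o, attr, value)]
        elif kind == "relate":
            _, rel = step
            anchors = current
            # Keep every scene object that stands in relation `rel` to some anchor.
            current = [o for o in scene
                       if any(ask_relation(o, a, rel) for a in anchors if a is not o)]
        elif kind == "count":
            trace.append((step, len(current)))
            return len(current), trace
        else:
            raise ValueError(f"unknown step: {step}")
        trace.append((step, [o.name for o in current]))
    return [o.name for o in current], trace


if __name__ == "__main__":
    result, trace = answer(STEPS, SCENE)
    for step, value in trace:
        print(step, "->", value)
    print("answer:", result)
```

The trace of intermediate object sets is what makes this kind of pipeline inspectable. Each simple answer can be checked on its own, and an error can be traced to the specific step that produced it.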

External links

DOI: 10.1007/978-3-031-19907-3_9

Read the paper (PDF) or view the presentation (Google Drive) on the AGI 2022 website: https://agi-conf.org/2022/accepted-posters/

ResearchGate: https://www.researchgate.net/publication/367095018_Graph_Strategy_for_Interpretable_Visual_Question_Answering

How to cite

Sarkisyan, C., Savelov, M., Kovalev, A. K., Panov, A. I. (2023). Graph Strategy for Interpretable Visual Question Answering // Artificial General Intelligence. AGI 2022. Lecture Notes in Computer Science, vol 13539, pp. 86–99. https://doi.org/10.1007/978-3-031-19907-3_9