In this paper, we consider the task of Visual Question Answering (VQA), an important task for creating General Artificial Intelligence (AI) systems. We propose an interpretable model called GS-VQA. Its main idea is that a complex compositional question can be decomposed into a sequence of simple questions about objects’ properties and their relations. We use the Unified estimator to answer the questions in that sequence and test the proposed model on the CLEVR and THOR-VQA datasets. The GS-VQA model demonstrates results comparable to the state of the art while keeping the response generation process transparent and interpretable.
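The decomposition idea in the abstract can be illustrated with a toy sketch. This is not the GS-VQA implementation: the scene, the `Obj` class, and the helpers `filter_by`, `left_of`, `query`, and `answer` are all hypothetical names introduced here, and the decomposition is written by hand rather than produced by a model. It only shows how a compositional question might be answered as a chain of simple questions while keeping a readable trace of each step.

```python
# Illustrative only: a toy scene and a hand-written decomposition.
# None of these names come from the GS-VQA paper.
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    name: str
    color: str
    shape: str
    x: float  # horizontal position; a smaller value means further left


SCENE = [
    Obj("o1", "red", "cube", 0.2),
    Obj("o2", "blue", "sphere", 0.5),
    Obj("o3", "green", "cylinder", 0.8),
]


def filter_by(objs, attr, value):
    """Simple question: which objects have the given property value?"""
    return [o for o in objs if getattr(o, attr) == value]


def left_of(objs, anchor):
    """Simple question: which objects lie to the left of the anchor?"""
    return [o for o in objs if o.x < anchor.x]


def query(obj, attr):
    """Simple question: what is this object's property?"""
    return getattr(obj, attr)


def answer(scene):
    """Answer 'What color is the cube to the left of the sphere?' step by step."""
    trace = []  # human-readable record of every intermediate answer

    spheres = filter_by(scene, "shape", "sphere")
    trace.append(("filter shape=sphere", [o.name for o in spheres]))

    candidates = left_of(scene, spheres[0])
    trace.append(("left_of sphere", [o.name for o in candidates]))

    cubes = filter_by(candidates, "shape", "cube")
    trace.append(("filter shape=cube", [o.name for o in cubes]))

    result = query(cubes[0], "color")
    trace.append(("query color", result))
    return result, trace


if __name__ == "__main__":
    result, trace = answer(SCENE)
    for step, value in trace:
        print(step, "->", value)
    print("answer:", result)
```

In this sketch the trace is what makes the answer inspectable: each entry pairs one simple question with its intermediate result, so a wrong final answer can be traced to the step that produced it.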
DOI: 10.1007/978-3-031-19907-3_9
Read the paper (PDF) or watch the presentation (Google Drive) on the AGI 2022 website: https://agi-conf.org/2022/accepted-posters/
ResearchGate: https://www.researchgate.net/publication/367095018_Graph_Strategy_for_Interpretable_Visual_Question_Answering
Sarkisyan, C., Savelov, M., Kovalev, A. K., Panov, A. I. (2023). Graph Strategy for Interpretable Visual Question Answering. In: Artificial General Intelligence. AGI 2022. Lecture Notes in Computer Science, vol. 13539, pp. 86–99. https://doi.org/10.1007/978-3-031-19907-3_9