Vector Semiotic Model for Visual Question Answering

Authors

Panov A. I., Kovalev A. K.

Abstract

In this paper, we propose a Vector Semiotic Model as a possible solution to the symbol grounding problem in the context of Visual Question Answering. The Vector Semiotic Model combines the advantages of the Semiotic Approach, as implemented in the Sign-Based World Model, with those of Vector Symbolic Architectures. The Sign-Based World Model represents information about the scene depicted in an input image in a structured way and grounds abstract objects in an agent's sensory input. We use a Vector Symbolic Architecture to represent the elements of the Sign-Based World Model at the computational level. The properties of high-dimensional spaces, together with the operations defined on high-dimensional vectors, make it possible to encode the whole scene into a single high-dimensional vector while preserving its structure. This in turn makes it possible to apply explainable reasoning to answer an input question. We conducted experiments on the CLEVR dataset and show results comparable to the state of the art. The proposed combination of approaches, first, offers a possible solution to the symbol grounding problem and, second, allows the current results to be extended to other intelligent tasks (collaborative robotics, embodied intellectual assistance, etc.).
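
The idea of encoding a structured scene into one high-dimensional vector and then querying it can be illustrated with a minimal sketch. It is not the model from the paper: it assumes a generic bipolar Vector Symbolic Architecture (element-wise multiplication for binding, element-wise addition for bundling), and every name in it (DIM, roles, values, slots, encode_object, query) is hypothetical and chosen only for illustration.

```python
import random

DIM = 10_000  # hypervector dimensionality (illustrative choice)
rng = random.Random(0)  # fixed seed so the run is repeatable


def random_hv():
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Bind two vectors by element-wise multiplication.

    Multiplying again by the same bipolar key undoes the binding.
    """
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Superimpose several vectors by element-wise addition."""
    return [sum(column) for column in zip(*vectors)]


def dot(a, b):
    """Dot product, used here as an unnormalized similarity score."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical codebooks: attribute roles, attribute values, object slots.
roles = {name: random_hv() for name in ("color", "shape")}
values = {name: random_hv() for name in ("red", "blue", "cube", "sphere")}
slots = {name: random_hv() for name in ("obj1", "obj2")}


def encode_object(color, shape):
    """Encode one object as a bundle of role-value bindings."""
    return bundle(
        bind(roles["color"], values[color]),
        bind(roles["shape"], values[shape]),
    )


# The whole scene is a single vector: each object is bound to its slot key.
scene = bundle(
    bind(slots["obj1"], encode_object("red", "cube")),
    bind(slots["obj2"], encode_object("blue", "sphere")),
)


def query(scene_vec, slot, role):
    """Unbind slot and role, then return the most similar known value."""
    noisy = bind(bind(scene_vec, slots[slot]), roles[role])
    return max(values, key=lambda name: dot(noisy, values[name]))


print(query(scene, "obj1", "color"))
print(query(scene, "obj2", "shape"))
```

Because random high-dimensional vectors are nearly orthogonal, unbinding leaves the stored value plus low-magnitude noise, so the two queries are expected to recover red and sphere. Each answer can be traced back to explicit unbinding steps, which is the sense in which such reasoning is explainable.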

External links

DOI: 10.1016/j.cogsys.2021.09.001

Download PDF from the HSE website (in English): https://www.hse.ru/data/2022/07/08/1631278420/Kovalev2022_7.pdf

Russian Science Citation Index (РИНЦ): https://www.elibrary.ru/item.asp?id=47525383

ResearchGate: https://www.researchgate.net/publication/355677933_Vector_Semiotic_Model_for_Visual_Question_Answering

Citation

Kovalev A. K., Shaban M., Osipov E., Panov A. I. Vector Semiotic Model for Visual Question Answering // Cognitive Systems Research, Vol. 71, 2022, pp. 52–63.