Multi-modal tasks have started to play a significant role in Artificial Intelligence research. A particular example of this domain is visual-linguistic tasks, such as Visual Question Answering and its extension, Visual Dialog. In this paper, we concentrate on the Visual Dialog task and dataset. The task involves two agents. The first agent does not see the image and asks questions about its content. The second agent sees the image and answers the questions. The symbol grounding problem, or how symbols obtain their meanings, plays a crucial role in such tasks. We approach this problem from the semiotic point of view and propose the Vector Semiotic Architecture for Visual Dialog. The Vector Semiotic Architecture combines the Sign-Based World Model and the Vector Symbolic Architecture. The Sign-Based World Model represents agent knowledge at a high level of abstraction and allows a uniform representation of different aspects of knowledge, forming a hierarchical representation of that knowledge in the form of a special kind of semantic network. The Vector Symbolic Architecture represents the computational level and allows symbols to be manipulated as numerical vectors using simple element-wise operations. This combination enables grounding of an object representation from any level of abstraction in the agent's sensory input.
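To illustrate what "operating with symbols as with numerical vectors using simple element-wise operations" can look like, here is a minimal Python sketch. It is not taken from the paper: the abstract does not say which Vector Symbolic Architecture variant is used, so the sketch assumes random bipolar (+1/-1) vectors, element-wise multiplication for binding, and an element-wise majority vote for bundling. The dimensionality `DIM`, the role and filler names (`COLOR`, `RED`, and so on), and all function names are hypothetical.

```python
import random

DIM = 10_000  # hypothetical dimensionality, chosen only for this illustration


def random_hv(rng, dim=DIM):
    """Return a random bipolar vector; each symbol gets one such vector."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a, b):
    """Bind two vectors by element-wise multiplication (self-inverse for +1/-1)."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Superimpose vectors by element-wise majority vote (sign of the sum)."""
    return [1 if sum(column) > 0 else -1 for column in zip(*vectors)]


def similarity(a, b):
    """Normalized dot product: near 0 for unrelated vectors, 1 for identical ones."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)

# Role vectors (attribute names) and filler vectors (attribute values).
COLOR, SHAPE, SIZE = (random_hv(rng) for _ in range(3))
fillers = {name: random_hv(rng) for name in ("red", "blue", "cube", "ball", "small")}

# Encode one object as a bundle of three role-filler bindings.
# An odd number of bindings means the majority vote never ties.
scene_object = bundle(
    bind(COLOR, fillers["red"]),
    bind(SHAPE, fillers["cube"]),
    bind(SIZE, fillers["small"]),
)

# Query "what color is the object?": unbind with the role vector,
# then pick the most similar known filler.
recovered = bind(scene_object, COLOR)
answer = max(fillers, key=lambda name: similarity(recovered, fillers[name]))
print(answer)
```

Under these assumptions, unbinding with the `COLOR` role yields a noisy copy of the `red` vector that is still far more similar to `red` than to any other filler, which is the basic mechanism that lets a structured description be queried with element-wise arithmetic alone.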
DOI: 10.1007/978-3-030-86271-8_21
Download PDF from the Higher School of Economics website: https://www.hse.ru/data/2022/07/08/1631278416/Kovalev2021_ApplyingVectorSymbolicArchitec_5.pdf
Read in the conference proceedings at Google Books: https://books.google.ru/books?id=DQpDEAAAQBAJ&lpg=PA243
Kovalev, A. K., Shaban, M., Chuganskaya, A. A., Panov, A. I. Applying Vector Symbolic Architecture and Semiotic Approach to Visual Dialog // In: Sanjurjo González, H., Pastor López, I., García Bringas, P., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2021. Lecture Notes in Computer Science, vol. 12886, pp. 243–255, 2021.