Most state-of-the-art methods for image-based place recognition do not explicitly use scene semantics. We address this problem and propose a new two-stage approach called TSVLoc. It treats place recognition as an image retrieval problem and can be combined with any well-known retrieval method. In the first, model-agnostic stage, any modern neural network model that does not directly use semantics, e.g., HF-Net, NetVLAD, or Patch-NetVLAD, can be used. In the second stage, we apply the Vector Symbolic Architectures (VSA) framework to construct a semantic scene representation: our method uses semantic segmentation of an image to extract objects and their relations, then applies VSA operations to form the scene representation. We also consider optionally using a depth map, which shows promising results. We demonstrate the effectiveness of our approach through extensive experiments on open large-scale datasets: the indoor HPointLoc dataset, built in the Habitat simulation environment, and the outdoor Oxford RobotCar dataset. The proposed solution significantly improves place recognition quality.
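The sketch below illustrates the general idea of the second stage under stated assumptions. It is not the authors' implementation. It assumes a MAP-style VSA with random bipolar hypervectors: binding is elementwise multiplication, bundling is an elementwise majority vote, and a cyclic permutation distinguishes the subject and object roles of a relation. The names `ItemMemory`, `encode_scene`, and `rerank`, the relation labels (`left_of`, `above`), the dimensionality, and the weighted-sum score fusion controlled by `alpha` are all hypothetical. Objects and relations are passed in directly instead of being extracted from a segmentation mask, and the optional depth map is omitted.

```python
import random
from typing import Dict, List, Sequence, Tuple

DIM = 4096  # hypervector dimensionality (illustrative choice, not from the paper)


class ItemMemory:
    """Hypothetical codebook: one random bipolar (+1/-1) hypervector per symbol."""

    def __init__(self, dim: int = DIM, seed: int = 0) -> None:
        self.dim = dim
        self.rng = random.Random(seed)
        self.table: Dict[str, List[int]] = {}

    def __getitem__(self, symbol: str) -> List[int]:
        # Lazily draw a fresh random hypervector the first time a symbol is seen.
        if symbol not in self.table:
            self.table[symbol] = [self.rng.choice((-1, 1)) for _ in range(self.dim)]
        return self.table[symbol]


def bind(a: List[int], b: List[int]) -> List[int]:
    # Elementwise product: self-inverse for bipolar vectors, bind(bind(a, b), b) == a.
    return [x * y for x, y in zip(a, b)]


def permute(a: List[int], shift: int = 1) -> List[int]:
    # Cyclic shift; marks the object role so "A left_of B" differs from "B left_of A".
    return a[-shift:] + a[:-shift]


def bundle(vectors: Sequence[List[int]]) -> List[int]:
    # Elementwise majority vote (sum, then sign); ties are broken towards +1.
    return [1 if sum(col) >= 0 else -1 for col in zip(*vectors)]


def cosine(a: List[int], b: List[int]) -> float:
    # Both vectors are bipolar, so each has norm sqrt(len(a)).
    return sum(x * y for x, y in zip(a, b)) / len(a)


def encode_scene(mem: ItemMemory, objects: List[str],
                 relations: List[Tuple[str, str, str]]) -> List[int]:
    # Objects and (subject, relation, object) triples would come from a semantic
    # segmentation of the image; here they are given directly (assumption).
    parts = [mem[name] for name in objects]
    for subj, rel, obj in relations:
        parts.append(bind(mem[rel], bind(mem[subj], permute(mem[obj]))))
    return bundle(parts)


def rerank(query_hv: List[int], candidates: List[Tuple[str, float, List[int]]],
           alpha: float = 0.5) -> List[Tuple[str, float]]:
    # candidates: (place_id, stage-1 similarity, scene hypervector) for the top-K
    # database images returned by any stage-1 model (e.g. NetVLAD).
    # Hypothetical fusion: a weighted sum of stage-1 and VSA similarities.
    scored = [(pid, alpha * s1 + (1 - alpha) * cosine(query_hv, hv))
              for pid, s1, hv in candidates]
    return sorted(scored, key=lambda t: t[1], reverse=True)


mem = ItemMemory(seed=42)
scene_a = (["chair", "table", "lamp"],
           [("chair", "left_of", "table"), ("lamp", "above", "table")])
query = encode_scene(mem, *scene_a)
same_place = encode_scene(mem, *scene_a)
other_place = encode_scene(mem, ["sofa", "tv", "plant"], [("plant", "left_of", "sofa")])

ranking = rerank(query, [("B", 0.80, other_place), ("A", 0.78, same_place)])
print(ranking[0][0])  # "A": the semantic term outweighs B's slightly higher stage-1 score
```

Because elementwise multiplication is commutative, binding alone cannot tell "chair left_of table" from "table left_of chair". The permutation applied to the object slot breaks that symmetry, and all representations keep the same fixed dimensionality regardless of how many objects a scene contains.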
Download the PDF from the HSE website (in English): https://www.hse.ru/data/2022/07/08/1631278426/Kirilenko2022%20(IEEE%20WCCI%202022)_8.pdf?ysclid=l6m3tgvfyw394000799
Daniil Kirilenko, Alexey Kovalev, Yaroslav Solomentsev, Alexander Melekhin, Dmitry Yudin, Aleksandr Panov. Vector Symbolic Scene Representation for Semantic Place Recognition // IEEE WCCI 2022.