Most state-of-the-art methods for image-based place recognition do not explicitly use scene semantics. We address this problem and propose a new two-stage approach referred to as TSVLoc. It treats place recognition as an image retrieval problem and can augment any existing method. The first stage is model-agnostic: any modern neural network model that does not directly use semantics, e.g., HF-Net, NetVLAD, or Patch-NetVLAD, can be used. In the second stage, we apply the Vector Symbolic Architectures (VSA) framework to construct a semantic scene representation: our method uses semantic segmentation of an image to extract objects and their relations and applies VSA operations to combine them. We also considered optionally using the depth map for this purpose, which showed promising results. The effectiveness of our approach is demonstrated through extensive experiments on open large-scale datasets: the indoor HPointLoc dataset, built in the Habitat simulation environment, and the outdoor Oxford RobotCar dataset. The proposed solution significantly improves the quality of place recognition.
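To make the second stage more concrete, the following is a minimal sketch of how VSA operations can turn a list of segmented objects into a single scene vector and compare scenes. It is not the encoding used in the paper. The bipolar hypervectors, the element-wise-product binding, the sum bundling, the coarse image-region labels standing in for object relations, and the idea of re-scoring first-stage candidates by cosine similarity are all assumptions made for illustration. The names `DIM`, `CLASS_HV`, `REGION_HV`, `encode_scene`, and `rerank` are hypothetical.

```python
import numpy as np

DIM = 4096                      # hypothetical hypervector dimensionality
rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def random_hv():
    """Draw a random bipolar (+1/-1) hypervector."""
    return rng.choice([-1.0, 1.0], size=DIM)


def bind(a, b):
    """Bind two hypervectors by element-wise multiplication."""
    return a * b


def bundle(vectors):
    """Bundle hypervectors by element-wise summation."""
    return np.sum(vectors, axis=0)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical codebooks: one random hypervector per semantic class
# and per coarse image region (a stand-in for object relations).
CLASS_HV = {c: random_hv() for c in ["wall", "chair", "table", "door", "sofa"]}
REGION_HV = {r: random_hv() for r in ["left", "center", "right"]}


def encode_scene(objects):
    """Encode a list of (class, region) pairs as one scene hypervector."""
    return bundle([bind(CLASS_HV[c], REGION_HV[r]) for c, r in objects])


def rerank(query_objects, candidates):
    """Order candidate ids by semantic similarity to the query scene.

    `candidates` maps an image id to its (class, region) list and is
    assumed to come from a first-stage retrieval model.
    """
    q = encode_scene(query_objects)
    scores = {k: cosine(q, encode_scene(v)) for k, v in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


query = [("chair", "left"), ("table", "center"), ("door", "right")]
candidates = {
    "img_a": [("sofa", "left"), ("wall", "center"), ("wall", "right")],
    "img_b": [("chair", "left"), ("table", "center"), ("door", "right")],
}
ranking = rerank(query, candidates)  # img_b matches the query layout exactly
```

Because random high-dimensional bipolar vectors are nearly orthogonal, a candidate that shares the query's object-region pairs scores close to 1, while one with different pairs scores close to 0. Depth information, which the abstract mentions as optional, could be bound in as an additional role vector under the same scheme.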
Download PDF from the Higher School of Economics website: https://www.hse.ru/data/2022/07/08/1631278426/Kirilenko2022%20(IEEE%20WCCI%202022)_8.pdf?ysclid=l6m3tgvfyw394000799
Kirilenko D., Kovalev A. K., Solomentsev Y., Melekhin A., Yudin D., Panov A. I. Vector Symbolic Scene Representation for Semantic Place Recognition // IEEE WCCI 2022.