Quantized Disentangled Representations for Object-Centric Visual Tasks


Panov A., Kovalyov A., Kirilenko D., Korchemnyi A.


Recently, the pre-quantization of image features into discrete latent variables has helped to achieve remarkable results in image modeling. In this paper, we propose a method to learn discrete latent variables applied to object-centric tasks. We assign objects to slots which are represented as vectors generated by sampling from non-overlapping sets of low-dimensional discrete variables. We empirically demonstrate that embeddings from the learned discrete latent spaces have the disentanglement property. We use set prediction and object discovery as downstream tasks. The model achieves state-of-the-art results on the CLEVR dataset among a class of object-centric methods for the set prediction task. We also demonstrate manipulation of individual objects in a scene with controllable image generation in the object discovery setting.
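As a rough illustration of the idea of representing a slot by sampling from non-overlapping sets of low-dimensional discrete variables, the following NumPy sketch splits a slot vector into groups and replaces each group with an entry drawn from that group's own codebook. It is not the authors' implementation: the function name `quantize_slot`, the array shapes, the `temperature` parameter, and the choice to sample from a softmax over negative squared distances are all assumptions made for this example.

```python
import numpy as np


def quantize_slot(slot, codebooks, rng, temperature=1.0):
    """Replace each chunk of `slot` with an entry sampled from its own codebook.

    slot:      (n_groups * code_dim,) continuous slot vector
    codebooks: (n_groups, n_codes, code_dim), one separate codebook per group
    All names and shapes here are hypothetical, chosen for illustration.
    """
    n_groups, n_codes, code_dim = codebooks.shape
    # One low-dimensional chunk per group.
    chunks = slot.reshape(n_groups, code_dim)

    # Squared distance from each chunk to the codes of its own group only,
    # so the groups never share codes (non-overlapping sets).
    dists = ((chunks[:, None, :] - codebooks) ** 2).sum(axis=-1)  # (n_groups, n_codes)

    # Assumption: closer codes are more likely (softmax over negative distance).
    logits = -dists / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    # Sample one discrete index per group, then look up and concatenate the codes.
    indices = np.array([rng.choice(n_codes, p=p) for p in probs])
    quantized = codebooks[np.arange(n_groups), indices].reshape(-1)
    return quantized, indices


rng = np.random.default_rng(0)
codebooks = rng.normal(size=(4, 8, 16))  # 4 groups, 8 codes each, 16 dims per code
slot = rng.normal(size=4 * 16)
quantized, indices = quantize_slot(slot, codebooks, rng)
```

In this sketch the slot is fully described by the tuple `indices` (one discrete value per group), which is the kind of factorized code in which one might look for disentangled object properties. A trainable model would additionally need a gradient estimator for the sampling step, which is omitted here.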

External links

DOI: 10.1007/978-3-031-45170-6_53

Download PDF at Springer: https://link.springer.com/content/pdf/10.1007/978-3-031-45170-6_53?pdf

Download conference proceedings (PDF) or read online at Springer: https://link.springer.com/chapter/10.1007/978-3-031-45170-6_53

ResearchGate: https://www.researchgate.net/publication/375659164_Quantized_Disentangled_Representations_for_Object-Centric_Visual_Tasks

Reference link

Kirilenko, D., Korchemnyi, A., Smirnov, K., Kovalev, A. K., Panov, A. I. (2023). Quantized Disentangled Representations for Object-Centric Visual Tasks. In: Maji, P., Huang, T., Pal, N.R., Chaudhury, S., De, R.K. (eds) Pattern Recognition and Machine Intelligence. PReMI 2023. Lecture Notes in Computer Science, vol 14301. Springer, Cham. pp. 514–522.