Recently, pre-quantizing image features into discrete latent variables has helped achieve remarkable results in image modeling. In this paper, we propose a method to learn discrete latent variables for object-centric tasks. We assign objects to slots, each represented as a vector generated by sampling from non-overlapping sets of low-dimensional discrete variables. We show empirically that embeddings from the learned discrete latent spaces exhibit disentanglement. We use set prediction and object discovery as downstream tasks. For set prediction, the model achieves state-of-the-art results on the CLEVR dataset among a class of object-centric methods. In the object discovery setting, we also demonstrate manipulation of individual objects in a scene through controllable image generation.
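To make the idea of building a slot from "non-overlapping sets of low-dimensional discrete variables" more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that a continuous slot vector is split into equal-sized groups, that each group has its own codebook (so the discrete variables of different groups never overlap), and that a code is chosen either by nearest squared Euclidean distance or by sampling from a softmax over negative distances. The function names (`split_into_groups`, `quantize_slot`) and the `temperature` parameter are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split_into_groups(slot: Sequence[float], num_groups: int) -> List[Vector]:
    """Hypothetical helper: cut a slot vector into equal, non-overlapping chunks."""
    if len(slot) % num_groups != 0:
        raise ValueError("slot dimension must be divisible by num_groups")
    size = len(slot) // num_groups
    return [list(slot[i * size:(i + 1) * size]) for i in range(num_groups)]


def quantize_slot(
    slot: Sequence[float],
    codebooks: Sequence[Sequence[Sequence[float]]],
    temperature: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], Vector]:
    """Assumed scheme: quantize each group of the slot with its own codebook.

    If temperature is None, pick the nearest code in each group; otherwise
    sample a code with probability proportional to exp(-distance / temperature).
    Returns the chosen code index per group and the concatenated quantized slot.
    """
    chooser = rng if rng is not None else random
    groups = split_into_groups(slot, len(codebooks))
    indices: List[int] = []
    quantized: Vector = []
    for group, codebook in zip(groups, codebooks):
        dists = [squared_distance(group, code) for code in codebook]
        if temperature is None:
            # Deterministic nearest-neighbour assignment.
            idx = min(range(len(codebook)), key=dists.__getitem__)
        else:
            # Shift by the minimum distance for numerical stability.
            m = min(dists)
            weights = [math.exp(-(d - m) / temperature) for d in dists]
            idx = chooser.choices(range(len(codebook)), weights=weights, k=1)[0]
        indices.append(idx)
        quantized.extend(float(v) for v in codebook[idx])
    return indices, quantized


if __name__ == "__main__":
    # Two groups of two dimensions each; every group has its own 3-entry codebook.
    example_codebooks = [
        [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
        [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]],
    ]
    print(quantize_slot([0.9, 0.1, -0.8, -1.2], example_codebooks))
    # prints ([1, 2], [1.0, 0.0, -1.0, -1.0])
```

Because each group indexes only its own codebook, a slot is described by a tuple of small discrete indices (here `[1, 2]`), which is the kind of representation in which individual factors could be swapped to edit one property of an object. Training such a layer end to end would also need a gradient estimator (for example straight-through or Gumbel-softmax), which this sketch omits.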
DOI: 10.1007/978-3-031-45170-6_53
Download PDF at Springer: https://link.springer.com/content/pdf/10.1007/978-3-031-45170-6_53?pdf
Download conference proceedings (PDF) or read online at Springer: https://link.springer.com/chapter/10.1007/978-3-031-45170-6_53
ResearchGate: https://www.researchgate.net/publication/375659164_Quantized_Disentangled_Representations_for_Object-Centric_Visual_Tasks
Kirilenko, D., Korchemnyi, A., Smirnov, K., Kovalev, A.K., Panov, A.I. (2023). Quantized Disentangled Representations for Object-Centric Visual Tasks. In: Maji, P., Huang, T., Pal, N.R., Chaudhury, S., De, R.K. (eds) Pattern Recognition and Machine Intelligence. PReMI 2023. Lecture Notes in Computer Science, vol 14301. Springer, Cham. pp. 514–522.