Quantized Disentangled Representations for Object-Centric Visual Tasks


Panov A. I., Kovalev A. K., Korchemnyi A. V., Kirilenko D. E.


Recently, the pre-quantization of image features into discrete latent variables has helped to achieve remarkable results in image modeling. In this paper, we propose a method to learn discrete latent variables applied to object-centric tasks. We assign objects to slots which are represented as vectors generated by sampling from non-overlapping sets of low-dimensional discrete variables. We empirically demonstrate that embeddings from the learned discrete latent spaces have the disentanglement property. We use set prediction and object discovery as downstream tasks. The model achieves state-of-the-art results on the CLEVR dataset among a class of object-centric methods for the set prediction task. We also demonstrate manipulation of individual objects in a scene with controllable image generation in the object discovery setting.
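The slot representation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `quantize_slot`, the softmax-over-negative-distance sampling rule, the `temperature` parameter, and the randomly initialized codebooks are all assumptions made for illustration. The sketch only shows the general idea of splitting a slot vector into groups and sampling each group's code from its own, non-overlapping codebook.

```python
import numpy as np


def quantize_slot(slot, codebooks, rng, temperature=1.0):
    """Quantize one slot vector group by group (illustrative sketch).

    slot:      array of shape (G * d,), a continuous slot vector.
    codebooks: array of shape (G, K, d); group g owns codebooks[g] only,
               so the sets of codes are non-overlapping across groups.
    Returns the quantized vector of shape (G * d,) and the chosen
    code indices of shape (G,).
    """
    num_groups, num_codes, code_dim = codebooks.shape
    chunks = slot.reshape(num_groups, code_dim)

    # Squared distance from each chunk to every code in its own codebook.
    dists = ((chunks[:, None, :] - codebooks) ** 2).sum(axis=-1)  # (G, K)

    # Assumed sampling rule: softmax over negative distances.
    logits = -dists / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    # Sample one discrete code index per group.
    indices = np.array([rng.choice(num_codes, p=p) for p in probs])

    # Look up the sampled codes and concatenate them into the slot embedding.
    quantized = codebooks[np.arange(num_groups), indices]  # (G, d)
    return quantized.reshape(-1), indices


# Hypothetical sizes: 4 groups, 8 codes per group, 2 dimensions per code.
rng = np.random.default_rng(0)
codebooks = rng.normal(size=(4, 8, 2))
slot = rng.normal(size=(4 * 2,))
quantized, indices = quantize_slot(slot, codebooks, rng)
```

Because each group has its own codebook, changing the code index of a single group alters only the corresponding segment of the slot vector, which is one way to picture how separate discrete variables could control separate object properties.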

External links

DOI: 10.1007/978-3-031-45170-6_53

Download the PDF from the Springer website (in English): https://link.springer.com/content/pdf/10.1007/978-3-031-45170-6_53?pdf

Download the conference proceedings (PDF) or read online on the Springer website (in English): https://link.springer.com/chapter/10.1007/978-3-031-45170-6_53

ResearchGate: https://www.researchgate.net/publication/375659164_Quantized_Disentangled_Representations_for_Object-Centric_Visual_Tasks

Citation

Kirilenko, D., Korchemnyi, A., Smirnov, K., Kovalev, A. K., Panov, A. I. (2023). Quantized Disentangled Representations for Object-Centric Visual Tasks. In: Maji, P., Huang, T., Pal, N.R., Chaudhury, S., De, R.K. (eds) Pattern Recognition and Machine Intelligence. PReMI 2023. Lecture Notes in Computer Science, vol 14301. Springer, Cham. pp. 514–522.