Object-centric architectures usually apply a differentiable module to the entire feature map to decompose it into sets of entity representations called slots. Some of these methods structurally resemble clustering algorithms, where cluster centers in latent space serve as slot representations. Slot Attention is an example of such a method, acting as a learnable analog of the soft k-means algorithm. Our work employs a learnable clustering method based on the Gaussian Mixture Model. Unlike other approaches, we represent slots not only by cluster centers but also incorporate information about the distance between each cluster and its assigned vectors, leading to more expressive slot representations. Our experiments demonstrate that using this approach instead of Slot Attention improves performance in object-centric scenarios, achieving state-of-the-art results in the set property prediction task.
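Below is a minimal PyTorch sketch of the idea described in the abstract: an iterative, GMM-style soft-clustering update over encoder features in which the final slot combines the cluster mean with responsibility-weighted distance statistics. The class name `SlotMixtureSketch`, the diagonal learnable `log_sigma`, the fixed number of iterations, and the concatenation layout are illustrative assumptions, not the authors' exact Slot Mixture Module.

```python
import torch
import torch.nn as nn


class SlotMixtureSketch(nn.Module):
    """Illustrative GMM-style slot update (not the paper's exact module).

    Slots play the role of Gaussian component means; responsibilities come
    from Gaussian log-densities with a learnable diagonal variance, and the
    output slot concatenates the mean with responsibility-weighted distance
    statistics, reflecting the abstract's point that distance information
    is kept in the slot representation.
    """

    def __init__(self, dim, num_slots, iters=3):
        super().__init__()
        self.num_slots = num_slots
        self.iters = iters
        self.slots_mu = nn.Parameter(torch.randn(1, num_slots, dim))
        self.log_sigma = nn.Parameter(torch.zeros(1, num_slots, dim))
        self.norm = nn.LayerNorm(dim)
        self.to_out = nn.Linear(2 * dim, dim)  # [mean, distance stats] -> slot

    def forward(self, feats):  # feats: (B, N, D) encoder features
        feats = self.norm(feats)
        B, N, D = feats.shape
        mu = self.slots_mu.expand(B, -1, -1)                  # (B, K, D)
        for _ in range(self.iters):
            var = self.log_sigma.exp().expand(B, -1, -1)      # (B, K, D)
            diff = feats.unsqueeze(1) - mu.unsqueeze(2)       # (B, K, N, D)
            # log N(x | mu, diag(var)) per slot, up to an additive constant
            log_p = -0.5 * ((diff ** 2) / var.unsqueeze(2)).sum(-1) \
                    - 0.5 * var.log().sum(-1, keepdim=True)   # (B, K, N)
            resp = log_p.softmax(dim=1)                       # E-step: slots compete per feature
            resp = resp / resp.sum(dim=2, keepdim=True).clamp_min(1e-8)
            mu = torch.einsum('bkn,bnd->bkd', resp, feats)    # M-step: updated means
        # responsibility-weighted distances between slots and assigned features
        diff = feats.unsqueeze(1) - mu.unsqueeze(2)           # (B, K, N, D)
        dist = torch.einsum('bkn,bknd->bkd', resp, diff.abs())
        return self.to_out(torch.cat([mu, dist], dim=-1))     # (B, K, D)


# Usage sketch: 7 slots over 128 feature vectors of dimension 64
slots = SlotMixtureSketch(dim=64, num_slots=7)(torch.randn(2, 128, 64))
```

Unlike soft k-means (and Slot Attention), where a slot is fully determined by the weighted mean of its assigned features, this sketch lets the slot also carry how tightly those features sit around the mean, which is the source of the extra expressiveness the abstract refers to.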
DOI: 10.48550/arXiv.2311.04640
Download the article from the conference archive at OpenReview (PDF): https://openreview.net/forum?id=aBUidW4Nkd
Download the article from Aleksandr Panov's personal page (PDF): https://grafft.github.io/assets/pdf/smm2024.pdf
Download the earlier version of the article (8 November 2023) from arXiv (PDF): https://arxiv.org/abs/2311.04640
Daniil Kirilenko, Vitaliy Vorobyov, Alexey K. Kovalev, Aleksandr I. Panov. Object-Centric Learning with Slot Mixture Module // The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7–11, 2024.