Symbolic Disentangled Representations for Images

Authors

Kovalyov A., Korchemnyi A.

Abstract

The idea of disentangled representations is to reduce data to the set of generative factors that produce it. Usually, such representations are vectors in a latent space in which each coordinate corresponds to one generative factor. An object represented this way can then be modified by changing the value of a specific coordinate. First, however, one must determine which coordinate encodes the desired generative factor, which can be difficult when the vector dimension is high. In our work, we propose ArSyD (Architecture for Symbolic Disentanglement), which represents each generative factor as a vector of the same dimension as the resulting representation. The object representation is then obtained as a superposition of the vectors responsible for the generative factors. We call such a representation a symbolic disentangled representation. Disentanglement is achieved by construction: no additional assumptions about the distributions are made during training, and the model is trained only to reconstruct images. We studied our approach on objects from the dSprites and CLEVR datasets and provide a comprehensive analysis of the learned symbolic disentangled representations. We also propose new disentanglement metrics that allow models with different latent representations to be compared.
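
To make the idea of a representation built as a superposition of factor vectors more concrete, here is a minimal sketch in Python. It is not the ArSyD model: in ArSyD the vectors are learned by training to reconstruct images, whereas this sketch assumes fixed random bipolar vectors. The factor names, the dimension, and the helpers `make_codebooks`, `encode`, `decode`, and `set_factor` are all hypothetical and chosen only for illustration.

```python
import numpy as np

DIM = 1024  # assumed dimension, not taken from the talk

# Hypothetical generative factors and their possible values.
FACTORS = {
    "shape": ["square", "ellipse", "heart"],
    "color": ["red", "green", "blue"],
    "size": ["small", "large"],
}


def make_codebooks(factors, dim, rng):
    """Assign one random bipolar vector of length `dim` to every factor value."""
    return {
        name: {value: rng.choice([-1.0, 1.0], size=dim) for value in values}
        for name, values in factors.items()
    }


def encode(codebooks, obj):
    """Represent an object as the sum (superposition) of its factor vectors."""
    return np.sum([codebooks[f][v] for f, v in obj.items()], axis=0)


def decode(codebooks, vec):
    """For each factor, pick the value whose vector has the largest dot product with `vec`."""
    return {
        f: max(book, key=lambda v: float(book[v] @ vec))
        for f, book in codebooks.items()
    }


def set_factor(codebooks, vec, factor, old, new):
    """Change one factor by swapping its vector inside the superposition."""
    return vec - codebooks[factor][old] + codebooks[factor][new]


rng = np.random.default_rng(0)
codebooks = make_codebooks(FACTORS, DIM, rng)

obj = {"shape": "heart", "color": "red", "size": "small"}
vec = encode(codebooks, obj)
edited = set_factor(codebooks, vec, "color", "red", "blue")

print(decode(codebooks, vec))
print(decode(codebooks, edited))
```

Because random high-dimensional vectors are nearly orthogonal, every factor value can be read back from the sum without knowing which coordinate "belongs" to which factor, and editing one factor leaves the others untouched. This is only the intuition behind the abstract's claim; the actual encoder, decoder, and training procedure are described in the talk itself.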

External links

Watch the presentation at Evgeny Osipov's channel:

Reference link

Alexandr Korchemnyi, Alexei Kovalyov. Symbolic Disentangled Representations for Images // VSAONLINE Webinar Series (April 8, 2024).