Towards interpretable machine-learning models by extracting interpretable latent variables

In this subproject we will design and analyse machine-learning models that extract interpretable latent variables (explanatory factors) underlying datasets. Part of the project is further development of a mathematical definition of interpretability.

The search for algorithms that find interpretable latent variables (and for corresponding mathematical definitions) has recently taken place under the heading of disentangling latent factors. A good example to keep in mind is a dataset consisting of pictures of objects, where the objects are rotated over arbitrary angles, translated in various directions, and shown in different colours. Disentanglement aims to separate, in an unsupervised manner, the rotation of objects from their translation and their colour; see, e.g., the dSprites benchmark dataset.
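To make the setting concrete, the sketch below models a dSprites-style dataset in which every image is generated by exactly one combination of independent ground-truth factors; the factor names and sizes are illustrative assumptions for this sketch rather than values loaded from the actual dataset.

```python
import numpy as np

# Illustrative dSprites-style ground-truth factors (names and sizes are
# assumptions for this sketch). Every combination of factor values
# generates exactly one image in the dataset.
factor_sizes = {"shape": 3, "scale": 6, "rotation": 40, "pos_x": 32, "pos_y": 32}
sizes = list(factor_sizes.values())
n_images = int(np.prod(sizes))  # one image per factor combination

def factors_of(index):
    """Recover the independent generative factors of image `index`."""
    values = np.unravel_index(index, sizes)
    return dict(zip(factor_sizes, (int(v) for v in values)))
```

A disentangling algorithm is then asked to recover, from the images alone, latent coordinates that vary with one such factor each.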

Most algorithms designed for the disentanglement of latent factors perform deep variational inference (a computationally feasible approximation of Bayesian inference), with the Variational Autoencoder (VAE) as the flagship implementation. The β-VAE, the factor-VAE, the β-TCVAE and the DIP-VAE are all adaptations of the standard VAE aiming to improve the disentanglement of latent factors.
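The common structure of these adaptations is a reweighting of the VAE objective. As a minimal sketch (assuming a Gaussian encoder, a fixed-variance Gaussian decoder, and a squared-error reconstruction term), the β-VAE objective can be written as reconstruction error plus a β-weighted KL divergence to the factorised prior:

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over latent
    dimensions and averaged over the batch."""
    return np.mean(0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1))

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """beta-VAE objective (sketch): beta = 1 recovers the standard VAE;
    beta > 1 pushes the approximate posterior towards the factorised
    prior, which is the heuristic behind improved disentanglement."""
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=1))  # Gaussian decoder, fixed variance
    return recon + beta * gaussian_kl(mu, log_var)
```

The factor-VAE and β-TCVAE refine this further by penalising specifically the total-correlation part of the KL term rather than the whole term.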

Although the separation of latent factors is an important aspect of interpretation, we stress that interpretation is broader: after separating the various factors from each other, the algorithms should also be able to ‘identify’ the individual factors, such as rotation or translation. This requires a more precise definition. As a first proxy, in case a ground-truth data-generation process is known, we have posed that such algorithms should capture at least the topology or geometry of the true latent variables; that is, they should provide an encoding of the dataset into a space with the same topology or geometry as the true latent variables. This aspect of finding interpretable latent variables has received considerably less attention, but it will be central to our project.
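A hypothetical illustration of this topology criterion, for a dataset whose true latent variable is an angle (so the true latent space is the circle S¹): an encoding into the plane that lands on a circle respects this topology, while an encoding of the angle into an interval of the real line necessarily tears the circle, sending the neighbouring data points at angles 0 and 2π − ε far apart in latent space.

```python
import numpy as np

# Samples of the true latent variable: angles on the circle S^1.
theta = np.linspace(0.0, 2 * np.pi, 200, endpoint=False)

circle_codes = np.stack([np.cos(theta), np.sin(theta)], axis=-1)  # respects S^1
interval_codes = theta[:, None]                                   # tears S^1 open

def seam_gap(codes):
    """Latent distance between the first and last sample, which are
    neighbouring points on the data circle."""
    return float(np.linalg.norm(codes[0] - codes[-1]))
```

Under the criterion above, an encoder behaving like `interval_codes` fails to capture the topology of the true latent variable even if it perfectly separates factors.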

Since VAEs form the basis of many disentanglement algorithms, since there are heuristic arguments in favour of using them for extracting latent variables, and since their loss function encourages a continuous dependence between points in data space and points in latent space, we initially investigated whether a VAE can capture the topology of latent variables. Standard VAEs are, in general, systematically unable to do so. We addressed this issue by developing the Diffusion VAE (ΔVAE), which allows for encoding the data on a manifold. The ΔVAE is, in principle, capable of capturing latent factors according to our definition, and it does so for simple datasets.
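The core idea of encoding on a manifold can be sketched in one step: constrain the encoder outputs to lie on a chosen manifold, here the unit sphere S². This radial projection is only a minimal illustration of the constraint, not the actual ΔVAE implementation, which uses a diffusion-based reparameterisation on the manifold.

```python
import numpy as np

def project_to_sphere(h):
    """Radially project unconstrained encoder outputs h in R^3 onto the
    unit sphere S^2. A minimal sketch of manifold-constrained latent
    codes; the actual Diffusion VAE instead samples via a diffusion
    process on the manifold."""
    norm = np.linalg.norm(h, axis=-1, keepdims=True)
    return h / np.maximum(norm, 1e-12)
```

With latent codes confined to S², the latent space has the sphere's topology by construction, which is what allows the ΔVAE to match spherical ground-truth factors.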

However, the ΔVAE can currently only capture the manifold structure of latent variables, it requires a priori knowledge of the topology of the latent variables, and it has trouble capturing the topology of more complex datasets, indicating the need for a better loss function.

In this project we will therefore:

  • Extend the design of the ΔVAE to be able to capture more types of topological and geometric structure and more types of symmetries. In particular, we will allow for discrete latent variables and incorporate group-convolutional network layers. Many of these design aspects recur in the other subprojects as well.
  • Automate the discovery of geometrical, topological and symmetry structure. This is closely related to finding conserved quantities and so-called quantities of interest for physical systems such as the N-body problem in subproject 6.
  • Sharpen the mathematical definition of interpretability of latent factors. In particular, we will remove the necessity of assuming a ground-truth generation process.
  • Perform mathematical, variational analysis to find out which loss functions lead to what kind of mathematical interpretability of latent factors. We will then incorporate successful loss functions in our algorithms.

For this subproject the PhD student Mahefa Ravelonanosy has been appointed under the supervision of lead researcher Jim Portegies. Both are located in the Centre for Analysis, Scientific Computing and Applications at the Eindhoven University of Technology.