We saw in a previous post how the Kullback-Leibler divergence influence a VAE’s encoder and decoder outputs. In particular, we could notice that whereas the encoder outputs are closer to a standard multivariate normal distribution thanks to the KL divergence, the result is far from being perfect and there are still some gaps. The Adversarial Autoencoder tends to fix that problem by using a Generative Adversarial Network rather than the KL divergence.

On April 24th Deep Learning for Sciences, Engineering, and Arts Meetup, the following problem was discussed: “Why for binary classification don’t we just pick up some values to represent the two possible outcomes (e.g. 0 and 1) and use regression with a linear output and a MSE loss?”. I had the impression that the provided answers were not totally clear for everybody. I am therefore writing this short note, hoping that the arguments presented below will help for a better understanding.

The loss function used for the training of Variational Autoencoders (VAEs) is divided in two terms. The first one measures the quality of the autoencoding, i.e. the error between the original sample and its reconstruction. The second term is the Kullback-Leibler divergence (abbreviated KL divergence) with respect to a standard multivariate normal distribution. We will illustrate with a few plots the influence of the KL divergence on the encoder and decoder outputs.