We saw in a previous post how the Kullback-Leibler divergence influences a VAE’s encoder and decoder outputs. In particular, we noticed that although the KL divergence brings the encoder outputs closer to a standard multivariate normal distribution, the result is far from perfect and some gaps remain. The Adversarial Autoencoder attempts to fix that problem by using a Generative Adversarial Network rather than the KL divergence.

At the April 24th Deep Learning for Sciences, Engineering, and Arts Meetup, the following problem was discussed: “Why, for binary classification, don’t we just pick some values to represent the two possible outcomes (e.g. 0 and 1) and use regression with a linear output and an MSE loss?” I had the impression that the answers provided were not entirely clear to everybody. I am therefore writing this short note, hoping that the arguments presented below will lead to a better understanding.

The loss function used to train Variational Autoencoders (VAEs) is split into two terms. The first one measures the quality of the autoencoding, i.e. the error between the original sample and its reconstruction. The second term is the Kullback-Leibler divergence (abbreviated KL divergence) of the encoder’s output distribution with respect to a standard multivariate normal distribution. We will illustrate with a few plots the influence of the KL divergence on the encoder and decoder outputs.
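
As a rough sketch, the two terms can be written out as follows. The notation is an assumption, not taken from the post: x is a sample, q(z | x) is the encoder distribution, p(x | z) is the decoder distribution, and the encoder is assumed to output a diagonal Gaussian with mean μ and variances σ² over d latent dimensions, which is the usual choice but is not stated above.

```latex
% Assumed notation: q(z|x) encoder, p(x|z) decoder, d latent dimensions.
\mathcal{L}(x)
  = \underbrace{\mathbb{E}_{q(z \mid x)}\bigl[-\log p(x \mid z)\bigr]}_{\text{reconstruction error}}
  + \underbrace{D_{\mathrm{KL}}\bigl(q(z \mid x) \,\|\, \mathcal{N}(0, I)\bigr)}_{\text{KL divergence}}

% Closed form of the KL term, assuming q(z|x) = N(mu, diag(sigma^2)):
D_{\mathrm{KL}}\bigl(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I)\bigr)
  = \frac{1}{2} \sum_{j=1}^{d} \bigl(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\bigr)
```

Under that assumption, the KL term is zero exactly when μ = 0 and σ² = 1 in every dimension, which is why it pulls the encoder outputs towards the standard normal distribution.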

We saw in Part 4 how to build a decision tree predictor. We are now going to create a predictor from a very classic machine learning data set, the Iris data set.

We saw in Part 1 the basic structure of a decision tree. In Part 2 we created a class to handle the samples and labels of a data set. And in Part 3 we saw how to compute the leaves’ values to fit a data set. In this part, we are going to combine the previous results to build a decision tree predictor.

We saw in Part 1 the basic structure of a decision tree, and we created in Part 2 a class to handle the samples and labels of a data set. We are now going to see how to compute the prediction values of the leaves to fit a data set.

We saw in Part 1 the basic structure of a decision tree. We are now going to create a class to handle the samples and labels of a data set. This class will be used in the remaining parts of this series.

Decision trees are simple to understand, yet they are the basic element of many powerful Machine Learning algorithms such as Random Forest. This series of posts will introduce the concept of a decision tree and provide basic Scala code for those who want to understand it better and run some experiments.