All Posts

Programming a Decision Tree Predictor in Scala (Part 2)

In Part 1, we saw the basic structure of a decision tree. We are now going to create a class to handle the samples and labels of a data set. This class will be used in the remaining parts of this series.

Programming a Decision Tree Predictor in Scala (Part 1)

Decision trees are simple to understand, yet they are the basic element of many powerful Machine Learning algorithms such as Random Forest. This series of blog posts will introduce the concept of a decision tree and provide basic Scala code for those who want to understand it better or run some experiments.

Connecting to Jupyter Lab via ssh

At the time of writing, Jupyter Lab is only at version 0.32, but it is already very promising. It is like a small IDE running in your web browser, and it makes it convenient to edit and run files on a remote server. We tested Jupyter Lab over a high-latency connection (>300 ms): the text editor, notebooks and file manager were easy to use and responsive. Only the terminal suffered from some lag, and even that remained bearable. We will now see how to connect remotely to a server running Jupyter Lab.

Pearson correlation counterexamples

Pearson correlation, the most common type of correlation, is widely used in Data Science. However, incorrect conclusions are often drawn from a low or high correlation. Below are some counterexamples that we hope will help readers remember some limitations of the Pearson correlation.

Toward publishing Jupyter notebooks with Hugo

Update (2019-04): a simplified workflow for publishing notebooks is now described in the post Blogging with Jupyter notebooks and Hugo. It is based on nb2hugo, a tool that converts Jupyter notebooks into markdown pages with front matter.

Jupyter Notebook is a great way to create a single document that contains executable code, formatted text with detailed explanations, and figures. It is even possible to include mathematical expressions, which are beautifully rendered. Hugo is a simple yet very powerful static site generator. Being able to write an article entirely in Jupyter Notebook and convert it directly to Hugo content would be perfect, but how should we proceed?

An introduction to CUDA in Python (Part 5)

In Part 4 of this introduction, we saw that the performance of our convolution kernel is limited by memory bandwidth. We are going to see how to improve performance by using shared memory.

An introduction to CUDA in Python (Part 4)

In this part, we will learn how to profile a CUDA kernel using both nvprof and nvvp, the Visual Profiler. We will use the convolution kernel from Part 3 and rely on profiling to discover how to improve it.

An introduction to CUDA in Python (Part 3)

This is the third part of an introduction to CUDA in Python. If you missed the beginning, you are welcome to go back to Part 1 or Part 2. In this third part, we are going to write a convolution kernel to filter an image.

An introduction to CUDA in Python (Part 2)

In the first part of this introduction, we saw how to launch a CUDA kernel in Python using Numba, an open-source just-in-time compiler. In this part, we will learn more about CUDA kernels.

An introduction to CUDA in Python (Part 1)

Writing Python functions that execute directly on the GPU can help remove bottlenecks while keeping the code short and simple. In this introduction, we show one way to use CUDA in Python and explain some basic principles of CUDA programming.