The power of JIT compilation

Darren Wilkinson has an interesting blog post comparing the execution times of a Gibbs sampler implemented in several different programming languages. His sampler generates draws from the following joint distribution

f(x,y) \propto x^{2}\exp(-xy^{2}-y^{2}+2y-4x)

by drawing iteratively from the full conditional distributions

x|y \sim \text{Gamma}(3,y^{2}+4)

and

y|x \sim N\left(\frac{1}{1+x},\frac{1}{2(1+x)}\right).
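
These full conditionals follow directly from the joint density. Holding y fixed,

p(x|y) \propto x^{2}\exp\left(-(y^{2}+4)x\right),

which is the kernel of the Gamma(3, y^2+4) distribution, and holding x fixed,

p(y|x) \propto \exp\left(-(1+x)y^{2}+2y\right) \propto \exp\left(-(1+x)\left(y-\frac{1}{1+x}\right)^{2}\right),

which is the kernel of a normal distribution with mean 1/(1+x) and variance 1/(2(1+x)).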

Darren presents different implementations of this sampler in R, Python, C, Java, Scala, and Groovy, and then compares their running times. You will have to take a look at Darren’s blog post for the complete results, but one thing they show is that popular interpreted languages such as R and Python are many times slower than compiled C code.

With this in mind I decided to give his sampler a try myself, using one of my favorite languages for quick prototyping: Matlab. Here’s the code and corresponding running time:

function [x_samp,y_samp] = gibbs(n,thin)

x_samp = zeros(n,1);
y_samp = zeros(n,1);

y=0;
for i=1:n
    % draw all random numbers for this block of 'thin' iterations at once
    gammarands=randg(3,thin,1);
    normrands=randn(thin,1);

    for j=1:thin
        x=gammarands(j)/(y^2+4);                % x|y ~ Gamma(3, rate y^2+4)
        y=1/(1+x)+normrands(j)/sqrt(2*x+2);     % y|x ~ N(1/(1+x), 1/(2(1+x)))
    end

    x_samp(i) = x;
    y_samp(i) = y;

end
end
tic
gibbs(5e4,1e3);
toc
Elapsed time is 10.027671 seconds.

Although Matlab is also an interpreted language, this running time of 10 seconds is remarkably close to the ~8 seconds Darren reports for compiled C code (using a PC very similar to mine). The reason the code above is so fast is that Matlab internally compiles the inner loop of the 'gibbs' function, a technique called 'Just-In-Time' (JIT) compilation. To obtain the random numbers used by the sampler, Matlab relies on its fast built-in generators (the 'randn' and 'randg' functions), which are called only once every 1000 iterations of the inner loop (1000 being the value of the 'thin' parameter). Despite the similar speed, writing and running the Matlab example above is much easier than writing, compiling, and running the corresponding C code.

The Matlab JIT compiler is powerful, but not very smart: it can only compile very simple operations like those found in the inner loop of the code above. This means that we have to be very careful with the operations we perform in the inner loop of a function. The example below shows what happens when we move the random number generation ('randn' and 'randg') into the inner loop:

function [x_samp,y_samp] = gibbs2(n,thin)

x_samp = zeros(n,1);
y_samp = zeros(n,1);

y=0;
for i=1:n

    for j=1:thin
        x=randg(3,1,1)/(y^2+4);                 % x|y ~ Gamma(3, rate y^2+4)
        y=1/(1+x)+randn/sqrt(2*x+2);            % y|x ~ N(1/(1+x), 1/(2(1+x)))
    end

    x_samp(i) = x;
    y_samp(i) = y;
end
end
tic
gibbs2(5e4,1e3);
toc
Elapsed time is 218.589886 seconds.

The new code takes more than 20 times as long! The problem is that Matlab does not know what to do with the 'randn' and 'randg' calls until it executes them, and it is therefore no longer able to compile the inner loop of the algorithm.

To summarize: compiled code can be much faster than interpreted code for number crunching, but if we are smart and use JIT compilation well, we can get remarkably close!

Note that many other languages besides Matlab also support some form of JIT compilation: Darren Wilkinson’s post has a nice example of speeding things up by JIT compiling Python code.
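
For illustration, here is a rough sketch of the same idea in Python using the numba package (this sketch is my own and assumes numba and numpy are installed; it is not the code from Darren's post):

import numpy as np
from numba import njit  # assumption: the numba package is available


@njit
def gibbs(n, thin):
    # The decorated function is compiled to machine code on its first call,
    # so the doubly nested loop below runs at close to compiled speed.
    x_samp = np.zeros(n)
    y_samp = np.zeros(n)
    x = 0.0
    y = 0.0
    for i in range(n):
        for j in range(thin):
            # x|y ~ Gamma(3, rate y^2+4); numpy's gamma takes a scale parameter
            x = np.random.gamma(3.0, 1.0 / (y * y + 4.0))
            # y|x ~ N(1/(1+x), 1/(2(1+x)))
            y = np.random.normal(1.0 / (1.0 + x), 1.0 / np.sqrt(2.0 * x + 2.0))
        x_samp[i] = x
        y_samp[i] = y
    return x_samp, y_samp


x_samp, y_samp = gibbs(50000, 1000)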
