2017-01-08

Python 100 times faster than Grumpy

A weird benchmark

Lately Grumpy has received some attention on Reddit. YouTube Engineering published a blog post (Grumpy) which apparently showed that Grumpy scales much better than CPython on a Fibonacci benchmark. I felt like repeating this microbenchmark with Python to check their results.

The benchmark results

First, the raw runtimes. Python (accelerated with numba, as explained further down) is clearly about 100 times faster than Grumpy on 1, 2, 3 and 4 threads. As I only have a 2-core laptop, testing with more threads did not make much sense. Grumpy, which compiles Python to Go, also scales worse with the number of threads than Python with numba does.

Python vs Grumpy on the Fibonacci Benchmark

And on a logarithmic scale.

Python vs Grumpy on the Fibonacci Benchmark

At every thread count, Python with numba is between 110 and 180 times faster than Grumpy.

Python speedup over Grumpy

And how is this achieved? I used almost identical code for Python and Grumpy. On Python, I additionally used the numba library, which adds two lines of extra code. Since this library is not available for Grumpy, I could not use it there. One can argue about whether it is fair to leverage numba. However, if I had to do a task like calculating Fibonacci numbers, I would certainly use it, given how simple it is. Or even better, I'd most likely use the @lru_cache decorator to speed things up.
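
As a rough sketch of the @lru_cache idea, assuming Python 3.2 or later (where functools.lru_cache exists; it would not run under Grumpy or plain Python 2.7 without a backport), the same fib function could be memoized like this:

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # unbounded cache: each fib(n) is computed only once
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(35))
```

Note that this changes the workload itself: with the cache, fib(35) needs only a few dozen calls instead of millions, so it is no longer a meaningful CPU benchmark, just the practical way to get the number.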

I'd be happy to see a more realistic benchmark comparing Grumpy with Python, for example when it comes to serving websites or similar workloads.

Raw numbers

Threads  Py-numba (s)  Grumpy (s)  Py-threading (s)  Py-multiprocessing (s)
1        0.08          9.03        3.55              3.22
2        0.09          16.06       12.61             4.51
3        0.15          23.21       20.10             7.65
4        0.16          24.63       25.80             10.20
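
The speedup and scaling claims can be rechecked from these numbers. The following is only a small sketch (written for Python 3; the variable names are made up for illustration) that divides the Grumpy runtimes by the numba runtimes and compares the 4-thread runtime with the 1-thread runtime for each variant:

```python
# Runtimes in seconds for 1, 2, 3 and 4 threads, copied from the table above.
runtimes = {
    "Py-numba": [0.08, 0.09, 0.15, 0.16],
    "Grumpy": [9.03, 16.06, 23.21, 24.63],
    "Py-threading": [3.55, 12.61, 20.10, 25.80],
    "Py-multiprocessing": [3.22, 4.51, 7.65, 10.20],
}

# Speedup of numba-compiled Python over Grumpy at each thread count.
speedups = [g / n for g, n in zip(runtimes["Grumpy"], runtimes["Py-numba"])]

# How many times longer the 4-thread run takes than the 1-thread run.
slowdown = {name: times[-1] / times[0] for name, times in runtimes.items()}

for n_threads, s in enumerate(speedups, start=1):
    print("threads=%d  numba speedup over Grumpy: %.0fx" % (n_threads, s))
for name, factor in slowdown.items():
    print("%-20s 4 threads vs 1 thread: %.2fx" % (name, factor))
```

Since every thread computes its own fib(35), a factor near 2 for four threads is roughly what perfect parallelism on a 2-core machine would give; larger factors indicate worse scaling.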

Check the discussion in the Python subreddit.

The benchmark code

Grumpy, Py-threading

import time
import threading


def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)


fib(35)  # warm-up run, not timed

for n_threads in range(1, 5):
    threads = [threading.Thread(target=fib, args=(35,)) for _ in range(n_threads)]
    start = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    end = time.time()
    print n_threads, end - start

Py-numba

import time
import threading
from numba import jit


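# JIT-compile fib; nogil=True releases the GIL while the compiled code runs
# (effective when numba compiles in nopython mode), so threads can run in parallel.
@jit(nogil=True)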
def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)


fib(35)  # warm-up run, not timed; also triggers the JIT compilation

for n_threads in range(1, 5):
    threads = [threading.Thread(target=fib, args=(35,)) for _ in range(n_threads)]
    start = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    end = time.time()
    print n_threads, end - start

Py-multiprocessing

import time
import multiprocessing


def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)


fib(35)  # warm-up run, not timed

for n_threads in range(1, 5):
    threads = [multiprocessing.Process(target=fib, args=(35,)) for _ in range(n_threads)]
    start = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    end = time.time()
    print n_threads, end - start