Python 100 times faster than Grumpy
A weird benchmark
Lately Grumpy has received some attention on Reddit. YouTube Engineering published a blog post (Grumpy) which apparently showed that Grumpy scales much better than CPython on a Fibonacci benchmark. I felt like repeating this microbenchmark with Python to check their results.
The benchmark results
First, the raw runtimes. With numba, Python is clearly more than 100 times faster than Grumpy on 1, 2, 3 and 4 threads. As I only have a 2-core laptop, testing with many more threads did not make much sense. Grumpy, and with it the Go code it generates, also scales worse with the number of threads than Python with numba does.
And on a logarithmic scale.
For every thread count, Python with numba is between 110 and 180 times faster than Grumpy.
And how is this achieved? I used almost identical code for Python and Grumpy. On Python, I additionally used the numba library, which adds two lines of extra code: an import and a decorator. Since this library is not available for Grumpy, I could not use it there. One can argue about whether it is fair to leverage numba. However, if I had to do a task like calculating Fibonacci numbers, I would certainly use it, given how simple it is. Or even better, I'd most likely use the @lru_cache decorator to speed things up.
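The `@lru_cache` idea mentioned above can be sketched as follows. This is a minimal illustration, not part of the benchmark: it assumes Python 3, where `functools.lru_cache` is in the standard library, and it times a single call rather than the threaded setup used for the measurements.

```python
import time
from functools import lru_cache


# Cache every result, so each fib(n) is computed only once.
@lru_cache(maxsize=None)
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)


start = time.perf_counter()
result = fib(35)
elapsed = time.perf_counter() - start

# Prints the 35th Fibonacci number and the elapsed time in seconds.
print(result, elapsed)
```

With the cache in place, the recursion visits each value of `n` from 0 to 35 once instead of making millions of repeated calls. That changes the algorithm rather than the interpreter, so it would not be a like-for-like comparison with the timings in this post.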
I'd be happy to see a more realistic benchmark comparing Grumpy with Python, for example when it comes to serving websites or similar workloads.
Raw numbers
Threads | Py-numba (s) | Grumpy (s) | Py-threading (s) | Py-multiprocessing (s) |
---|---|---|---|---|
1 | 0.08 | 9.03 | 3.55 | 3.22 |
2 | 0.09 | 16.06 | 12.61 | 4.51 |
3 | 0.15 | 23.21 | 20.10 | 7.65 |
4 | 0.16 | 24.63 | 25.80 | 10.20 |
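The speedup factors quoted earlier can be recomputed from this table. The snippet below is a small sketch in Python 3 (an assumption; the benchmark scripts themselves use Python 2 syntax). It copies the Py-numba and Grumpy columns by hand and derives two things: how much faster numba is per thread count, and how much each runtime grows from 1 to 4 threads.

```python
# Runtimes in seconds, copied from the table above.
runtimes = {
    1: {"numba": 0.08, "grumpy": 9.03},
    2: {"numba": 0.09, "grumpy": 16.06},
    3: {"numba": 0.15, "grumpy": 23.21},
    4: {"numba": 0.16, "grumpy": 24.63},
}

# Speedup of Py-numba over Grumpy for each thread count.
speedups = {n: row["grumpy"] / row["numba"] for n, row in runtimes.items()}
for n in sorted(speedups):
    print(f"{n} thread(s): numba is {speedups[n]:.0f}x faster")

# Runtime growth from 1 to 4 threads (lower means better scaling).
growth = {
    name: runtimes[4][name] / runtimes[1][name]
    for name in ("numba", "grumpy")
}
for name, factor in growth.items():
    print(f"{name}: runtime grows {factor:.2f}x from 1 to 4 threads")
```

The per-thread speedups come out at roughly 113, 178, 155 and 154. From 1 to 4 threads the numba runtime doubles while the Grumpy runtime grows by a factor of about 2.7.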
Check the discussion in the Python subreddit.
The benchmark code
Grumpy, Py-threading
```python
import time
import threading


def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)


# Warm-up run before timing.
fib(35)

for n_threads in range(1, 5):
    threads = [threading.Thread(target=fib, args=(35,))
               for _ in range(n_threads)]
    start = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    end = time.time()
    # Python 2 print statement (Grumpy targets Python 2.7).
    print n_threads, end - start
```
Py-numba
```python
import time
import threading

from numba import jit


# nogil=True lets the compiled function run without holding the GIL.
@jit(nogil=True)
def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)


# Warm-up run; this also triggers the JIT compilation before timing.
fib(35)

for n_threads in range(1, 5):
    threads = [threading.Thread(target=fib, args=(35,))
               for _ in range(n_threads)]
    start = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    end = time.time()
    print n_threads, end - start
```
Py-multiprocessing
```python
import time
import multiprocessing


def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)


# Warm-up run before timing.
fib(35)

for n_threads in range(1, 5):
    # One process per "thread", so each worker has its own interpreter.
    threads = [multiprocessing.Process(target=fib, args=(35,))
               for _ in range(n_threads)]
    start = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    end = time.time()
    print n_threads, end - start
```