Speeding up NumPy with parallelism
If your NumPy code is too slow, what next?
One option is taking advantage of the multiple cores on your CPU: using
a thread pool to do work in parallel. Another option is to tune your
code so it’s less wasteful. Or, since these are two different sources of
speed, you can do both.
In this article I’ll cover:
- A simple example of making a NumPy algorithm parallel.
- A separate kind of optimization, making a more efficient
implementation in Numba. - How to get even more speed by using both at once.
- Aside: A hardware limit on parallelism.
- Aside: Why not Numba’s built-in parallelism?
From single-threaded to parallelism: an example
Let’s say you want to calculate the sum of the squared difference
between two arrays. Here’s how you’d