Create a fast C Python extension with Cython tutorial
Tutorials, pythonCython is an easy way to create Python extensions in “pseudo-Python” without having to deal with the verboseness of the Python API. It also does not rely on you knowing much of the C language. Cython is based on Pyrex which I have some experience with. This tutorial shows you how to convert a simple Python module to a Python extension.
Prerequisites
You’ll need to be able to compile C files for this tutorial to work. You also need Cython installed and the Python development files installed. This is what I did on my “Gutsy Ubuntu” to get it going:
sudo apt-get install python-dev sudo easy_install Cython
I can attempt to help you if you can’t get things to compile, as long as you check Google for your errors first!
Python module
We’ll create a Python module to work out prime numbers up to a certain number. It’s not the most efficient method, but that’s not important in this tutorial. Lets call it prime.py.
def calculate(limit):
primes = []
divisor = 0
for current in xrange(limit):
previous = []
for divisor in xrange(2, current):
if current % divisor == 0:
break
if divisor == current - 1:
primes.append(current)
return primes
Add another program to import this module and use it. Let us say test_prime.py.
import prime numbers = prime.calculate(10000) print len(numbers)
Running “python test_prime.py” will return the number of prime numbers from 0 to 10,000, which is 1229. Lets add some timing code to see how fast this runs.
from timeit import Timer
t = Timer('prime.calculate(10000)', 'import prime')
reps = 5
print sum(t.repeat(repeat=reps, number=1)) / reps
This code above runs “prime.calculate” 5 times and averages out the time, so we get a more accurate result. On this old computer, it runs at 1.46 seconds. For the curious, this code under Psyco runs at 0.12 seconds.
Converting to a Cython extension
So far we have test_prime.py which calls the prime.py module. To convert it to Cython we need to create a setup.py file and make a copy of prime.py to c_prime.pyx. The pyx extension is specific to Pyrex and Cython. Here is our setup.py file:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
setup(
name = 'PrimeTest',
ext_modules=[
Extension('c_prime', ['c_prime.pyx'])
],
cmdclass = {'build_ext': build_ext}
)
The ext_modules list tells distutils that we’re building an extension module called c_prime from c_prime.pyx. The cmdclass argument mentions that we’re using a third-party tool (Cython) to build the extension.
Running the following will hopefully build your extension. “build_ext” tells distutils that we’re building an extension and the “–inplace” argument means it will put the built extension in the current directory.
python setup.py build_ext --inplace
This is the step that fails the most since it depends on gcc, Cython, Python development code and possibly other things. Try to find your problem on Google, otherwise ask here if you have no luck.
Now let us modify test_prime.py so that it is using our extension. Change the use of prime to c_prime on line 3:
t = Timer('c_prime.calculate(10000)', 'import c_prime')
Right now we have a C extension generated from our Python module. The time it takes to run test_prime.py now takes 0.79 seconds–almost double in speed. And that’s without any modification to our Python code!
More speed
If you know a bit of C, you can check out the generated C code in c_prime.c. It is quite ugly and has about 670 lines. Looking closer at the C code, you can see it demonstrating (within comment blocks) which line of code it has converted from with the translated code below. This is handy for optimisation to work out ways of increasing speed.
Looking at line 4 of our c_prime.pyx file, you can see this is simple a for-loop to count from 0 to a variable. In the C code it is about 20 lines as you can see from this excerpt!
/* "/home/gak/src/muckaround/cython-getting-started-tutorial/c_prime.pyx":4
* primes = []
* divisor = 0
* for current in xrange(limit): # <<<<<<<<<<<<<<
* previous = []
* for divisor in xrange(2, current):
*/
__pyx_1 = PyTuple_New(1); if (unlikely(!__pyx_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1;}
Py_INCREF(__pyx_v_limit);
PyTuple_SET_ITEM(__pyx_1, 0, __pyx_v_limit);
__pyx_3 = PyObject_Call(__pyx_builtin_xrange, ((PyObject *)__pyx_1), NULL); if (unlikely(!__pyx_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1;}
Py_DECREF(((PyObject *)__pyx_1)); __pyx_1 = 0;
if (PyList_CheckExact(__pyx_3)) { __pyx_2 = 0; __pyx_1 = __pyx_3; Py_INCREF(__pyx_1); }
else { __pyx_1 = PyObject_GetIter(__pyx_3); if (unlikely(!__pyx_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1;} }
Py_DECREF(__pyx_3); __pyx_3 = 0;
for (;;) {
if (PyList_CheckExact(__pyx_1)) { if (__pyx_2 >= PyList_GET_SIZE(__pyx_1)) break; __pyx_3 = PyList_GET_ITEM(__pyx_1, __pyx_2++); Py_INCREF(__pyx_3); }
else {
__pyx_3 = PyIter_Next(__pyx_1);
if (!__pyx_3) {
break;
}
}
Py_DECREF(__pyx_v_current);
__pyx_v_current = __pyx_3;
__pyx_3 = 0;
The reason why this is 20 lines is that it is basically doing the loop via Python. You might be able to make out that the C code is calling xrange in Python, then accessing its iterator until there are no more items in the xrange iterator. It is also converting Python objects to long integers. This is ridiculous for C code since it can be accomplished in one line.
We can tell Cython to make it only generate one line of C code by using its extended syntax. The syntax is:
for i from 0 <= i < n:
We have two of these for-loops, so lets change them both:
def calculate(limit):
primes = []
divisor = 0
for current from 0 <= current < limit:
previous = []
for divisor from 2 <= divisor < current:
if current % divisor == 0:
break
if divisor == current - 1:
primes.append(current)
return primes
This is only slightly faster and the generated code for that for-loop is 5 lines now. We still want a one line C style for-loop. The reason it is 5 lines in the is that the variable “current” is still a Python object and the C code has to convert between a Python object and a C long integer. We need to tell Cython to use C variable types instead of Python objects. You do this using the “cdef” keyword. Lets do this and while we’re here, we’ll change all the numbers to long integers:
def calculate(long limit):
cdef long current
cdef long divisor
primes = []
divisor = 0
for current from 0 <= current < limit:
previous = []
for divisor from 2 <= divisor < current:
if current % divisor == 0:
break
if divisor == current - 1:
primes.append(current)
return primes
As you can see I have changed the “limit” argument to say “long limit” which tells Cython to convert the limit argument from a Python object to a long integer when it is called. The “current” and “divisor” variables are now defined at the start of the function preceded with “cdef long”, telling Cython that they are long integers in the C code.
Now looking at that for-loop in the generated C code, it is now one line as we wanted:
for (__pyx_v_current = 0; __pyx_v_current < __pyx_v_limit; __pyx_v_current++) {
This ends up running at an incredible 0.05 seconds. That is 15x faster than our original Python module.
I hope this tutorial has been useful to you!
UPDATE: It turns out that changing all the variables in the xrange to cdefs will cause the generated C code to be one-liner for-loop. This means the alternate syntax “for i from 0 <= i < n” is not necessary when you’re only working with cdef’ed variables.








May 5th, 2008 at 12:57 am
1. Thanks for the info.
2. It would have been nice if you showed timings with psyco as well.
3. Not related to ctypes or Cython, but your code to generate primes is *quite* inefficient. You only need to make divisors go up to sqrt(current). The reason is that if x is a divisor above sqrt(current), then x*y=current. y will necessarily be below sqrt(current) and you’ve already checked for it.
May 5th, 2008 at 2:12 am
you need a ‘\n’ between build_ext and setup(
build_extsetup(
->
build_ext
setup(
May 5th, 2008 at 7:13 am
Beetle B,
1. Welcome
2. OK I just tried it. 0.12 seconds. I’ll update the post.
3. I realise that it isn’t efficient. It was only there to show off Cython.
May 5th, 2008 at 7:14 am
Alan,
Thanks. Wordpress was mangling my code.
May 5th, 2008 at 7:32 am
So psyco was actually faster than a straight cython run? Interesting.
May 5th, 2008 at 8:07 am
Beetle B,
Originally, I was expecting Python code in Cython to be slower than Python! Also keep in mind that Psyco only runs on 32bit systems.
May 5th, 2008 at 5:22 pm
Small typo:
from distutils.core
import setup
Should be on one line:
from distutils.core import setup
May 5th, 2008 at 5:40 pm
Julian,
Thanks. All fixed.
September 21st, 2008 at 9:30 am
You should switch the order of these lines to the following, otherwise Cython wouldn’t install for me on Ubuntu:
sudo apt-get install python-dev
sudo easy_install Cython
September 21st, 2008 at 9:49 am
@Greg,
Ah good point. Fixed.
November 3rd, 2008 at 8:18 am
[...] How to install Cython, then use it to optimize Python function that calculates Primes [...]
November 3rd, 2008 at 8:19 am
[...] How to install Cython, then use it to optimize Python function that calculates Primes [...]