Create a fast C Python extension with Cython tutorial

Cython is an easy way to create Python extensions in “pseudo-Python” without having to deal with the verboseness of the Python API. It also does not rely on you knowing much of the C language. Cython is based on Pyrex which I have some experience with. This tutorial shows you how to convert a simple Python module to a Python extension.

Prerequisites

You’ll need to be able to compile C files for this tutorial to work. You also need Cython installed and the Python development files installed. This is what I did on my “Gutsy Ubuntu” to get it going:

sudo apt-get install python-dev
sudo easy_install Cython

I can attempt to help you if you can’t get things to compile, as long as you check Google for your errors first!

Python module

We’ll create a Python module to work out prime numbers up to a certain number. It’s not the most efficient method, but that’s not important in this tutorial. Lets call it prime.py.

def calculate(limit):
    primes = []
    divisor = 0
    for current in xrange(limit):
        previous = []
        for divisor in xrange(2, current):
            if current % divisor == 0:
                break
        if divisor == current - 1:
            primes.append(current)
    return primes

Add another program to import this module and use it. Let us say test_prime.py.

import prime

numbers = prime.calculate(10000)
print len(numbers)

Running “python test_prime.py” will return the number of prime numbers from 0 to 10,000, which is 1229. Lets add some timing code to see how fast this runs.

from timeit import Timer

t = Timer('prime.calculate(10000)', 'import prime')
reps = 5
print sum(t.repeat(repeat=reps, number=1)) / reps

This code above runs “prime.calculate” 5 times and averages out the time, so we get a more accurate result. On this old computer, it runs at 1.46 seconds. For the curious, this code under Psyco runs at 0.12 seconds.

Converting to a Cython extension

So far we have test_prime.py which calls the prime.py module. To convert it to Cython we need to create a setup.py file and make a copy of prime.py to c_prime.pyx. The pyx extension is specific to Pyrex and Cython. Here is our setup.py file:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
  name = 'PrimeTest',
  ext_modules=[
    Extension('c_prime', ['c_prime.pyx'])
    ],
  cmdclass = {'build_ext': build_ext}
)

The ext_modules list tells distutils that we’re building an extension module called c_prime from c_prime.pyx. The cmdclass argument mentions that we’re using a third-party tool (Cython) to build the extension.

Running the following will hopefully build your extension. “build_ext” tells distutils that we’re building an extension and the “–inplace” argument means it will put the built extension in the current directory.

python setup.py build_ext --inplace

This is the step that fails the most since it depends on gcc, Cython, Python development code and possibly other things. Try to find your problem on Google, otherwise ask here if you have no luck.

Now let us modify test_prime.py so that it is using our extension. Change the use of prime to c_prime on line 3:

t = Timer('c_prime.calculate(10000)', 'import c_prime')

Right now we have a C extension generated from our Python module. The time it takes to run test_prime.py now takes 0.79 seconds–almost double in speed. And that’s without any modification to our Python code!

More speed

If you know a bit of C, you can check out the generated C code in c_prime.c. It is quite ugly and has about 670 lines. Looking closer at the C code, you can see it demonstrating (within comment blocks) which line of code it has converted from with the translated code below. This is handy for optimisation to work out ways of increasing speed.

Looking at line 4 of our c_prime.pyx file, you can see this is simple a for-loop to count from 0 to a variable. In the C code it is about 20 lines as you can see from this excerpt!

/* "/home/gak/src/muckaround/cython-getting-started-tutorial/c_prime.pyx":4
 *     primes = []
 *     divisor = 0
 *     for current in xrange(limit):             # <<<<<<<<<<<<<<
 *         previous = []
 *         for divisor in xrange(2, current):
 */
  __pyx_1 = PyTuple_New(1); if (unlikely(!__pyx_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1;}
  Py_INCREF(__pyx_v_limit);
  PyTuple_SET_ITEM(__pyx_1, 0, __pyx_v_limit);
  __pyx_3 = PyObject_Call(__pyx_builtin_xrange, ((PyObject *)__pyx_1), NULL); if (unlikely(!__pyx_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1;}
  Py_DECREF(((PyObject *)__pyx_1)); __pyx_1 = 0;
  if (PyList_CheckExact(__pyx_3)) { __pyx_2 = 0; __pyx_1 = __pyx_3; Py_INCREF(__pyx_1); }
  else { __pyx_1 = PyObject_GetIter(__pyx_3); if (unlikely(!__pyx_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1;} }
  Py_DECREF(__pyx_3); __pyx_3 = 0;
  for (;;) {
    if (PyList_CheckExact(__pyx_1)) { if (__pyx_2 >= PyList_GET_SIZE(__pyx_1)) break; __pyx_3 = PyList_GET_ITEM(__pyx_1, __pyx_2++); Py_INCREF(__pyx_3); }
    else {
      __pyx_3 = PyIter_Next(__pyx_1);
      if (!__pyx_3) {
        break;
      }
    }
    Py_DECREF(__pyx_v_current);
    __pyx_v_current = __pyx_3;
    __pyx_3 = 0;

The reason why this is 20 lines is that it is basically doing the loop via Python. You might be able to make out that the C code is calling xrange in Python, then accessing its iterator until there are no more items in the xrange iterator. It is also converting Python objects to long integers. This is ridiculous for C code since it can be accomplished in one line.

We can tell Cython to make it only generate one line of C code by using its extended syntax. The syntax is:

for i from 0 <= i < n:

We have two of these for-loops, so lets change them both:

def calculate(limit):
    primes = []
    divisor = 0
    for current from 0 <= current < limit:
        previous = []
        for divisor from 2 <= divisor < current:
            if current % divisor == 0:
                break
        if divisor == current - 1:
            primes.append(current)
    return primes

This is only slightly faster and the generated code for that for-loop is 5 lines now. We still want a one line C style for-loop. The reason it is 5 lines in the is that the variable “current” is still a Python object and the C code has to convert between a Python object and a C long integer. We need to tell Cython to use C variable types instead of Python objects. You do this using the “cdef” keyword. Lets do this and while we’re here, we’ll change all the numbers to long integers:

def calculate(long limit):
    cdef long current
    cdef long divisor
    primes = []
    divisor = 0
    for current from 0 <= current < limit:
        previous = []
        for divisor from 2 <= divisor < current:
            if current % divisor == 0:
                break
        if divisor == current - 1:
            primes.append(current)
    return primes

As you can see I have changed the “limit” argument to say “long limit” which tells Cython to convert the limit argument from a Python object to a long integer when it is called. The “current” and “divisor” variables are now defined at the start of the function preceded with “cdef long”, telling Cython that they are long integers in the C code.

Now looking at that for-loop in the generated C code, it is now one line as we wanted:

for (__pyx_v_current = 0; __pyx_v_current < __pyx_v_limit; __pyx_v_current++) {

This ends up running at an incredible 0.05 seconds. That is 15x faster than our original Python module.

I hope this tutorial has been useful to you!

UPDATE: It turns out that changing all the variables in the xrange to cdefs will cause the generated C code to be one-liner for-loop. This means the alternate syntax “for i from 0 <= i < n” is not necessary when you’re only working with cdef’ed variables.

15 thoughts on “Create a fast C Python extension with Cython tutorial”

  1. 1. Thanks for the info.

    2. It would have been nice if you showed timings with psyco as well.

    3. Not related to ctypes or Cython, but your code to generate primes is *quite* inefficient. You only need to make divisors go up to sqrt(current). The reason is that if x is a divisor above sqrt(current), then x*y=current. y will necessarily be below sqrt(current) and you’ve already checked for it.

  2. You should switch the order of these lines to the following, otherwise Cython wouldn’t install for me on Ubuntu:

    sudo apt-get install python-dev
    sudo easy_install Cython

  3. Hi

    ” …. The time it takes to run test_prime.py now takes 0.79 seconds…”

    I tried:

    python test_prime.py

    and get:

    Traceback (most recent call last):
    File “test_prime.py”, line 7, in
    print sum(t.repeat(repeat=reps, number=1)) / reps
    File “/usr/lib64/python2.7/timeit.py”, line 220, in repeat
    t = self.timeit(number)
    File “/usr/lib64/python2.7/timeit.py”, line 193, in timeit
    timing = self.inner(it, self.timer)
    File “”, line 6, in inner
    NameError: global name ‘c_prime’ is not defined

    So I imagine that to run cython should be

    cython test_prime.py

    but in this case without output. (all files are in same dir)

    What I am missing?

    Best Regards and thanks for this simple tutorial.

  4. You could use the -a flag to generate a html file instead of opening the .c file. Just write cython -a c_prime.pyx. The open the c_prime.html and look at the yellow lines. These are the lines to optimize.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>