Drew's BayPIGgies Pyrex presentation

At the 2004/11/11 BayPiggies meeting, I did some Pyrex demos.

My first demo was how to build a minimal Pyrex module that has a Python statement and a C call.

Then I talked about using Pyrex to optimize a simple Python program.

Initial Python program

#!/usr/bin/python

from glob import glob
import itertools

class Bigrams:
    """original python version"""
    def __init__(self):
        self.pairs = {}
        
    def add(self,word):
        for i in range(len(word)-1):
            pair = word[i]+word[i+1]
            self.pairs[pair] = self.pairs.get(pair,0) + 1
                
    def freqs(self):
        """get list of (pair,freq)"""
        return self.pairs.items()

files = "/usr/include/*.h"

b = Bigrams()

for line in itertools.chain(*[file(x) for x in glob(files)]):
    for word in line.split():
        b.add(word)

freq_pair = [(v,k) for k,v in b.freqs()]
freq_pair.sort()
for fp in freq_pair[-8:]:
    print "%8d %s" % fp 
On my box, I get this list of the most popular bigrams:
% cat /proc/cpuinfo | egrep "vendor|MHz"
vendor_id       : AuthenticAMD
cpu MHz         : 1400.081
vendor_id       : AuthenticAMD
cpu MHz         : 1400.081

% time ./bigrams
   12210 te
   12486 fi
   13053 nt
   13267 ef
   13780 er
   13940 ne
   17057 de
   30403 in
./bigrams  4.19s user 0.07s system 92% cpu 4.604 total

Move to Pyrex

I can't wait 4 seconds for this stuff! So, I moved the Bigrams class into Pyrex and built an extension. That version runs in about 4.7sec. Pyrex translates (most) Python to C code that uses the Python API. Run times are usually unaffected. The big speedups come when you let Pyrex write some of your code in faster C by avoiding PyObjects; calling other C libs; etc.

Try a Pyrex language extension

Then, I changed "for i in range(len(word)-1):" to "for i from 0 <= i < len(word)-1:" to demo one of Pyrex's bonus features. That loop syntax isn't related to translating normal Python to C; nor is it related to calling C functions. The for/from Pyrex loop uses a plain C integer as the loop index, avoiding a few Python API calls per iteration. This gets me a bit of a speedup. With the Pyrex int loop, the program takes 3.7sec.

Rewrite the algorithm

Now I am ready to tinker with the algorithm. I rewrite the Bigrams class in Pyrex, using a fixed-size C buffer to store the frequency counts. This removes the Python dict access, and I am also now looping over characters in a C string.
cdef class Bigrams4:
    """pack results into a fixed array like a C programmer would"""
    cdef int pairs[65536]
    def __init__(self):
        for x in range(65536):
            self.pairs[x]=0
        
    def add(self,word):
        cdef char *cword
        cdef unsigned short pairidx
        cword = word
        
        for i from 0 <= i < len(word)-1:
            pairidx = (cword[i]<<8) + cword[i+1]
            self.pairs[pairidx] = self.pairs[pairidx] + 1
                
    def freqs(self):
        """get list of (pair,freq)"""
        letter_pairs = []
        for x in range(65536):
            if self.pairs[x]>0:
                letter_pairs.append(("%s%s" % (chr(x >> 8), chr(x & 0xff)),
                                     self.pairs[x]))
        return letter_pairs
Note 'cdef class', which means this is a new C type. Pyrex writes this C struct for the new type:
struct __pyx_obj_6bigram_Bigrams4 {
  PyObject_HEAD
  int (pairs[65536]);
};
My 'self.pairs' code, formerly an attribute lookup, is now a C struct member access:

Pyrex source:
self.pairs[pairidx] = self.pairs[pairidx] + 1
C emitted from the Pyrex compiler:
/* "/home/drewp/projects/digrams/bigram.pyx":51 */
    (((struct __pyx_obj_6bigram_Bigrams4 *)__pyx_v_self)->pairs[__pyx_v_pairidx]) = ((((struct __pyx_obj_6bigram_Bigrams4 *)__pyx_v_self)->pairs[__pyx_v_pairidx]) + 1);

This runs in 1.2 sec, which I felt was good enough for my demo. Note that my rewritten algorithm could also be ported to plain Python. That version takes 3.7 sec-- the C datatypes made the big difference here.

Swig version

I also did a swig version which runs in .68 sec. It's probably faster because it doesn't use a class at all, because the final packing is faster, or because it doesn't contain the error checks that Pyrex includes. I haven't investigated the exact reason.

Source

Get bigrams.tgz and run "python setup.py build_ext --inplace" to build the Pyrex extension (you'll need Pyrex installed). Run "make" to build the swig extension (you'll need swig).

Then, try "time bigrams" after setting line 29 to the class version you want to run. Edit line 25 to adjust which files that are scanned. The file "bigrams5" contains the fixed-array version written in Python, which I didn't want to have cluttering the original bigrams program.

These source files also contain a demo of calling uuid_generate from libuuid. Remove all the uuid code if you're on a platform that doesn't have that lib, I guess.