This week has been hectic, but also very fruitful. I have the functionality I set out for, it works and it is fast! It still needs testing, polish and bug-fixing and is by no means finished, but it is now possible to define indexed math-expressions, translate to code, compile it, wrap it as a python module and import it again for fast numerical calculations. All of this can be done without leaving the isympy command interpreter.
I’ve been hacking on the automatic wrapping of binary routines, refactored the code printers, implemented handling of indexed powers, enabled scalar broadcasting and ufunc-ish support for arbitrary compiled functions. For next week I will focus on physics inspired examples, testing of binary code and improving the documentation. But first, it is time for a little demonstration.
The latest code is here, if you want to try for yourself. There are parts that don’t not work properly, for instance the Cython wrapper cannot handle numpy arrays yet, but as you will see below there are enough working parts that you can do some fancy stuff.
The trace of a Matrix — faster that numpy!
In an earlier blog post, I showed you that the C printer is able to print the necessary loops to implement summations over array indices. The Fortran printer has also this functionality now, so it can be used with f2py. Let’s start with a simple function to calculate numerically the trace of a matrix while running isympy:
In : from sympy.utilities.autowrap import autowrap In : m, n = symbols('m n', integer=True) In : i = Idx('i', m) In : j = Idx('j', n) In : A = IndexedBase('A') In : trace = autowrap(A[i, i])
That’s it. Now lets have a look at the function we created,
In : trace? Type: fortran String Form: <fortran object at 0xa93bea8> Namespace: Interactive Docstring: autofunc - Function signature: autofunc = autofunc(a,[m]) Required arguments: a : input rank-2 array('d') with bounds (m,m) Optional arguments: m := shape(a,0) input int Return objects: autofunc : float
This informative docstring is created by f2py, and only slightly distorted by the blogging software. It tells us everything we need to know. Let’s try this function on a numpy array:
In : import numpy as np In : I = np.asarray(np.eye(1000), order='F') In : trace(I) Out: 1000.0 In : timeit trace(I) 10000 loops, best of 3: 30.3 us per loop In : np.trace(I) Out: 1000.0 In : timeit np.trace(I) 1000 loops, best of 3: 230 us per loop
First of all, please notice the parameter “order=’F'” passed to the numpy array. This ensures that the array is created with column-major memory order which is the Fortran format. Without that, the array is created with row-major order, and our Fortran function would run much, much slower.
Now to the fun part: Did you notice the speedup? The Sympy generated function is almost a factor of 8 faster that numpy! This is great, and I think I can be really happy about this result. But it is also time for a little warning: Please don’t take these numbers as some sort of absolute truth. For all I know I may be comparing apples and oranges.
In fact, the numbers I present are probably more relevant for a comparison of compilers and optimization flags. But, whatever the reason for the speedups, faster python callable functions is no doubt a good thing. For the record: I am running this on a little Acer Aspire One using numpy 1.3.0 shipped with Ubuntu 10.04. The autowrapped binaries are compiled with Ubuntu’s gfortran 4.4.3.
Now lets look at a more expensive array operation. Whereas the trace of an m*m matrix has a cost of order O(m), the product of two m*m matrices scales like O(m^3). First we need to express a matrix-matrix product in terms of Indexed Sympy objects and create the binary function.
In : B = IndexedBase('B') In : C = IndexedBase('C') In : o = symbols('o', integer=True) In : k = Idx('k', o) In : expr_mat_mat = Eq(C[i, j], A[i, k]*B[k, j]); expr_mat_mat Out: C[i, j] = A[i, k]⋅B[k, j]
The matrix-matrix product displayed at Out is defined in terms of an Equality instance. This is a handy way to provide the code printers with both an expression and the variable it should be assigned to. For indexed expressions it is important that the indices on left hand side and right hand side are compatible. (Compatible means usually that the set of non-dummy indices must be identical, but with the exception that the right hand side can be scalar.) For the binary function, we get:
In : matmat = autowrap(expr_mat_mat) In : matmat? Type: fortran String Form: <fortran object at 0x8f35290> Namespace: Interactive Docstring: autofunc - Function signature: c = autofunc(a,b,[m,n,o]) Required arguments: a : input rank-2 array('d') with bounds (m,o) b : input rank-2 array('d') with bounds (o,n) Optional arguments: m := shape(a,0) input int n := shape(b,1) input int o := shape(a,1) input int Return objects: c : rank-2 array('d') with bounds (m,n)
Now let’s test it with a QR factorization on some numpy arrays:
In : M = np.asarray(np.random.rand(100, 100), order='F') In : Q, R = np.linalg.qr(M) In : M_np = np.dot(Q, R) In : M_my = matmat(Q, R) In : np.linalg.norm(M_my - M_np) Out: 1.7522660836e-14
Here is the timing:
In : %timeit M_np = np.dot(Q, R) 100 loops, best of 3: 10.9 ms per loop In : %timeit M_my = matmat(Q, R) 100 loops, best of 3: 6.11 ms per loop
This looks impressive! But, I also found that numpy has a much better scaling than what I get from autowrap on this computer:
In : M = np.asarray(np.random.rand(400, 400), order='F') In : Q, R = np.linalg.qr(M) In : %timeit M_np = np.dot(Q, R) 1 loops, best of 3: 687 ms per loop In : %timeit M_my = matmat(Q, R) 1 loops, best of 3: 1.65 s per loop
I believe that this bad scaling must be an issue with gfortran, or rather an issue with how I use it. The theoretical scaling of O(m^3) correspond to a factor of 64, and while numpy follows theory quite closely, (687/10.9=63.0) the autowrapped function misses by far. (1650/6.11=270) . On another computer, I tested the same calculations with Intels ifort compiler and got much better results that were consistently better than numpy and had the expected scaling.
I have more things I want to show, but this blog post is becoming rather long, so stay tuned for more exciting new functionality in the coming days.