Python – Why Generator Expression Function is Slower Than Loop Version

generatorgenerator-expressionperformancepythonpython-2.7

I have been operating under the theory that generator expressions tend to be more efficient than normal loops. But then I ran into the following example: write a function which given a number, N, and some factors, ps, returns the sum of all the numbers under N that are a multiple of at least one factor.

Here is a loop version and a shorter generator expression version:

def loops(N, ps):
    total_sum = 0 
    for i in xrange(N):
        for p in ps: 
            if i%p == 0:
                total_sum += i
                break
    return total_sum

def genexp(N, ps):
    return sum(i for i in xrange(N)
               if any(i%p == 0 for p in ps))

I'd expect the two to perform roughly equal, with maybe the comprehension version a little faster, but what I didn't expect was this:

for func in ('loops', 'genexp'):
    print func, timeit.timeit('%s(100000, [3,5,7])' % func, 
                              number=100, 
                              setup='from __main__ import %s' % func)


loops 2.82878184319
genexp 10.1663100719

4x slower isn't even close! Why? What am I misunderstanding?

Best Answer

First of all: generator expressions are memory efficient, not necessarily speed efficient.

Your compact genexp() version is slower for two reasons:

Generator expressions are implemented using a new scope (like a new function). You are producing N new scopes, one for for each any() test. Creating a new scope and tearing it down again is relatively expensive, certainly when done in a loop and then compared with code that doesn't do this.
The sum() and any() names are additional globals to be looked up. In the case of any(), that's an additional N global lookups per test. Globals must be looked up in a dictionary, versus locals which are looked up by index in a C-array (which is very fast).

The latter is but a small component, most of the cost lies in creating and destroying frames (scopes); if you create a version where _any and _sum are locals to the function you get but a small improvement in performance:

>>> def genexp_locals(N, ps, _any=any, _sum=sum):
...     return _sum(i for i in xrange(N)
...                if _any(i%p == 0 for p in ps))
... 
>>> for func in ('loops', 'genexp', 'genexp_locals'):
...     print func, timeit.timeit('%s(100000, [3,5,7])' % func, 
...                               number=100, 
...                               setup='from __main__ import %s' % func)
... 
loops 2.00835800171
genexp 6.45241594315
genexp_locals 6.23843789101

I didn't create a local for xrange to keep that aspect the same. Technically speaking, the _any name is looked up as a closure, not a local, by the generator expression code object, which are not as slow as global lookups but not quite as speedy as a local lookup either.

Related Solutions

Python – Generator Expression vs List Performance

Well, my first step was to set the two tests up independently to ensure that this is not a result of e.g. the order in which the functions are defined.

>python -mtimeit "x=[34534534, 23423523, 77645645, 345346]" "[e for e in x]"
1000000 loops, best of 3: 0.638 usec per loop

>python -mtimeit "x=[34534534, 23423523, 77645645, 345346]" "list(e for e in x)"
1000000 loops, best of 3: 1.72 usec per loop

Sure enough, I can replicate this. OK, next step is to have a look at the bytecode to see what's actually going on:

>>> import dis
>>> x=[34534534, 23423523, 77645645, 345346]
>>> dis.dis(lambda: [e for e in x])
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x0000000001F8B330, file "<stdin>", line 1>)
              3 MAKE_FUNCTION            0
              6 LOAD_GLOBAL              0 (x)
              9 GET_ITER
             10 CALL_FUNCTION            1
             13 RETURN_VALUE
>>> dis.dis(lambda: list(e for e in x))
  1           0 LOAD_GLOBAL              0 (list)
              3 LOAD_CONST               0 (<code object <genexpr> at 0x0000000001F8B9B0, file "<stdin>", line 1>)
              6 MAKE_FUNCTION            0
              9 LOAD_GLOBAL              1 (x)
             12 GET_ITER
             13 CALL_FUNCTION            1
             16 CALL_FUNCTION            1
             19 RETURN_VALUE

Notice that the first method creates the list directly, whereas the second method creates a genexpr object and passes that to the global list. This is probably where the overhead lies.

Note also that the difference is approximately a microsecond i.e. utterly trivial.

Other interesting data

This still holds for non-trivial lists

>python -mtimeit "x=range(100000)" "[e for e in x]"
100 loops, best of 3: 8.51 msec per loop

>python -mtimeit "x=range(100000)" "list(e for e in x)"
100 loops, best of 3: 11.8 msec per loop

and for less trivial map functions:

>python -mtimeit "x=range(100000)" "[2*e for e in x]"
100 loops, best of 3: 12.8 msec per loop

>python -mtimeit "x=range(100000)" "list(2*e for e in x)"
100 loops, best of 3: 16.8 msec per loop

and (though less strongly) if we filter the list:

>python -mtimeit "x=range(100000)" "[e for e in x if e%2]"
100 loops, best of 3: 14 msec per loop

>python -mtimeit "x=range(100000)" "list(e for e in x if e%2)"
100 loops, best of 3: 16.5 msec per loop

Python List Comprehension vs Generator – Why List Comprehension is Faster

I believe the difference here is entirely in the cost of 1000000 additions. Testing with 64-bit Python.org 3.3.0 on Mac OS X:

In [698]: %timeit len ([None for n in range (1, 1000000) if n%3 == 1])
10 loops, best of 3: 127 ms per loop
In [699]: %timeit sum (1 for n in range (1, 1000000) if n%3 == 1)
10 loops, best of 3: 138 ms per loop
In [700]: %timeit sum ([1 for n in range (1, 1000000) if n%3 == 1])
10 loops, best of 3: 139 ms per loop

So, it's not that the comprehension is faster than the genexp; they both take about the same time. But calling len on a list is instant, while summing 1M numbers adds another 7% to the total time.

Throwing a few different numbers at it, this seems to hold up unless the list is very tiny (in which case it does seem to get faster), or large enough that memory allocation starts to become a significant factor (which it isn't yet, at 333K).

Best Answer

Related Solutions

Python – Generator Expression vs List Performance

Other interesting data

Python List Comprehension vs Generator – Why List Comprehension is Faster

Related Question