Python – Why List Comprehensions are Faster than Generator Expressions

pythonpython-3.x

Not sure if title is correct terminology.

If you have to compare the characters in 2 strings (A,B) and count the number of matches of chars in B against A:

sum([ch in A for ch in B])

is faster on %timeit than

sum(ch in A for ch in B)

I understand that the first one will create a list of bool, and then sum the values of 1.
The second one is a generator. I'm not clear on what it is doing internally and why it is slower?

Thanks.

Edit with %timeit results:

10 characters

generator expression

list

10000 loops, best of 3: 112 µs per loop

10000 loops, best of 3: 94.6 µs per loop

1000 characters

generator expression

list

100 loops, best of 3: 8.5 ms per loop

100 loops, best of 3: 6.9 ms per loop

10,000 characters

generator expression

list

10 loops, best of 3: 87.5 ms per loop

10 loops, best of 3: 76.1 ms per loop

100,000 characters

generator expression

list

1 loop, best of 3: 908 ms per loop

1 loop, best of 3: 840 ms per loop

Best Answer

I took a look at the disassembly of each construct (using dis). I did this by declaring these two functions:

def list_comprehension():
    return sum([ch in A for ch in B])

def generation_expression():
    return sum(ch in A for ch in B)

and then calling dis.dis with each function.

For the list comprehension:

 0 BUILD_LIST               0
 2 LOAD_FAST                0 (.0)
 4 FOR_ITER                12 (to 18)
 6 STORE_FAST               1 (ch)
 8 LOAD_FAST                1 (ch)
10 LOAD_GLOBAL              0 (A)
12 COMPARE_OP               6 (in)
14 LIST_APPEND              2
16 JUMP_ABSOLUTE            4
18 RETURN_VALUE

and for the generator expression:

 0 LOAD_FAST                0 (.0)
 2 FOR_ITER                14 (to 18)
 4 STORE_FAST               1 (ch)
 6 LOAD_FAST                1 (ch)
 8 LOAD_GLOBAL              0 (A)
10 COMPARE_OP               6 (in)
12 YIELD_VALUE
14 POP_TOP
16 JUMP_ABSOLUTE            2
18 LOAD_CONST               0 (None)
20 RETURN_VALUE

The disassembly for the actual summation is:

 0 LOAD_GLOBAL              0 (sum)
 2 LOAD_CONST               1 (<code object <genexpr> at 0x7f49dc395240, file "/home/mishac/dev/python/kintsugi/KintsugiModels/automated_tests/a.py", line 12>)
 4 LOAD_CONST               2 ('generation_expression.<locals>.<genexpr>')
 6 MAKE_FUNCTION            0
 8 LOAD_GLOBAL              1 (B)
10 GET_ITER
12 CALL_FUNCTION            1
14 CALL_FUNCTION            1
16 RETURN_VALUE

but this sum disassembly was constant between both your examples, with the only difference being the loading of generation_expression.<locals>.<genexpr> vs list_comprehension.<locals>.<listcomp> (so just loading a different local variable).

The differing bytecode instructions between the first two disassemblies are LIST_APPEND for the list comprehension vs. the conjunction of YIELD_VALUE and POP_TOP for the generator expression.

I won't pretend I know the intrinsics of Python bytecode, but what I gather from this is that the generator expression is implemented as a queue, where the value is generated and then popped. This popping doesn't have to happen in a list comprehension, leading me to believe there'll be a slight amount of overhead in using generators.

Now this doesn't mean that generators are always going to be slower. Generators excel at being memory-efficient, so there will be a threshold N such that list comprehensions will perform slightly better before this threshold (because memory use won't be a problem), but after this threshold, generators will significantly perform better.

Related Solutions

Python – Generator Expression vs List Performance

Well, my first step was to set the two tests up independently to ensure that this is not a result of e.g. the order in which the functions are defined.

>python -mtimeit "x=[34534534, 23423523, 77645645, 345346]" "[e for e in x]"
1000000 loops, best of 3: 0.638 usec per loop

>python -mtimeit "x=[34534534, 23423523, 77645645, 345346]" "list(e for e in x)"
1000000 loops, best of 3: 1.72 usec per loop

Sure enough, I can replicate this. OK, next step is to have a look at the bytecode to see what's actually going on:

>>> import dis
>>> x=[34534534, 23423523, 77645645, 345346]
>>> dis.dis(lambda: [e for e in x])
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x0000000001F8B330, file "<stdin>", line 1>)
              3 MAKE_FUNCTION            0
              6 LOAD_GLOBAL              0 (x)
              9 GET_ITER
             10 CALL_FUNCTION            1
             13 RETURN_VALUE
>>> dis.dis(lambda: list(e for e in x))
  1           0 LOAD_GLOBAL              0 (list)
              3 LOAD_CONST               0 (<code object <genexpr> at 0x0000000001F8B9B0, file "<stdin>", line 1>)
              6 MAKE_FUNCTION            0
              9 LOAD_GLOBAL              1 (x)
             12 GET_ITER
             13 CALL_FUNCTION            1
             16 CALL_FUNCTION            1
             19 RETURN_VALUE

Notice that the first method creates the list directly, whereas the second method creates a genexpr object and passes that to the global list. This is probably where the overhead lies.

Note also that the difference is approximately a microsecond i.e. utterly trivial.

Other interesting data

This still holds for non-trivial lists

>python -mtimeit "x=range(100000)" "[e for e in x]"
100 loops, best of 3: 8.51 msec per loop

>python -mtimeit "x=range(100000)" "list(e for e in x)"
100 loops, best of 3: 11.8 msec per loop

and for less trivial map functions:

>python -mtimeit "x=range(100000)" "[2*e for e in x]"
100 loops, best of 3: 12.8 msec per loop

>python -mtimeit "x=range(100000)" "list(2*e for e in x)"
100 loops, best of 3: 16.8 msec per loop

and (though less strongly) if we filter the list:

>python -mtimeit "x=range(100000)" "[e for e in x if e%2]"
100 loops, best of 3: 14 msec per loop

>python -mtimeit "x=range(100000)" "list(e for e in x if e%2)"
100 loops, best of 3: 16.5 msec per loop

Python List Comprehension vs Generator – Why List Comprehension is Faster

I believe the difference here is entirely in the cost of 1000000 additions. Testing with 64-bit Python.org 3.3.0 on Mac OS X:

In [698]: %timeit len ([None for n in range (1, 1000000) if n%3 == 1])
10 loops, best of 3: 127 ms per loop
In [699]: %timeit sum (1 for n in range (1, 1000000) if n%3 == 1)
10 loops, best of 3: 138 ms per loop
In [700]: %timeit sum ([1 for n in range (1, 1000000) if n%3 == 1])
10 loops, best of 3: 139 ms per loop

So, it's not that the comprehension is faster than the genexp; they both take about the same time. But calling len on a list is instant, while summing 1M numbers adds another 7% to the total time.

Throwing a few different numbers at it, this seems to hold up unless the list is very tiny (in which case it does seem to get faster), or large enough that memory allocation starts to become a significant factor (which it isn't yet, at 333K).

Best Answer

Related Solutions

Python – Generator Expression vs List Performance

Other interesting data

Python List Comprehension vs Generator – Why List Comprehension is Faster

Related Question