Python – Why List Comprehensions are Faster than Generator Expressions

pythonpython-3.x

Not sure if title is correct terminology.

If you have to compare the characters in 2 strings (A,B) and count the number of matches of chars in B against A:

sum([ch in A for ch in B])

is faster on %timeit than

sum(ch in A for ch in B)

I understand that the first one will create a list of bool, and then sum the values of 1.
The second one is a generator. I'm not clear on what it is doing internally and why it is slower?

Thanks.

Edit with %timeit results:

10 characters

generator expression

list

10000 loops, best of 3: 112 µs per loop

10000 loops, best of 3: 94.6 µs per loop

1000 characters

generator expression

list

100 loops, best of 3: 8.5 ms per loop

100 loops, best of 3: 6.9 ms per loop

10,000 characters

generator expression

list

10 loops, best of 3: 87.5 ms per loop

10 loops, best of 3: 76.1 ms per loop

100,000 characters

generator expression

list

1 loop, best of 3: 908 ms per loop

1 loop, best of 3: 840 ms per loop

Best Answer

I took a look at the disassembly of each construct (using dis). I did this by declaring these two functions:

def list_comprehension():
    return sum([ch in A for ch in B])

def generation_expression():
    return sum(ch in A for ch in B)

and then calling dis.dis with each function.

For the list comprehension:

 0 BUILD_LIST               0
 2 LOAD_FAST                0 (.0)
 4 FOR_ITER                12 (to 18)
 6 STORE_FAST               1 (ch)
 8 LOAD_FAST                1 (ch)
10 LOAD_GLOBAL              0 (A)
12 COMPARE_OP               6 (in)
14 LIST_APPEND              2
16 JUMP_ABSOLUTE            4
18 RETURN_VALUE

and for the generator expression:

 0 LOAD_FAST                0 (.0)
 2 FOR_ITER                14 (to 18)
 4 STORE_FAST               1 (ch)
 6 LOAD_FAST                1 (ch)
 8 LOAD_GLOBAL              0 (A)
10 COMPARE_OP               6 (in)
12 YIELD_VALUE
14 POP_TOP
16 JUMP_ABSOLUTE            2
18 LOAD_CONST               0 (None)
20 RETURN_VALUE

The disassembly for the actual summation is:

 0 LOAD_GLOBAL              0 (sum)
 2 LOAD_CONST               1 (<code object <genexpr> at 0x7f49dc395240, file "/home/mishac/dev/python/kintsugi/KintsugiModels/automated_tests/a.py", line 12>)
 4 LOAD_CONST               2 ('generation_expression.<locals>.<genexpr>')
 6 MAKE_FUNCTION            0
 8 LOAD_GLOBAL              1 (B)
10 GET_ITER
12 CALL_FUNCTION            1
14 CALL_FUNCTION            1
16 RETURN_VALUE

but this sum disassembly was constant between both your examples, with the only difference being the loading of generation_expression.<locals>.<genexpr> vs list_comprehension.<locals>.<listcomp> (so just loading a different local variable).

The differing bytecode instructions between the first two disassemblies are LIST_APPEND for the list comprehension vs. the conjunction of YIELD_VALUE and POP_TOP for the generator expression.

I won't pretend I know the intrinsics of Python bytecode, but what I gather from this is that the generator expression is implemented as a queue, where the value is generated and then popped. This popping doesn't have to happen in a list comprehension, leading me to believe there'll be a slight amount of overhead in using generators.

Now this doesn't mean that generators are always going to be slower. Generators excel at being memory-efficient, so there will be a threshold N such that list comprehensions will perform slightly better before this threshold (because memory use won't be a problem), but after this threshold, generators will significantly perform better.