Python List Comprehension vs Generator – Why List Comprehension is Faster


I'm using Python 3.3.1 64-bit on Windows and this code snippet:

len ([None for n in range (1, 1000000) if n%3 == 1])

executes in 136ms, compared to this one:

sum (1 for n in range (1, 1000000) if n%3 == 1)

which executes in 146ms. Shouldn't a generator expression be faster or the same speed as the list comprehension in this case?

I quote from Guido van Rossum's post "From List Comprehensions to Generator Expressions":

…both list comprehensions and generator expressions in Python 3 are
actually faster than they were in Python 2! (And there is no longer a
speed difference between the two.)

EDIT:

I measured the time with timeit. I know that it is not very accurate, but I only care about relative speeds here, and I consistently get shorter times for the list comprehension version when I test with different numbers of iterations.
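A minimal sketch of how such a comparison might be scripted with the standard timeit module. It assumes a smaller range than the one in the question so that it finishes quickly, and the helper names (count_with_list, count_with_genexp, best_time) are illustrative, not from the original post. Taking the minimum over several repeats is a common way to reduce noise from other processes.

```python
import timeit

N = 100_000  # assumption: smaller than the question's 1000000 so the sketch runs quickly

def count_with_list():
    # Build the whole list, then ask for its length.
    return len([None for n in range(1, N) if n % 3 == 1])

def count_with_genexp():
    # Feed a generator expression to sum(), adding 1 per match.
    return sum(1 for n in range(1, N) if n % 3 == 1)

def best_time(func, repeat=3, number=5):
    # Best (minimum) time per call, in seconds, over several repeats.
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number

list_time = best_time(count_with_list)
genexp_time = best_time(count_with_genexp)
print(f"list comprehension:   {list_time * 1e3:.2f} ms")
print(f"generator expression: {genexp_time * 1e3:.2f} ms")
```

The absolute numbers depend on the machine and Python version; only the ratio between the two is meaningful here.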

Best Answer

I believe the difference here is entirely in the cost of the additions that sum performs, one per matching element (about 333K of them here). Testing with 64-bit Python.org 3.3.0 on Mac OS X:

In [698]: %timeit len ([None for n in range (1, 1000000) if n%3 == 1])
10 loops, best of 3: 127 ms per loop
In [699]: %timeit sum (1 for n in range (1, 1000000) if n%3 == 1)
10 loops, best of 3: 138 ms per loop
In [700]: %timeit sum ([1 for n in range (1, 1000000) if n%3 == 1])
10 loops, best of 3: 139 ms per loop

So, it's not that the comprehension is faster than the genexp; they both take about the same time. But calling len on a list is effectively instant, while summing the roughly 333K values adds roughly 8% (about 11 ms here) to the total time.

Throwing a few different numbers at it, this seems to hold up unless the list is very tiny (in which case it does seem to get faster), or large enough that memory allocation starts to become a significant factor (which it isn't yet, at 333K).
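The memory side of this can be illustrated with sys.getsizeof. This is only a sketch: the exact byte counts vary by Python version and platform, so only the relative sizes matter, and the variable names are illustrative.

```python
import sys

# Same filter as in the question: one element per n with n % 3 == 1.
matches = [None for n in range(1, 1000000) if n % 3 == 1]
lazy = (1 for n in range(1, 1000000) if n % 3 == 1)

# The list stores one pointer per element; the generator only holds its own state.
list_bytes = sys.getsizeof(matches)
gen_bytes = sys.getsizeof(lazy)

print(len(matches))           # number of elements the list had to hold
print(list_bytes, gen_bytes)  # platform-dependent sizes in bytes
```

Note that sys.getsizeof reports only the container itself, not the objects it refers to, which is enough to show that the list grows with the element count while the generator does not.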

Related Solutions

Python Generator Expressions vs List Comprehensions – Key Differences

John's answer is good (that list comprehensions are better when you want to iterate over something multiple times). However, it's also worth noting that you should use a list if you want to use any of the list methods. For example, the following code won't work:

def gen():
    return (something for something in get_some_stuff())

print(gen()[:2])       # TypeError: generators don't support indexing or slicing
print([5, 6] + gen())  # TypeError: generators can't be added to lists

Basically, use a generator expression if all you're doing is iterating once. If you want to store and use the generated results, then you're probably better off with a list comprehension.
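For comparison, here is a sketch of what does work with a generator, assuming get_some_stuff is a hypothetical data source (it is not defined in the answer, so a small range stands in for it). itertools.islice takes the place of slicing, list() materializes the values before concatenation, and the last lines show why a generator suits single-pass iteration only.

```python
from itertools import islice

def get_some_stuff():
    # Hypothetical stand-in for the data source used in the answer.
    return range(10)

def gen():
    return (something for something in get_some_stuff())

first_two = list(islice(gen(), 2))  # substitute for gen()[:2]
combined = [5, 6] + list(gen())     # materialize before concatenating

g = gen()
first_pass = list(g)   # consumes every value
second_pass = list(g)  # the generator is now exhausted

print(first_two)
print(combined)
print(first_pass, second_pass)
```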

Since performance is the most common reason to choose one over the other, my advice is to not worry about it and just pick one; if you find that your program is running too slowly, then and only then should you go back and worry about tuning your code.

Python Generators vs List Comprehension Performance

First of all, the calls are to the next method (__next__ in Python 3) of the generator object, not to some even-number check.

In Python 2 you will not see an additional line for a list comprehension (LC), because an LC does not create a separate code object. In Python 3 you will, because an additional code object (<listcomp>) is now created for an LC as well, which makes it similar to a generator expression.

>>> cProfile.run('sum([number for number in range(9999999) if number % 2 == 0])')
         5 function calls in 1.751 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.601    1.601    1.601    1.601 <string>:1(<listcomp>)
        1    0.068    0.068    1.751    1.751 <string>:1(<module>)
        1    0.000    0.000    1.751    1.751 {built-in method exec}
        1    0.082    0.082    0.082    0.082 {built-in method sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

>>> cProfile.run('sum((number for number in range(9999999) if number % 2 == 0))')
         5000005 function calls in 2.388 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  5000001    1.873    0.000    1.873    0.000 <string>:1(<genexpr>)
        1    0.000    0.000    2.388    2.388 <string>:1(<module>)
        1    0.000    0.000    2.388    2.388 {built-in method exec}
        1    0.515    0.515    2.388    2.388 {built-in method sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

The number of calls differs, though: 1 for the LC compared to 5000001 for the generator expression. This is mostly because sum is consuming the iterator and hence has to call its __next__ method 5000000 + 1 times (the last call is probably the one that raises StopIteration to end the iteration). For a list comprehension all the magic happens inside its code object, where the LIST_APPEND instruction appends items one by one to the list, i.e., there are no calls visible to cProfile.
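The "one call per item plus one" pattern can be checked without a profiler. The sketch below wraps a generator expression in a hypothetical CountingIterator class (not part of the original answer) that counts how often the consumer asks for the next value. It uses a much smaller range than the profile above.

```python
class CountingIterator:
    """Hypothetical wrapper that counts how often __next__ is called."""

    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self.calls = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.calls += 1
        return next(self._iterator)  # StopIteration propagates to the consumer

limit = 1000  # assumption: far smaller than the 9999999 used in the profile
evens = CountingIterator(number for number in range(limit) if number % 2 == 0)
total = sum(evens)

print(total)        # sum of the even numbers below limit
print(evens.calls)  # one call per yielded value plus the final StopIteration
```

With 500 even numbers below 1000, the counter ends at 501, which mirrors the 5000000 + 1 calls that cProfile attributes to <genexpr>.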
