When should you use generator expressions and when should you use list comprehensions in Python?
# Generator expression
(x*2 for x in range(256))
# List comprehension
[x*2 for x in range(256)]
You are passing in a generator expression.
A list comprehension is specified with square brackets ([...]). A list comprehension builds a list object, so it uses syntax closely related to the list literal syntax:
list_literal = [1, 2, 3]
list_comprehension = [i for i in range(4) if i > 0]
A generator expression, on the other hand, creates an iterator object. The contained loop runs, and items are produced, only when you iterate over that object. The generator expression does not retain those items; no list object is built.
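A small sketch of when the work actually happens. The helper `double` and the `calls` list are illustrative names, not part of the original answer: the list comprehension calls `double` for every item immediately, while the generator expression calls it only as items are requested.

```python
calls = []  # records every argument double() is called with

def double(x):
    calls.append(x)
    return x * 2

# The list comprehension runs its whole loop right away.
eager = [double(x) for x in range(3)]
after_list = list(calls)

calls.clear()
# The generator expression only creates an iterator; double() has not run yet.
lazy = (double(x) for x in range(3))
before_iteration = list(calls)

# next() runs the loop just far enough to produce one item.
first = next(lazy)
after_one_item = list(calls)
```

Here `after_list` is `[0, 1, 2]`, `before_iteration` is empty, and `after_one_item` is `[0]`.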
A generator expression always uses (...) round parentheses, but when it is the only argument to a call, the parentheses can be omitted; the following two expressions are equivalent:
sum((i*i for i in range(5)))  # with parentheses
sum(i*i for i in range(5))  # without parentheses around the generator
Quoting from the generator expression documentation:
The parentheses can be omitted on calls with only one argument. See section Calls for the detail.
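The "only one argument" condition matters in practice. This sketch (the list `data` is an arbitrary example) shows the bare form with `sum`, the required extra parentheses when `sorted` also gets a keyword argument, and the compiler rejecting the unparenthesized version. Only the exception type is checked, since the error message differs between Python versions.

```python
data = [3, 1, 2]

# Sole argument: the generator expression needs no extra parentheses.
total = sum(x * x for x in data)

# With another argument, the generator expression must be parenthesized.
ordered = sorted((x * x for x in data), reverse=True)

# Leaving the parentheses out here is rejected at compile time.
try:
    compile("sorted(x * x for x in data, reverse=True)", "<example>", "eval")
    unparenthesized_ok = True
except SyntaxError:
    unparenthesized_ok = False
```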
This is what you should be doing:
g = (i for i in range(10))
It's a generator expression. It's equivalent to
def temp(outer):
    for i in outer:
        yield i
g = temp(range(10))
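A quick check of that equivalence, as a sketch: both forms give a one-shot iterator over the same values.

```python
def temp(outer):
    # Hand-written generator function mirroring the generator expression.
    for i in outer:
        yield i

g_expr = (i for i in range(10))
g_func = temp(range(10))

# Both produce the same values when consumed.
same_values = list(g_expr) == list(g_func)
```

After the comparison, both iterators are exhausted, which is also how they would behave if you tried to reuse them.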
but if you just wanted an iterable with the elements of range(10), you could have done
g = range(10)
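A sketch of why the plain `range(10)` can be the better choice when you only need an iterable: a `range` object can be iterated repeatedly and supports `len()` and indexing, while a generator is used up after one pass.

```python
r = range(10)
g = (i for i in range(10))

# A range object can be iterated as many times as you like...
first_pass = list(r)
second_pass = list(r)

# ...and supports len() and indexing without producing every element.
size = len(r)
fifth = r[5]

# A generator is exhausted after a single pass.
gen_first = list(g)
gen_second = list(g)
```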
You do not need to wrap any of this in a function.
If you're here to learn what code to write, you can stop reading. The rest of this post is a long and technical explanation of why the other code snippets are broken and should not be used, including an explanation of why your timings are broken too.
This:
g = [(yield i) for i in range(10)]
is a broken construct that should have been taken out years ago. Eight years after the problem was originally reported, the process to remove it is finally beginning. (It was deprecated in Python 3.7 and is a SyntaxError as of Python 3.8.) Don't do it.
On Python 3 versions that still accept it, it's equivalent to
def temp(outer):
    l = []
    for i in outer:
        l.append((yield i))
    return l
g = temp(range(10))
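One way to see how your interpreter treats this construct is to compile it instead of running it. This is a sketch: on Python 3.8 and later the compile step should raise SyntaxError, and only the exception type is checked because the message varies by version.

```python
import sys

source = "def f():\n    g = [(yield i) for i in range(10)]\n"

try:
    compile(source, "<example>", "exec")
    rejected = False
except SyntaxError:
    rejected = True

# Python 3.8+ rejects the construct; older versions compile it.
print(sys.version_info[:2], "rejected" if rejected else "accepted")
```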
List comprehensions are supposed to return lists, but because of the yield, this one doesn't. It acts kind of like a generator expression, and it yields the same things as your first snippet, but it builds an unnecessary list and attaches it to the StopIteration raised at the end.
>>> g = [(yield i) for i in range(10)]
>>> [next(g) for i in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> next(g)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration: [None, None, None, None, None, None, None, None, None, None]
This is confusing and a waste of memory. Don't do it. (If you want to know where all those Nones are coming from, read PEP 342.)
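Since newer interpreters reject the comprehension form, the same `StopIteration` behaviour can be reproduced with the equivalent generator function shown earlier. In this sketch, `next()` resumes the generator by sending `None`, so every value appended to the list is `None`.

```python
def temp(outer):
    l = []
    for i in outer:
        # next() resumes the generator with None, so (yield i) evaluates to None.
        l.append((yield i))
    # A generator's return value is attached to the StopIteration it raises.
    return l

g = temp(range(10))
produced = [next(g) for _ in range(10)]

try:
    next(g)
except StopIteration as exc:
    leftover = exc.value
```

`produced` holds 0 through 9, and `leftover` is a list of ten `None` values, matching the traceback above.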
On Python 2, g = [(yield i) for i in range(10)] does something entirely different. Python 2 doesn't give list comprehensions their own scope (specifically list comprehensions, not dict or set comprehensions), so the yield is executed by whatever function contains this line. On Python 2, this:
def f():
    g = [(yield i) for i in range(10)]
is equivalent to
def f():
    temp = []
    for i in range(10):
        temp.append((yield i))
    g = temp
making f a generator-based coroutine, in the pre-async sense. Again, if your goal was to get a generator, you've wasted a bunch of time building a pointless list.
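The hand-expanded version of f is still valid Python 3, so this sketch uses it to show what "making f a coroutine" means: calling `f()` runs none of the body, and values passed in with `send()` become the results of the `yield` expressions. The loop is shortened to `range(3)`, a `return g` is added only so the collected list is observable, and the sent strings are arbitrary.

```python
import inspect

def f():
    temp = []
    for i in range(3):  # shortened from range(10) for the example
        temp.append((yield i))
    g = temp
    return g  # added here only so the result is observable

# The yield alone turns f into a generator function.
is_gen_func = inspect.isgeneratorfunction(f)

co = f()
first = next(co)       # runs up to the first yield, producing 0
second = co.send("a")  # "a" becomes the value of the first (yield i)
third = co.send("b")
try:
    co.send("c")
except StopIteration as exc:
    collected = exc.value
```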
This:
g = [(yield from range(10))]
is silly, but none of the blame is on Python this time.
There is no comprehension or genexp here at all. The brackets are not a list comprehension; all the work is done by yield from, and then you build a 1-element list containing the (useless) return value of yield from. Your f3:
def f3():
    g = [(yield from range(10))]
when stripped of the unnecessary list-building, simplifies to
def f3():
    yield from range(10)
or, ignoring all the coroutine support stuff yield from does,
def f3():
    for i in range(10):
        yield i
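A sketch confirming that the three versions of f3 produce the same values. The brackets in the first version form a list display, not a comprehension, so it still compiles on current Python.

```python
def f3_original():
    g = [(yield from range(10))]  # g ends up as [None]

def f3_simplified():
    yield from range(10)

def f3_loop():
    for i in range(10):
        yield i

results = [list(func()) for func in (f3_original, f3_simplified, f3_loop)]
```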
Your timings are also broken.
In your first timing, f1 and f2 create generator objects that can be used inside those functions, though f2's generator is weird. f3 doesn't do that; f3 is a generator function. f3's body does not run in your timings, and if it did, its g would behave quite unlike the other functions' gs. A timing that would actually be comparable with f1 and f2 would be
def f4():
    g = f3()
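A sketch of why timing f3 directly measures almost nothing: calling a generator function only creates a generator object, and the body starts on the first `next()`. The `log` list is an illustrative addition, not part of the original f3.

```python
log = []

def f3():
    log.append("body started")  # added for illustration only
    g = [(yield from range(10))]

def f4():
    g = f3()  # creates the generator object, comparable to f1/f2

gen = f3()
ran_on_call = bool(log)  # False: calling f3() did not run its body
next(gen)
ran_on_next = bool(log)  # True: the body runs once iteration begins
```

Calling `f4()` likewise leaves `log` untouched, because it only creates the generator object.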
In your second timing, f2 doesn't actually run, for the same reason f3 was broken in the previous timing: f2 is not iterating over a generator. Instead, the yield from turns f2 into a generator function itself, so calling it only creates a generator object.
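The question's f2 is not reproduced here, so the function below is a hypothetical stand-in that delegates to a generator expression with `yield from`. As a sketch, it shows that `inspect.isgeneratorfunction` reports True for it, and that a fair timing has to consume the generator it returns.

```python
import inspect

def f2_hypothetical():
    # A stand-in, not the question's code: yield from delegates to the genexp.
    yield from (x * 2 for x in range(256))

def f2_plain():
    # An ordinary function that builds and fully consumes a generator.
    return sum(x * 2 for x in range(256))

is_gen = inspect.isgeneratorfunction(f2_hypothetical)
is_plain_gen = inspect.isgeneratorfunction(f2_plain)

# To time f2_hypothetical fairly, drain the generator it returns.
drained_total = sum(f2_hypothetical())
```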
John's answer is good (that list comprehensions are better when you want to iterate over something multiple times). However, it's also worth noting that you should use a list if you want to use any of the list methods. For example, a generator does not support indexing, slicing, len(), or methods such as append() and sort().
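A sketch of the difference: the list supports indexing, slicing, and `append()`, while indexing the generator raises `TypeError`. `itertools.islice` is one standard way to take the first few items of a generator without building a full list.

```python
from itertools import islice

squares_list = [x * x for x in range(10)]
squares_gen = (x * x for x in range(10))

# Lists support indexing, slicing, and list methods.
third = squares_list[2]
head = squares_list[:3]
squares_list.append(100)

# A generator supports none of these; indexing raises TypeError.
try:
    squares_gen[2]
except TypeError:
    gen_indexable = False
else:
    gen_indexable = True

# islice takes the first items lazily (consuming them from the generator).
gen_head = list(islice(squares_gen, 3))
```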
Basically, use a generator expression if all you're doing is iterating once. If you want to store and use the generated results, then you're probably better off with a list comprehension.
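When you only iterate once, the generator avoids holding every element at the same time. A rough sketch using `sys.getsizeof`, which counts only the container itself (not the int objects it refers to), so the real gap for the list is larger than shown:

```python
import sys

n = 1_000_000
as_list = [x * 2 for x in range(n)]
as_gen = (x * 2 for x in range(n))

# The list stores a reference per element; the generator stores only its loop state.
list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)

# Both give the same total when iterated once.
same_sum = sum(as_list) == sum(as_gen)
```

Exact byte counts depend on the Python version and platform, so none are quoted here.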
Since performance is the most common reason to choose one over the other, my advice is to not worry about it and just pick one; if you find that your program is running too slowly, then and only then should you go back and worry about tuning your code.
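If profiling does show that this spot matters, the standard-library `timeit` module is a reasonable way to compare the two forms on your own workload. The functions below are hypothetical examples; which form is faster depends on the data size and Python version, so no result is claimed here.

```python
import timeit

def with_list():
    return sum([x * 2 for x in range(10_000)])

def with_gen():
    return sum(x * 2 for x in range(10_000))

# timeit accepts a callable; absolute numbers vary by machine and Python version.
list_time = timeit.timeit(with_list, number=100)
gen_time = timeit.timeit(with_gen, number=100)
print(f"list comprehension: {list_time:.4f}s  generator expression: {gen_time:.4f}s")
```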