DataFrame.iterrows returns a generator that yields both the index and the row (as a Series):
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})
df = df.reset_index()  # make sure indexes pair with number of rows

for index, row in df.iterrows():
    print(row['c1'], row['c2'])
10 100
11 110
12 120
Obligatory disclaimer from the documentation
Iterating through pandas objects is generally slow. In many cases, iterating manually over the rows is not needed and can be avoided with one of the following approaches:
- Look for a vectorized solution: many operations can be performed using built-in methods or NumPy functions, (boolean) indexing, …
- When you have a function that cannot work on the full DataFrame/Series at once, it is better to use apply() instead of iterating over the values. See the docs on function application.
- If you need to do iterative manipulations on the values but performance is important, consider writing the inner loop with cython or numba. See the enhancing performance section for some examples of this approach.
Other answers in this thread delve into greater depth on alternatives to the iter* functions if you are interested in learning more.
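For instance, the row-wise sum of the two columns in the example above can be computed without an explicit loop. This is a minimal sketch of the first two alternatives from the disclaimer; the new column names total and total_apply are illustrative choices, not part of the original example:

# Vectorized alternative: operate on whole columns at once.
df['total'] = df['c1'] + df['c2']

# apply()-based alternative, evaluated row-wise (axis=1); useful when the
# function cannot work on a whole column at once, and it still avoids an
# explicit iterrows loop.
df['total_apply'] = df.apply(lambda row: row['c1'] + row['c2'], axis=1)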
A list of lists named xss can be flattened using a nested list comprehension:
flat_list = [
    x
    for xs in xss
    for x in xs
]
The above is equivalent to:
flat_list = []
for xs in xss:
    for x in xs:
        flat_list.append(x)
Here is the corresponding function:
def flatten(xss):
    return [x for xs in xss for x in xs]
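For example:

>>> flatten([[1, 2, 3], [4, 5, 6], [7], [8, 9]])
[1, 2, 3, 4, 5, 6, 7, 8, 9]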
This is the fastest of these approaches. As evidence, using the timeit module in the standard library, we see:
$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' '[x for xs in xss for x in xs]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'sum(xss, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'from functools import reduce; xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'reduce(lambda xs, ys: xs + ys, xss)'
1000 loops, best of 3: 1.1 msec per loop
Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists: as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of M items each: the first M items are copied back and forth L-1 times, the second M items L-2 times, and so on; the total number of copies is M times the sum of x for x from 1 to L excluded, i.e., M * L * (L-1) / 2, which grows as M * (L**2) / 2.
The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.
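To make the quadratic behavior concrete, here is a rough sketch of what sum(xss, []) effectively does, written as an explicit loop for illustration:

result = []
for xs in xss:
    # Each + builds a brand-new list and re-copies every item accumulated
    # so far, so the total work grows quadratically with the number of
    # sublists.
    result = result + xs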
Best Answer
You can use the DataFrame constructor with lists created by to_list; the same approach also works for building a new DataFrame. A solution with apply(pd.Series), by contrast, is very slow. A sketch of both variants follows.
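This is a minimal sketch of the idea, assuming a DataFrame with a column b that holds lists of equal length; the column names b, b1, and b2 are illustrative, not from the original answer:

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [[10, 100], [20, 200]]})

# Expand the list column into new columns of the existing DataFrame by
# feeding the lists to the DataFrame constructor.
df[['b1', 'b2']] = pd.DataFrame(df['b'].to_list(), index=df.index)

# Or build an entirely new DataFrame from the same lists.
new_df = pd.DataFrame(df['b'].to_list(), columns=['b1', 'b2'])

# The apply(pd.Series) alternative yields the same columns but is much
# slower, because it constructs a new Series object for every row.
slow_df = df['b'].apply(pd.Series)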