Python Pandas – How to Get Row and Column Indices of Non-NaN Items


How do I iterate over a DataFrame like the following and return the locations of the non-NaN values as tuples? For example:

df:

     0    1    2
0  NaN  NaN    1
1    1  NaN  NaN
2  NaN    2  NaN

I would get an output of [(0, 1), (2, 0), (1, 2)]. Would the best way be a nested for loop, or is there an easier way in pandas that I'm unaware of?

Best Answer

Assuming the order of the results doesn't matter, you could stack the non-null values and read the locations from the resulting index.

In [26]: list(df[df.notnull()].stack().index)
Out[26]: [(0L, '2'), (1L, '0'), (2L, '1')]

In [27]: df[df.notnull()].stack().index
Out[27]:
MultiIndex(levels=[[0, 1, 2], [u'0', u'1', u'2']],
           labels=[[0, 1, 2], [2, 0, 1]])

Furthermore, stack drops NaN values by default (dropna=True in the pandas version shown here), so the notnull mask is not needed:

In [28]: list(df.stack().index)
Out[28]: [(0L, '2'), (1L, '0'), (2L, '1')]
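
If integer positions are wanted rather than index labels, one possible alternative (not part of the original answer) is to build a boolean mask and pass it to NumPy. The sketch below assumes the question's frame has string column labels '0', '1', '2', as the output above suggests. The names mask, positions, labels and col_row are made up for illustration. The question's expected output appears to list (column, row) pairs, so the last line swaps each pair on that assumption.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame (string column labels).
df = pd.DataFrame(
    [[np.nan, np.nan, 1], [1, np.nan, np.nan], [np.nan, 2, np.nan]],
    columns=["0", "1", "2"],
)

# Boolean array that is True wherever a value is present.
mask = df.notna().to_numpy()

# np.nonzero returns the row and column positions of the True cells,
# scanned row by row.
rows, cols = np.nonzero(mask)
positions = [(int(r), int(c)) for r, c in zip(rows, cols)]

# Map positions back to index/column labels when labels are needed.
labels = [(df.index[r], df.columns[c]) for r, c in positions]

# Same cells as (column, row) pairs, assuming that is what the question meant.
col_row = [(c, r) for r, c in positions]

print(positions)
print(labels)
```

Unlike the stack approach, this works on positions, so it behaves the same whatever the index and column labels are.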

Related Solutions

Python Pandas – How to Iterate Over Rows in a DataFrame

DataFrame.iterrows is a generator which yields both the index and row (as a Series):

import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})
df = df.reset_index()  # make sure indexes pair with number of rows

for index, row in df.iterrows():
    print(row['c1'], row['c2'])

10 100
11 110
12 120
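
When a row loop is really needed, DataFrame.itertuples is a commonly used alternative that yields namedtuples instead of a Series per row. A minimal sketch using the same example frame (the variable name pairs is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

# index=False leaves the index out of each namedtuple;
# columns are then available as attributes.
pairs = [(row.c1, row.c2) for row in df.itertuples(index=False)]

for c1, c2 in pairs:
    print(c1, c2)
```

Attribute access only works for column names that are valid Python identifiers, so this sketch assumes simple names such as c1 and c2.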

Obligatory disclaimer from the documentation

Iterating through pandas objects is generally slow. In many cases, iterating manually over the rows is not needed and can be avoided with one of the following approaches:

Look for a vectorized solution: many operations can be performed using built-in methods or NumPy functions, (boolean) indexing, …

When you have a function that cannot work on the full DataFrame/Series at once, it is better to use apply() instead of iterating over the values. See the docs on function application.

If you need to do iterative manipulations on the values but performance is important, consider writing the inner loop with cython or numba. See the enhancing performance section for some examples of this approach.
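
As a rough illustration of the first two suggestions, the sketch below computes the same row totals with an iterrows loop and with a vectorized column addition, then uses apply for a function that needs a whole row. The helper describe and the variable names are hypothetical and exist only for this example.

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

# Row-by-row version: builds a Series for every row.
totals_loop = [row['c1'] + row['c2'] for _, row in df.iterrows()]

# Vectorized version: one operation over whole columns.
totals_vec = (df['c1'] + df['c2']).tolist()

# Hypothetical per-row function, applied with axis=1 instead of a manual loop.
def describe(row):
    return f"{row['c1']}-{row['c2']}"

described = df.apply(describe, axis=1).tolist()

print(totals_vec)
print(described)
```

Both totals agree; the vectorized form is usually the one to reach for first on larger frames.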

Other answers in this thread delve into greater depth on alternatives to iter* functions if you are interested to learn more.

Python – Get a List from Pandas DataFrame Column Headers

You can get the values as a list by doing:

list(my_dataframe.columns.values)

Also you can simply use (as shown in Ed Chum's answer):

list(my_dataframe)
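
A quick check that both forms return the same list, with columns.tolist() included as a third equivalent option. The small frame here is made up for illustration:

```python
import pandas as pd

# Hypothetical example frame.
my_dataframe = pd.DataFrame({'c1': [10, 11], 'c2': [100, 110]})

headers_values = list(my_dataframe.columns.values)  # via the underlying array
headers_iter = list(my_dataframe)                   # iterating a DataFrame yields column labels
headers_tolist = my_dataframe.columns.tolist()      # Index.tolist()

print(headers_values, headers_iter, headers_tolist)
```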
