Find Indexes of Non-NaN Values in Pandas DataFrame – Python 2.7 Guide

dataframe, pandas, python-2.7

I have a very large dataset (roughly 200000×400). After filtering, only a few hundred values remain and the rest are NaN. I would like to create a list of the (row, column) indexes of those remaining values, but I can't seem to find a simple enough solution.

    0     1     2
0   NaN   NaN   1.2
1   NaN   NaN   NaN   
2   NaN   1.1   NaN   
3   NaN   NaN   NaN
4   1.4   NaN   1.01

For instance, I would like a list of [(0,2), (2,1), (4,0), (4,2)].

Best Answer

Convert the DataFrame to its equivalent NumPy array and test it for NaNs with numpy.isnan. Negate that boolean mask so that True marks the non-null entries, then pass it to numpy.argwhere to get their (row, column) positions. Since the required output is a list of tuples, apply tuple to each row of the resulting array with map and wrap the result in list (in Python 2.7 map already returns a list, so the outer list call only matters on Python 3).

>>> import numpy as np
>>> list(map(tuple, np.argwhere(~np.isnan(df.values))))
[(0, 2), (2, 1), (4, 0), (4, 2)]
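
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the small frame from the question and applies the same steps. The variable names (mask, positions, labels) are my own, and the last step, which translates positions into index/column labels, is an addition that only matters when the frame does not use the default 0..n-1 labels.

```python
import numpy as np
import pandas as pd

nan = np.nan

# The 5x3 example frame from the question.
df = pd.DataFrame([
    [nan, nan, 1.2],
    [nan, nan, nan],
    [nan, 1.1, nan],
    [nan, nan, nan],
    [1.4, nan, 1.01],
])

# True where a value is present. np.isnan needs a float array;
# for object or mixed dtypes, df.notnull().values is the safer mask.
mask = ~np.isnan(df.values)

# np.argwhere returns an (N, 2) array of [row, col] positions in row-major order.
positions = [(int(r), int(c)) for r, c in np.argwhere(mask)]

# Optional: map integer positions to the frame's own labels.
labels = [(df.index[r], df.columns[c]) for r, c in positions]

print(positions)
```

On this frame, positions and labels are the same list because the labels are the default integers.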

Related Solutions

Python – Index of Non-NaN Values in Pandas

Just filter them with the boolean mask that notnull() returns:

In [62]: df['b'].notnull()
Out[62]:
0     True
1    False
2     True
3     True
4     True
Name: b, dtype: bool

In [63]: df[df['b'].notnull()]
Out[63]:
   A   b   c
0  1  q1   1
2  3  q2   3
3  4  q1 NaN
4  5  q2   7
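
If only the index labels are needed rather than the filtered rows, the same mask can be applied to df.index. The sketch below assumes a frame shaped like the one in the output above; the values in row 1 are not shown in the original, so they are made up here, and kept_labels is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Hypothetical frame resembling the one above; row 1 is invented,
# except that its 'b' value must be missing.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "b": ["q1", np.nan, "q2", "q1", "q2"],
    "c": [1, 2, 3, np.nan, 7],
})

# Boolean Series: True where column 'b' holds a value.
mask = df["b"].notnull()

# Index labels of the rows that survive the filter.
kept_labels = df.index[mask.values].tolist()

print(kept_labels)
```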

Python Pandas – Retrieve Indices of NaN Values

It should be efficient to use a scipy coordinate-format sparse matrix to retrieve the coordinates of the null values:

import scipy.sparse as sp

x,y = sp.coo_matrix(df.isnull()).nonzero()
print(list(zip(x,y)))

[(0, 3), (1, 2), (1, 3), (3, 0), (3, 1)]

Note that I call the nonzero method so that only the coordinates of the nonzero entries in the underlying sparse matrix are returned; I don't need the values themselves, which are all True.
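
If scipy is not available, the same coordinates can probably be obtained with plain NumPy by calling np.nonzero on the boolean mask. This is a sketch of that alternative, not the approach used above. The frame is hypothetical, built only so that its NaN positions match the output shown, and I have not compared its speed against the sparse-matrix version.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Hypothetical 4x4 frame whose NaNs sit at the coordinates listed above.
df = pd.DataFrame([
    [1.0, 2.0, 3.0, nan],
    [1.0, 2.0, nan, nan],
    [1.0, 2.0, 3.0, 4.0],
    [nan, nan, 3.0, 4.0],
])

# np.nonzero on a 2-D boolean array returns (row_indices, col_indices).
rows, cols = np.nonzero(df.isnull().values)

# Convert NumPy integers to plain ints for clean printing.
nan_positions = [(int(r), int(c)) for r, c in zip(rows, cols)]

print(nan_positions)
```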
