Python Pandas – Convert Scientific Notation to Decimal

pandas python-2.7

This is probably an old question. I found the similar questions below, but I still see scientific notation in my output file.

Suppressing scientific notation in pandas?

Pandas read scientific notation and change

Python Pandas Scientific Notation Iconsistent

I have tried to incorporate set_option, df.apply(pd.to_numeric, args=('coerce',)), etc. into my code below, but they do not work.

import pandas as pd

df = pd.read_csv(Input)  # Input: path to the source CSV file

# First select the columns, then write them out; the goal is decimal
# rather than scientific notation in the output file.
dfNew = df[['co_A','co_B','co_C']]

dfNew.to_csv(Output, index = False, sep = '\t')  # Output: path to the result file

I can still see scientific notation in my output file. Can anyone help?

co_A  co_B  co_C
167 0.0 59.6
168 0.0 60.6
191 8e-09   72.6
197 -4.7718e-06 12.3
197 0.0 92.4
198 0.0 39.5

Best Answer

You can use the float_format parameter when calling the .to_csv() function:

In [207]: df
Out[207]:
   co_A          co_B  co_C
0   167  0.000000e+00  59.6
1   168  0.000000e+00  60.6
2   191  8.000000e-09  72.6
3   197 -4.771800e-06  12.3
4   197  0.000000e+00  92.4
5   198  0.000000e+00  39.5

In [208]: fn = r'D:\temp\.data\out.csv'

In [209]: df.to_csv(fn, index=False, sep='\t', float_format='%.6f')

out.csv:

co_A    co_B    co_C
167 0.000000    59.600000
168 0.000000    60.600000
191 0.000000    72.600000
197 -0.000005   12.300000
197 0.000000    92.400000
198 0.000000    39.500000
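
Note that with '%.6f' the value 8e-09 is written as 0.000000, so very small numbers are rounded away. If those digits matter, a wider format may help. The following is a minimal sketch, assuming a small hand-built DataFrame with the same column names and an in-memory buffer in place of a real output file; the choice of 10 decimal places is only an illustration:

```python
import io

import pandas as pd

# Hypothetical sample data mirroring two rows of the question's output.
df = pd.DataFrame({
    "co_A": [191, 197],
    "co_B": [8e-09, -4.7718e-06],
    "co_C": [72.6, 12.3],
})

# Write to an in-memory buffer instead of a file, purely for demonstration.
buf = io.StringIO()
df.to_csv(buf, index=False, sep="\t", float_format="%.10f")
text = buf.getvalue()

print(text)
```

The format string applies to every float column, so co_C is also written with 10 decimal places; integer columns such as co_A are unaffected.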

Related Solutions

Python Pandas – How to Iterate Over Rows in a DataFrame

DataFrame.iterrows is a generator which yields both the index and row (as a Series):

import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})
df = df.reset_index()  # reset to a default 0..n-1 index so index values match row positions

for index, row in df.iterrows():
    print(row['c1'], row['c2'])

10 100
11 110
12 120

Obligatory disclaimer from the documentation

Iterating through pandas objects is generally slow. In many cases, iterating manually over the rows is not needed and can be avoided with one of the following approaches:

Look for a vectorized solution: many operations can be performed using built-in methods or NumPy functions, (boolean) indexing, …

When you have a function that cannot work on the full DataFrame/Series at once, it is better to use apply() instead of iterating over the values. See the docs on function application.

If you need to do iterative manipulations on the values but performance is important, consider writing the inner loop with cython or numba. See the enhancing performance section for some examples of this approach.

Other answers in this thread delve into greater depth on alternatives to iter* functions if you are interested to learn more.
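
As a rough illustration of the first two alternatives listed above, the sketch below computes the same per-row sum three ways on the example DataFrame. The sum itself is a made-up task chosen only to keep the comparison small:

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

# Row-by-row iteration: works, but is generally the slowest option.
looped = [row['c1'] + row['c2'] for _, row in df.iterrows()]

# Vectorized: operates on whole columns at once.
vectorized = df['c1'] + df['c2']

# apply() along axis=1: useful when the logic cannot be expressed
# on whole columns, though it still calls the function once per row.
applied = df.apply(lambda row: row['c1'] + row['c2'], axis=1)

print(vectorized.tolist())
```

All three produce the same values here; the vectorized form is usually the one to try first.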

Python Pandas – Renaming Column Names

Rename Specific Columns

Use the df.rename() function and specify the columns to be renamed. Not all columns have to be renamed:

df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})

# Or modify the existing DataFrame in place (rather than creating a copy)
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)

Minimal Code Example

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))
df

   a  b  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

The following methods all work and produce the same output:

df2 = df.rename({'a': 'X', 'b': 'Y'}, axis=1)
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis='columns')
df2 = df.rename(columns={'a': 'X', 'b': 'Y'}) 

df2

   X  Y  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

Remember to assign the result back, as the modification is not done in place. Alternatively, specify inplace=True:

df.rename({'a': 'X', 'b': 'Y'}, axis=1, inplace=True)
df

   X  Y  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

You can specify errors='raise' to raise an error if a column named in the mapping does not exist in the DataFrame.
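
A short sketch of the difference, assuming a pandas version in which rename() accepts the errors parameter; the column name 'zz' is a deliberately nonexistent, hypothetical label:

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))

# By default, a label that does not exist is silently ignored.
unchanged = df.rename(columns={'zz': 'Q'})

# With errors='raise', the same call raises a KeyError instead.
try:
    df.rename(columns={'zz': 'Q'}, errors='raise')
    raised = False
except KeyError:
    raised = True

print(list(unchanged.columns), raised)
```

This can help catch typos in column names that would otherwise go unnoticed.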

Reassign Column Headers

Use df.set_axis() with axis=1.

df2 = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1)
df2

   V  W  X  Y  Z
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

Headers can be assigned directly:

df.columns = ['V', 'W', 'X', 'Y', 'Z']
df

   V  W  X  Y  Z
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x
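
Both approaches replace every header at once, so the new list has to contain exactly one name per column. The sketch below assumes a recent pandas release in which set_axis() returns a new DataFrame rather than modifying the original; the two-element list is an intentionally wrong, hypothetical input:

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))
new_names = ['V', 'W', 'X', 'Y', 'Z']

# set_axis returns a relabeled DataFrame and leaves df untouched.
df2 = df.set_axis(new_names, axis=1)

# A list of the wrong length is rejected with a ValueError.
try:
    df.set_axis(['V', 'W'], axis=1)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(list(df2.columns), mismatch_raised)
```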