Situation
I have a pandas DataFrame defined as follows:
import pandas as pd
headers = ['Group', 'Element', 'Case', 'Score', 'Evaluation']
data = [
['A', 1, 'x', 1.40, 0.59],
['A', 1, 'y', 9.19, 0.52],
['A', 2, 'x', 8.82, 0.80],
['A', 2, 'y', 7.18, 0.41],
['B', 1, 'x', 1.38, 0.22],
['B', 1, 'y', 7.14, 0.10],
['B', 2, 'x', 9.12, 0.28],
['B', 2, 'y', 4.11, 0.97],
]
df = pd.DataFrame(data, columns=headers)
which looks like this in console output:
Group Element Case Score Evaluation
0 A 1 x 1.40 0.59
1 A 1 y 9.19 0.52
2 A 2 x 8.82 0.80
3 A 2 y 7.18 0.41
4 B 1 x 1.38 0.22
5 B 1 y 7.14 0.10
6 B 2 x 9.12 0.28
7 B 2 y 4.11 0.97
Problem
I'd like to perform a grouping-and-aggregation operation on df
that will give me the following result dataframe:
Group Max_score_value Max_score_element Max_score_case Min_evaluation
0 A 9.19 1 y 0.41
1 B 9.12 2 x 0.10
To clarify in more detail: I'd like to group by the Group column, and then apply aggregation to get the following result columns:

Max_score_value: the group-maximum value from the Score column.
Max_score_element: the value from the Element column that corresponds to the group-maximum Score value.
Max_score_case: the value from the Case column that corresponds to the group-maximum Score value.
Min_evaluation: the group-minimum value from the Evaluation column.
Tried thus far
I've come up with the following code for the grouping-and-aggregation:
result = (
df.set_index(['Element', 'Case'])
.groupby('Group')
.agg({'Score': ['max', 'idxmax'], 'Evaluation': 'min'})
.reset_index()
)
print(result)
which gives as output:
Group Score Evaluation
max idxmax min
0 A 9.19 (1, y) 0.41
1 B 9.12 (2, x) 0.10
As you can see, the basic data is all there, but it's not yet in the format I need. It's this last step that I'm struggling with. Does anyone have a good idea for producing a result dataframe in the format I'm looking for?
Best Answer
Starting from the result data frame, you can transform it in two steps to the format you need.

An alternative that starts from the original df and uses join may not be as efficient, but it is less verbose; it is similar to @tarashypka's idea.

Naive timing with the example data set: