Pandas – How to Apply Multiple Functions to Groupby Apply

aggregate, apply, function, pandas, pandas-groupby

I have a dataframe that should be grouped, and then several functions should be applied to each group. Normally, I would do this with groupby().agg() (cf. Apply multiple functions to multiple groupby columns), but the functions I'm interested in need multiple columns as input, not just one.

I learned that when I have one function that takes multiple columns as input, I need apply (cf. Pandas DataFrame aggregate function using multiple columns).
But what do I need when I have multiple functions that each take multiple columns as input?

import pandas as pd
df = pd.DataFrame({'x': [2, 3, -10, -10], 'y': [10, 13, 20, 30], 'id': ['a', 'a', 'b', 'b']})

def mindist(data):  # of course, these functions are more complicated in reality
    return min(data['y'] - data['x'])
def maxdist(data):
    return max(data['y'] - data['x'])

I would expect something like df.groupby('id').apply([mindist, maxdist]) to return:

    min  max
id
a     8   10
b    30   40

(achieved with pd.DataFrame({'mindist': df.groupby('id').apply(mindist), 'maxdist': df.groupby('id').apply(maxdist)}), which obviously isn't very handy if I have a dozen functions to apply to the grouped dataframe). Initially I thought this OP had the same question, but he seems to be fine with aggregate, meaning his functions take only one column as input.

Best Answer

For this specific issue, how about computing the difference first and grouping afterwards?

(df['y'] - df['x']).groupby(df['id']).agg(['min', 'max'])
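As a quick check, here is a minimal sketch that runs this vectorized approach on the question's sample data. It takes the difference as y - x so that it lines up with mindist and maxdist from the question; the names dist and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'x': [2, 3, -10, -10],
                   'y': [10, 13, 20, 30],
                   'id': ['a', 'a', 'b', 'b']})

# Compute the row-wise distance once, outside the groupby.
dist = df['y'] - df['x']

# Group the resulting Series by the id column and apply both built-in reducers.
result = dist.groupby(df['id']).agg(['min', 'max'])
print(result)
```

This only works when every function reduces the same derived column, so it does not cover functions that combine the columns in different ways.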

More generally, you could probably do something like

df.groupby('id').apply(lambda x: pd.Series({'min': mindist(x), 'max': maxdist(x)}))
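For a dozen functions, the same idea can probably be driven from a dict so the lambda does not have to list each one by hand. The sketch below assumes every function takes a group's DataFrame and returns a scalar. apply_many and funcs are hypothetical names, not part of pandas, and mindist and maxdist are rewritten here with the Series methods .min() and .max(). Selecting [['x', 'y']] before apply is one way to keep the grouping column out of the frames passed to the functions.

```python
import pandas as pd

df = pd.DataFrame({'x': [2, 3, -10, -10],
                   'y': [10, 13, 20, 30],
                   'id': ['a', 'a', 'b', 'b']})

def mindist(data):
    return (data['y'] - data['x']).min()

def maxdist(data):
    return (data['y'] - data['x']).max()

# Hypothetical helper (not a pandas API): call every function in `funcs`
# on each group and collect the scalars into one row per group.
def apply_many(grouped, funcs):
    return grouped.apply(
        lambda g: pd.Series({name: f(g) for name, f in funcs.items()})
    )

# Map output column names to the functions that produce them.
funcs = {'mindist': mindist, 'maxdist': maxdist}

# Pass only the columns the functions need to each group.
result = apply_many(df.groupby('id')[['x', 'y']], funcs)
print(result)
```

Adding another function then only means adding one entry to funcs. Because apply calls Python functions once per group, this is likely slower than agg on large data.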