Pandas Groupby – Combine Count and Mean

dataframegroup-bypandaspythonpython-2.7

Working with pandas to try and summarise a data frame as a count of certain categories, as well as the means sentiment score for these categories.

There is a table full of strings that have different sentiment scores, and I want to group each text source by saying how many posts they have, as well as the average sentiment of these posts.

My (simplified) data frame looks like this:

source    text              sent
--------------------------------
bar       some string       0.13
foo       alt string        -0.8
bar       another str       0.7
foo       some text         -0.2
foo       more text         -0.5

The output from this should be something like this:

source    count     mean_sent
-----------------------------
foo       3         -0.5
bar       2         0.415

The answer is somewhere along the lines of:

df['sent'].groupby(df['source']).mean()

Yet only gives each source and it's mean, with no column headers.

Best Answer

You can use groupby with aggregate:

df = df.groupby('source') \
       .agg({'text':'size', 'sent':'mean'}) \
       .rename(columns={'text':'count','sent':'mean_sent'}) \
       .reset_index()
print (df)
  source  count  mean_sent
0    bar      2      0.415
1    foo      3     -0.500
Related Question