Groupby in Pandas¶
Pandas groupby is used for grouping the data according to the categories and apply a function to the categories. It also helps to aggregate data efficiently.
In [2]:
#import the libraries
import pandas as pd
import numpy as np
In [3]:
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings','kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
'Rank': [1, 2, 2, 3, 3,4 ,1 ,1,2 , 4,1,2],
'Year': [2014,2015,2014,2015,2014,2015,2016,2017,2016,2014,2015,2017],
'Points':[876,789,863,673,741,812,756,788,694,701,804,690]}
print(ipl_data)
{'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'], 'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2], 'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017], 'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
In [5]:
df = pd.DataFrame(ipl_data)
df
Out[5]:
Team | Rank | Year | Points | |
---|---|---|---|---|
0 | Riders | 1 | 2014 | 876 |
1 | Riders | 2 | 2015 | 789 |
2 | Devils | 2 | 2014 | 863 |
3 | Devils | 3 | 2015 | 673 |
4 | Kings | 3 | 2014 | 741 |
5 | kings | 4 | 2015 | 812 |
6 | Kings | 1 | 2016 | 756 |
7 | Kings | 1 | 2017 | 788 |
8 | Riders | 2 | 2016 | 694 |
9 | Royals | 4 | 2014 | 701 |
10 | Royals | 1 | 2015 | 804 |
11 | Riders | 2 | 2017 | 690 |
Group the Data Frame by Year¶
In [6]:
grouped = df.groupby('Year')
print(grouped)
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000176E0C8AB60>
Iterate the grouped data¶
In [7]:
for year,group in grouped:
print (year)
print (group)
2014 Team Rank Year Points 0 Riders 1 2014 876 2 Devils 2 2014 863 4 Kings 3 2014 741 9 Royals 4 2014 701 2015 Team Rank Year Points 1 Riders 2 2015 789 3 Devils 3 2015 673 5 kings 4 2015 812 10 Royals 1 2015 804 2016 Team Rank Year Points 6 Kings 1 2016 756 8 Riders 2 2016 694 2017 Team Rank Year Points 7 Kings 1 2017 788 11 Riders 2 2017 690
Get Year 2014 Data¶
In [8]:
grouped.get_group(2014)
Out[8]:
Team | Rank | Year | Points | |
---|---|---|---|---|
0 | Riders | 1 | 2014 | 876 |
2 | Devils | 2 | 2014 | 863 |
4 | Kings | 3 | 2014 | 741 |
9 | Royals | 4 | 2014 | 701 |
Get Sum of Points Year Wise¶
In [9]:
df.groupby('Year').Points.sum()
Out[9]:
Year 2014 3181 2015 3078 2016 1450 2017 1478 Name: Points, dtype: int64
or¶
In [10]:
df.groupby('Year')['Points'].sum()
Out[10]:
Year 2014 3181 2015 3078 2016 1450 2017 1478 Name: Points, dtype: int64
or¶
In [11]:
df.groupby('Year')[['Points']].sum()
Out[11]:
Points | |
---|---|
Year | |
2014 | 3181 |
2015 | 3078 |
2016 | 1450 |
2017 | 1478 |
Get Sum of Points Year and Rank Wise¶
In [12]:
df.groupby(['Year','Rank']).Points.sum()
Out[12]:
Year Rank 2014 1 876 2 863 3 741 4 701 2015 1 804 2 789 3 673 4 812 2016 1 756 2 694 2017 1 788 2 690 Name: Points, dtype: int64