Groupby in Pandas¶
Pandas groupby is used for grouping the data according to the categories and apply a function to the categories. It also helps to aggregate data efficiently.
In [2]:
#import the libraries
import pandas as pd
import numpy as np
In [3]:
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings','kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
'Rank': [1, 2, 2, 3, 3,4 ,1 ,1,2 , 4,1,2],
'Year': [2014,2015,2014,2015,2014,2015,2016,2017,2016,2014,2015,2017],
'Points':[876,789,863,673,741,812,756,788,694,701,804,690]}
print(ipl_data)
{'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'], 'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2], 'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017], 'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
In [5]:
df = pd.DataFrame(ipl_data)
df
Out[5]:
| Team | Rank | Year | Points | |
|---|---|---|---|---|
| 0 | Riders | 1 | 2014 | 876 |
| 1 | Riders | 2 | 2015 | 789 |
| 2 | Devils | 2 | 2014 | 863 |
| 3 | Devils | 3 | 2015 | 673 |
| 4 | Kings | 3 | 2014 | 741 |
| 5 | kings | 4 | 2015 | 812 |
| 6 | Kings | 1 | 2016 | 756 |
| 7 | Kings | 1 | 2017 | 788 |
| 8 | Riders | 2 | 2016 | 694 |
| 9 | Royals | 4 | 2014 | 701 |
| 10 | Royals | 1 | 2015 | 804 |
| 11 | Riders | 2 | 2017 | 690 |
Group the Data Frame by Year¶
In [6]:
grouped = df.groupby('Year')
print(grouped)
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000176E0C8AB60>
Iterate the grouped data¶
In [7]:
for year,group in grouped:
print (year)
print (group)
2014
Team Rank Year Points
0 Riders 1 2014 876
2 Devils 2 2014 863
4 Kings 3 2014 741
9 Royals 4 2014 701
2015
Team Rank Year Points
1 Riders 2 2015 789
3 Devils 3 2015 673
5 kings 4 2015 812
10 Royals 1 2015 804
2016
Team Rank Year Points
6 Kings 1 2016 756
8 Riders 2 2016 694
2017
Team Rank Year Points
7 Kings 1 2017 788
11 Riders 2 2017 690
Get Year 2014 Data¶
In [8]:
grouped.get_group(2014)
Out[8]:
| Team | Rank | Year | Points | |
|---|---|---|---|---|
| 0 | Riders | 1 | 2014 | 876 |
| 2 | Devils | 2 | 2014 | 863 |
| 4 | Kings | 3 | 2014 | 741 |
| 9 | Royals | 4 | 2014 | 701 |
Get Sum of Points Year Wise¶
In [9]:
df.groupby('Year').Points.sum()
Out[9]:
Year 2014 3181 2015 3078 2016 1450 2017 1478 Name: Points, dtype: int64
or¶
In [10]:
df.groupby('Year')['Points'].sum()
Out[10]:
Year 2014 3181 2015 3078 2016 1450 2017 1478 Name: Points, dtype: int64
or¶
In [11]:
df.groupby('Year')[['Points']].sum()
Out[11]:
| Points | |
|---|---|
| Year | |
| 2014 | 3181 |
| 2015 | 3078 |
| 2016 | 1450 |
| 2017 | 1478 |
Get Sum of Points Year and Rank Wise¶
In [12]:
df.groupby(['Year','Rank']).Points.sum()
Out[12]:
Year Rank
2014 1 876
2 863
3 741
4 701
2015 1 804
2 789
3 673
4 812
2016 1 756
2 694
2017 1 788
2 690
Name: Points, dtype: int64
Machine Learning Tutorials, Courses and Certifications