1. Descriptive Statistics for Pandas DataFrame¶
In [1]:
import pandas as pd
import numpy as np
In [2]:
data = {'Name':pd.Series(['Akshay','Rajat','Robin','Kapil','James','Cyril']),'Age':pd.Series([25,26,29,27,23,21]),'Rating':pd.Series([4.23,2.35,1.56,3.20,4.62,3.99])}
df = pd.DataFrame(data)
df
Out[2]:
 | Name | Age | Rating |
---|---|---|---|
0 | Akshay | 25 | 4.23 |
1 | Rajat | 26 | 2.35 |
2 | Robin | 29 | 1.56 |
3 | Kapil | 27 | 3.20 |
4 | James | 23 | 4.62 |
5 | Cyril | 21 | 3.99 |
Get the descriptive statistics for a specific column in your DataFrame:¶
- df['DataFrame Column'].describe()
Descriptive Statistics for Categorical Data¶
In [3]:
df['Name'].describe()
Out[3]:
count          6
unique         6
top       Akshay
freq           1
Name: Name, dtype: object
Descriptive Statistics for Numerical Data¶
In [4]:
df['Age'].describe()
Out[4]:
count     6.000000
mean     25.166667
std       2.857738
min      21.000000
25%      23.500000
50%      25.500000
75%      26.750000
max      29.000000
Name: Age, dtype: float64
In [5]:
df['Rating'].describe()
Out[5]:
count    6.000000
mean     3.325000
std      1.184884
min      1.560000
25%      2.562500
50%      3.595000
75%      4.170000
max      4.620000
Name: Rating, dtype: float64
In [6]:
df.describe()
Out[6]:
 | Age | Rating |
---|---|---|
count | 6.000000 | 6.000000 |
mean | 25.166667 | 3.325000 |
std | 2.857738 | 1.184884 |
min | 21.000000 | 1.560000 |
25% | 23.500000 | 2.562500 |
50% | 25.500000 | 3.595000 |
75% | 26.750000 | 4.170000 |
max | 29.000000 | 4.620000 |
Get the Descriptive Statistics for the Entire Pandas DataFrame¶
In [7]:
df.describe(include='all')
Out[7]:
 | Name | Age | Rating |
---|---|---|---|
count | 6 | 6.000000 | 6.000000 |
unique | 6 | NaN | NaN |
top | Akshay | NaN | NaN |
freq | 1 | NaN | NaN |
mean | NaN | 25.166667 | 3.325000 |
std | NaN | 2.857738 | 1.184884 |
min | NaN | 21.000000 | 1.560000 |
25% | NaN | 23.500000 | 2.562500 |
50% | NaN | 25.500000 | 3.595000 |
75% | NaN | 26.750000 | 4.170000 |
max | NaN | 29.000000 | 4.620000 |
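By default `describe()` summarises only the numeric columns, and `include='all'` adds the rest. As a rough sketch of the other ways to narrow or widen the summary, the example below uses the `include`, `exclude`, and `percentiles` parameters. The frame name `demo` is an assumption made for illustration; it simply rebuilds the example data so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the example data (the name `demo` is only for this sketch).
demo = pd.DataFrame({
    'Name': ['Akshay', 'Rajat', 'Robin', 'Kapil', 'James', 'Cyril'],
    'Age': [25, 26, 29, 27, 23, 21],
    'Rating': [4.23, 2.35, 1.56, 3.20, 4.62, 3.99],
})

# Keep only the numeric columns (the same columns the default picks here).
numeric_summary = demo.describe(include=[np.number])

# Keep only the non-numeric columns: count, unique, top, freq.
text_summary = demo.describe(exclude=[np.number])

# Ask for the 10th and 90th percentiles instead of the default quartiles.
tails = demo['Age'].describe(percentiles=[0.1, 0.9])

print(numeric_summary)
print(text_summary)
print(tails)
```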
Breaking Down the Descriptive Statistics¶
Sum of the Entire DataFrame¶
In [8]:
df.sum()
Out[8]:
Name      AkshayRajatRobinKapilJamesCyril
Age                                   151
Rating                              19.95
dtype: object
Sum of All Ages¶
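The `Name` entry in the result is the six strings concatenated, which is rarely what is wanted. One possible way to avoid it, sketched here on a rebuilt copy of the data (the name `demo` is an assumption for this example), is to pass `numeric_only=True` so that only numeric columns are summed.

```python
import pandas as pd

# Rebuild the example data (the name `demo` is only for this sketch).
demo = pd.DataFrame({
    'Name': ['Akshay', 'Rajat', 'Robin', 'Kapil', 'James', 'Cyril'],
    'Age': [25, 26, 29, 27, 23, 21],
    'Rating': [4.23, 2.35, 1.56, 3.20, 4.62, 3.99],
})

# Skip the string column instead of concatenating its values.
totals = demo.sum(numeric_only=True)
print(totals)
```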
In [9]:
print("Sum of Age:",df.Age.sum())
Sum of Age: 151
Count:¶
In [10]:
print("Count No of Age:",df.Age.count())
Count No of Age: 6
Mean:¶
In [11]:
print("Mean of Age:",df.Age.mean())
Mean of Age: 25.166666666666668
Standard deviation:¶
In [12]:
print("Std. of Age:",df.Age.std())
Std. of Age: 2.8577380332470415
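The value above is the sample standard deviation: pandas divides by n - 1 (`ddof=1`) by default, whereas NumPy's `np.std` divides by n (`ddof=0`) unless told otherwise. The sketch below compares the two and recomputes the pandas result by hand; the variable names are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

age = pd.Series([25, 26, 29, 27, 23, 21], name='Age')

sample_std = age.std()               # pandas default: ddof=1, divide by n - 1
population_std = age.std(ddof=0)     # divide by n instead
numpy_std = np.std(age.to_numpy())   # NumPy default: ddof=0

# Manual version of the sample standard deviation.
squared_deviations = (age - age.mean()) ** 2
manual_std = np.sqrt(squared_deviations.sum() / (len(age) - 1))

print("sample:", sample_std)
print("population:", population_std)
print("numpy default:", numpy_std)
print("manual:", manual_std)
```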
Minimum:¶
In [13]:
print("Minimum Age:",df.Age.min())
Minimum Age: 21
0.25 Quantile:¶
In [14]:
print("0.25 Quantile:",df['Age'].quantile(q=0.25))
0.25 Quantile: 23.5
0.50 Quantile (Median):¶
In [15]:
print("0.50 Quantile Median:",df['Age'].quantile(q=0.50))
0.50 Quantile Median: 25.5
0.75 Quantile:¶
In [16]:
print("0.75 Quantile:",df['Age'].quantile(q=0.75))
0.75 Quantile: 26.75
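The three quantiles can also be requested in a single call by passing a list to `quantile()`. The sketch below does that and then reproduces the 0.25 quantile by hand, assuming the default linear interpolation between the two nearest sorted values; the helper variable names are illustrative only.

```python
import pandas as pd

age = pd.Series([25, 26, 29, 27, 23, 21], name='Age')

# All three quartiles at once; the result is a Series indexed by q.
quartiles = age.quantile([0.25, 0.50, 0.75])
print(quartiles)

# Manual linear interpolation for q = 0.25.
ordered = sorted(age)                    # [21, 23, 25, 26, 27, 29]
position = 0.25 * (len(ordered) - 1)     # 1.25, between indices 1 and 2
lower = int(position)                    # 1
fraction = position - lower              # 0.25
manual_q25 = ordered[lower] + fraction * (ordered[lower + 1] - ordered[lower])
print("manual 0.25 quantile:", manual_q25)
```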
Maximum:¶
In [17]:
print("Maximum Age:",df.Age.max())
Maximum Age: 29
Product:¶
In [18]:
print("Product of Age:",df.Age.prod())
Product of Age: 245822850
Median:¶
In [19]:
print("Median of Age:",df.Age.median())
Median of Age: 25.5
Cumulative Sum:¶
In [20]:
print("Cumsum of Age:",df.Age.cumsum())
Cumsum of Age: 0     25
1     51
2     80
3    107
4    130
5    151
Name: Age, dtype: int64
Cumulative Product:¶
In [21]:
print("Cumprod of Age:",df.Age.cumprod())
Cumprod of Age: 0           25
1          650
2        18850
3       508950
4     11705850
5    245822850
Name: Age, dtype: int64
Mode:¶
In [32]:
print("Mode:",df.Age.mode())
Mode: 0    21
1    23
2    25
3    26
4    27
5    29
Name: Age, dtype: int64
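Every age occurs exactly once, so `mode()` returns all six values, sorted. The sketch below contrasts that with a hypothetical series in which 25 appears twice; `repeated_ages` is made-up data for illustration and is not part of the original DataFrame.

```python
import pandas as pd

unique_ages = pd.Series([25, 26, 29, 27, 23, 21])
repeated_ages = pd.Series([25, 26, 25, 27, 23, 21])  # hypothetical: 25 appears twice

# mode() always returns a Series because several values can tie.
print(unique_ages.mode().tolist())    # every value ties with a count of 1
print(repeated_ages.mode().tolist())  # only the most frequent value

# value_counts() shows the frequencies behind the result.
counts = repeated_ages.value_counts()
print(counts)
```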
Absolute Value:¶
In [33]:
print("Absolute Value:",df.Age.abs())
Absolute Value: 0    25
1    26
2    29
3    27
4    23
5    21
Name: Age, dtype: int64
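Several of the statistics computed one at a time above can be collected in a single call with `agg()`, which accepts a list of function names. This is only a sketch of one convenient alternative; the variable name `summary` is an assumption for illustration.

```python
import pandas as pd

age = pd.Series([25, 26, 29, 27, 23, 21], name='Age')

# One call, several reductions; the result is a Series indexed by name.
summary = age.agg(['sum', 'count', 'mean', 'std', 'min', 'median', 'max'])
print(summary)
```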
2. Pandas – Basic Functionality¶
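- axes: Returns a list of the axis labels: the row labels for a Series, or the row and column labels for a DataFrame.
- dtype: Returns the dtype of a Series. A DataFrame has one dtype per column, available through dtypes.
- empty: Returns True if the Series or DataFrame has no elements.
- ndim: Returns the number of dimensions of the underlying data: 1 for a Series, 2 for a DataFrame.
- size: Returns the number of elements in the underlying data (rows × columns for a DataFrame).
- values: Returns the data as a NumPy ndarray.
- head(): Returns the first n rows (5 by default).
- tail(): Returns the last n rows (5 by default).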
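Most of these attributes exist on both a Series and a DataFrame but report different things. The sketch below compares the two on a small cut-down frame; the names `small` and `age` are assumptions introduced only for this example.

```python
import pandas as pd

# A three-row frame is enough to show the difference.
small = pd.DataFrame({'Age': [25, 26, 29], 'Rating': [4.23, 2.35, 1.56]})
age = small['Age']   # a single column is a Series

print(age.ndim, small.ndim)             # dimensions: Series vs DataFrame
print(len(age.axes), len(small.axes))   # one axis vs two axes
print(age.size, small.size)             # element counts
print(age.dtype)                        # a Series has a single dtype
print(small.dtypes)                     # a DataFrame has one dtype per column

# An object with no elements reports empty as True.
nothing = pd.Series(dtype='float64')
print(nothing.empty)
```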
axes¶
In [22]:
df.axes
Out[22]:
[RangeIndex(start=0, stop=6, step=1), Index(['Name', 'Age', 'Rating'], dtype='object')]
dtype¶
In [23]:
df['Age'].dtype
Out[23]:
dtype('int64')
empty¶
In [24]:
df.empty
Out[24]:
False
ndim¶
In [25]:
df.ndim
Out[25]:
2
size¶
In [26]:
df.size
Out[26]:
18
values¶
In [27]:
df.values
Out[27]:
array([['Akshay', 25, 4.23],
       ['Rajat', 26, 2.35],
       ['Robin', 29, 1.56],
       ['Kapil', 27, 3.2],
       ['James', 23, 4.62],
       ['Cyril', 21, 3.99]], dtype=object)
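The array has `dtype=object` because the columns mix strings and numbers. The pandas documentation recommends `to_numpy()` over `values` for new code, and selecting only the numeric columns first yields a float array instead. The sketch below shows both on a rebuilt copy of the data (the name `demo` is an assumption for this example).

```python
import numpy as np
import pandas as pd

# Rebuild the example data (the name `demo` is only for this sketch).
demo = pd.DataFrame({
    'Name': ['Akshay', 'Rajat', 'Robin', 'Kapil', 'James', 'Cyril'],
    'Age': [25, 26, 29, 27, 23, 21],
    'Rating': [4.23, 2.35, 1.56, 3.20, 4.62, 3.99],
})

mixed = demo.to_numpy()                         # strings + numbers: object dtype
numeric = demo[['Age', 'Rating']].to_numpy()    # int + float columns upcast to float64

print(mixed.dtype, mixed.shape)
print(numeric.dtype, numeric.shape)
```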
head()¶
In [28]:
df.head()
Out[28]:
 | Name | Age | Rating |
---|---|---|---|
0 | Akshay | 25 | 4.23 |
1 | Rajat | 26 | 2.35 |
2 | Robin | 29 | 1.56 |
3 | Kapil | 27 | 3.20 |
4 | James | 23 | 4.62 |
In [29]:
df.head(2)
Out[29]:
 | Name | Age | Rating |
---|---|---|---|
0 | Akshay | 25 | 4.23 |
1 | Rajat | 26 | 2.35 |
tail()¶
In [30]:
df.tail()
Out[30]:
 | Name | Age | Rating |
---|---|---|---|
1 | Rajat | 26 | 2.35 |
2 | Robin | 29 | 1.56 |
3 | Kapil | 27 | 3.20 |
4 | James | 23 | 4.62 |
5 | Cyril | 21 | 3.99 |
In [31]:
df.tail(2)
Out[31]:
 | Name | Age | Rating |
---|---|---|---|
4 | James | 23 | 4.62 |
5 | Cyril | 21 | 3.99 |
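Both methods also accept a negative n, which means "everything except": `head(-2)` drops the last two rows and `tail(-2)` drops the first two. A brief sketch on a rebuilt copy of the data (the name `demo` is an assumption for this example):

```python
import pandas as pd

# Rebuild the example data (the name `demo` is only for this sketch).
demo = pd.DataFrame({
    'Name': ['Akshay', 'Rajat', 'Robin', 'Kapil', 'James', 'Cyril'],
    'Age': [25, 26, 29, 27, 23, 21],
    'Rating': [4.23, 2.35, 1.56, 3.20, 4.62, 3.99],
})

all_but_last_two = demo.head(-2)    # rows 0 to 3
all_but_first_two = demo.tail(-2)   # rows 2 to 5

print(all_but_last_two)
print(all_but_first_two)
```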