Descriptive Statistics for Pandas DataFrame – Data Science Tutorials

Machine Learning July 31, 2022 Data Science, Pandas

05 – Descriptive Statistics and Stats Operations

1. Descriptive Statistics for Pandas DataFrame

In [1]:

import pandas as pd
import numpy as np

In [2]:

data = {
    'Name': pd.Series(['Akshay', 'Rajat', 'Robin', 'Kapil', 'James', 'Cyril']),
    'Age': pd.Series([25, 26, 29, 27, 23, 21]),
    'Rating': pd.Series([4.23, 2.35, 1.56, 3.20, 4.62, 3.99]),
}
df = pd.DataFrame(data)
df

Out[2]:

	Name	Age	Rating
0	Akshay	25	4.23
1	Rajat	26	2.35
2	Robin	29	1.56
3	Kapil	27	3.20
4	James	23	4.62
5	Cyril	21	3.99

Get the descriptive statistics for a specific column in your DataFrame:

df['DataFrame Column'].describe()

Descriptive Statistics for Categorical Data

In [3]:

df['Name'].describe()

Out[3]:

count          6
unique         6
top       Akshay
freq           1
Name: Name, dtype: object
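
Because every name in the sample appears exactly once, `top` and `freq` above are not very informative. The sketch below uses a made-up Series with repeated names (hypothetical data, not from the tutorial's DataFrame) to show how `describe()` and `value_counts()` behave when a categorical column contains duplicates.

```python
import pandas as pd

# Hypothetical data with repeated names, only for illustration.
names = pd.Series(
    ["Akshay", "Rajat", "Akshay", "Kapil", "Akshay", "Rajat"], name="Name"
)

# describe() reports the number of values, the number of distinct values,
# the most frequent value (top) and how often it occurs (freq).
summary = names.describe()

# value_counts() gives the full frequency table, most frequent first.
counts = names.value_counts()

print(summary)
print(counts)
```

Here `top` should be the name that occurs most often and `freq` its count, while `value_counts()` lists the frequency of every distinct name.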

Descriptive Statistics for Numerical Data

In [4]:

df['Age'].describe()

Out[4]:

count     6.000000
mean     25.166667
std       2.857738
min      21.000000
25%      23.500000
50%      25.500000
75%      26.750000
max      29.000000
Name: Age, dtype: float64

In [5]:

df['Rating'].describe()

Out[5]:

count    6.000000
mean     3.325000
std      1.184884
min      1.560000
25%      2.562500
50%      3.595000
75%      4.170000
max      4.620000
Name: Rating, dtype: float64

In [6]:

df.describe()

Out[6]:

	Age	Rating
count	6.000000	6.000000
mean	25.166667	3.325000
std	2.857738	1.184884
min	21.000000	1.560000
25%	23.500000	2.562500
50%	25.500000	3.595000
75%	26.750000	4.170000
max	29.000000	4.620000

Get the Descriptive Statistics for the Entire Pandas DataFrame

In [7]:

df.describe(include='all')

Out[7]:

	Name	Age	Rating
count	6	6.000000	6.000000
unique	6	NaN	NaN
top	Akshay	NaN	NaN
freq	1	NaN	NaN
mean	NaN	25.166667	3.325000
std	NaN	2.857738	1.184884
min	NaN	21.000000	1.560000
25%	NaN	23.500000	2.562500
50%	NaN	25.500000	3.595000
75%	NaN	26.750000	4.170000
max	NaN	29.000000	4.620000
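
Besides `include='all'`, `describe()` accepts `percentiles` and `exclude` arguments. The following is a minimal sketch, rebuilding the same sample DataFrame, that requests the 10th and 90th percentiles and then summarizes only the non-numeric columns. The variable names `custom` and `non_numeric` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Akshay", "Rajat", "Robin", "Kapil", "James", "Cyril"],
    "Age": [25, 26, 29, 27, 23, 21],
    "Rating": [4.23, 2.35, 1.56, 3.20, 4.62, 3.99],
})

# Ask for the 10th and 90th percentiles instead of the default quartiles.
custom = df.describe(percentiles=[0.1, 0.9])

# Drop the numeric columns so that only Name is summarized.
non_numeric = df.describe(exclude=["number"])

print(custom)
print(non_numeric)
```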

Breaking Down the Descriptive Statistics

Sum of the Entire DataFrame

In [8]:

df.sum()

Out[8]:

Name      AkshayRajatRobinKapilJamesCyril
Age                                   151
Rating                              19.95
dtype: object
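
Note that summing the Name column concatenates the strings, which is rarely what you want. A minimal sketch of one way to avoid this, assuming you only need totals for the numeric columns, is to pass `numeric_only=True`:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Akshay", "Rajat", "Robin", "Kapil", "James", "Cyril"],
    "Age": [25, 26, 29, 27, 23, 21],
    "Rating": [4.23, 2.35, 1.56, 3.20, 4.62, 3.99],
})

# Restrict the sum to numeric columns; Name is left out of the result.
totals = df.sum(numeric_only=True)
print(totals)
```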

Sum of the Age Column

In [9]:

print("Sum of Age:",df.Age.sum())

Sum of Age: 151

Count:

In [10]:

print("Count of Age:",df.Age.count())

Count of Age: 6

Mean:

In [11]:

print("Mean of Age:",df.Age.mean())

Mean of Age: 25.166666666666668

Standard deviation:

In [12]:

print("Std. of Age:",df.Age.std())

Std. of Age: 2.8577380332470415
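
The value above is the sample standard deviation: pandas divides by n - 1 by default (`ddof=1`), whereas NumPy's `np.std` divides by n (`ddof=0`). The sketch below, using the same Age values, shows the two conventions side by side.

```python
import numpy as np
import pandas as pd

age = pd.Series([25, 26, 29, 27, 23, 21], name="Age")

# pandas default: sample standard deviation (divides by n - 1).
sample_std = age.std()

# Population standard deviation (divides by n).
population_std = age.std(ddof=0)

# NumPy's default on a plain array is also the population version.
numpy_std = np.std(age.to_numpy())

print("Sample std:    ", sample_std)
print("Population std:", population_std)
print("NumPy std:     ", numpy_std)
```

If results from pandas and NumPy disagree slightly, the `ddof` setting is usually the first thing to check.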

Minimum:

In [13]:

print("Minimum Age:",df.Age.min())

Minimum Age: 21

0.25 Quantile:

In [14]:

print("0.25 Quantile:",df['Age'].quantile(q=0.25))

0.25 Quantile: 23.5

0.50 Quantile (Median):

In [15]:

print("0.50 Quantile Median:",df['Age'].quantile(q=0.50))

0.50 Quantile Median: 25.5

0.75 Quantile:

In [16]:

print("0.75 Quantile:",df['Age'].quantile(q=0.75))

0.75 Quantile: 26.75
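
The quartiles 23.5 and 26.75 are not values in the data because `quantile()` interpolates linearly between neighbouring sorted values by default. The sketch below reproduces the 0.25 quantile by hand and also shows that several quantiles can be requested in one call. The helper variables (`ordered`, `position`, `manual_q1`) are illustrative only.

```python
import pandas as pd

age = pd.Series([25, 26, 29, 27, 23, 21], name="Age")

# Several quantiles in one call; the result is indexed by q.
quartiles = age.quantile([0.25, 0.5, 0.75])

# Manual linear interpolation for q = 0.25.
ordered = sorted(age)                   # [21, 23, 25, 26, 27, 29]
position = (len(ordered) - 1) * 0.25    # 1.25, between index 1 and 2
lower = int(position)
fraction = position - lower
manual_q1 = ordered[lower] + fraction * (ordered[lower + 1] - ordered[lower])

# interpolation="lower" returns an actual data point instead.
q1_lower = age.quantile(0.25, interpolation="lower")

print(quartiles)
print("Manual 0.25 quantile:", manual_q1)
print("Lower 0.25 quantile:", q1_lower)
```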

Maximum:

In [17]:

print("Maximum Age:",df.Age.max())

Maximum Age: 29

Product:

In [18]:

print("Product of Age:",df.Age.prod())

Product of Age: 245822850

Median:

In [19]:

print("Median of Age:",df.Age.median())

Median of Age: 25.5
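
The individual reductions shown so far can also be computed in a single call. As a minimal sketch, `Series.agg` accepts a list of function names and returns one labelled result per name:

```python
import pandas as pd

age = pd.Series([25, 26, 29, 27, 23, 21], name="Age")

# Compute several summary statistics at once; the result is a Series
# indexed by the function names.
stats = age.agg(["sum", "count", "mean", "median", "min", "max"])
print(stats)
```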

Cumulative Sum:

In [20]:

print("Cumsum of Age:",df.Age.cumsum())

Cumsum of Age: 0     25
1     51
2     80
3    107
4    130
5    151
Name: Age, dtype: int64

Cumulative Product:

In [21]:

print("Cumprod of Age:",df.Age.cumprod())

Cumprod of Age: 0           25
1          650
2        18850
3       508950
4     11705850
5    245822850
Name: Age, dtype: int64
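
Unlike `sum()` and `prod()`, the cumulative methods return a Series of the same length as the input, where each entry is the running result up to that row. A short sketch of how the two families relate, using the same Age values:

```python
import pandas as pd

age = pd.Series([25, 26, 29, 27, 23, 21], name="Age")

running_total = age.cumsum()
running_product = age.cumprod()

# The last running value equals the corresponding full reduction.
print(running_total.iloc[-1], age.sum())
print(running_product.iloc[-1], age.prod())
```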

Mode:

In [32]:

print("Mode:",df.Age.mode())

Mode: 0    21
1    23
2    25
3    26
4    27
5    29
Name: Age, dtype: int64
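
`mode()` returns every value here because each age occurs exactly once, so all six are tied for most frequent. The sketch below uses made-up values with repeats (hypothetical data, not the tutorial's DataFrame) to show the more typical behaviour, including a tie between two values.

```python
import pandas as pd

# Hypothetical ages: 23 and 25 both appear twice.
ages = pd.Series([21, 23, 23, 25, 25, 29])

# mode() always returns a Series, sorted, with one entry per tied value.
modes = ages.mode()

# With a single most frequent value, take the first element to get a scalar.
single_mode = pd.Series([21, 23, 23, 29]).mode().iloc[0]

print(modes)
print("Single mode:", single_mode)
```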

Absolute Value:

In [33]:

print("Absolute Value:",df.Age.abs())

Absolute Value: 0    25
1    26
2    29
3    27
4    23
5    21
Name: Age, dtype: int64

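2. Pandas – Basic Functionality

axes: Returns a list of the axis labels: the row index for a Series, and the row index plus the column labels for a DataFrame.
dtype: Returns the dtype of a Series (a DataFrame exposes dtypes, one per column).
empty: Returns True if the object has no elements.
ndim: Returns the number of dimensions of the underlying data: 1 for a Series, 2 for a DataFrame.
size: Returns the number of elements in the underlying data (rows × columns for a DataFrame).
values: Returns the data as a NumPy ndarray.
head(): Returns the first n rows (n defaults to 5).
tail(): Returns the last n rows (n defaults to 5).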
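
Several of these attributes give different answers for a Series and a DataFrame. The following minimal sketch compares the two on the sample data; the variable `age` is just the Age column pulled out as a Series.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Akshay", "Rajat", "Robin", "Kapil", "James", "Cyril"],
    "Age": [25, 26, 29, 27, 23, 21],
    "Rating": [4.23, 2.35, 1.56, 3.20, 4.62, 3.99],
})
age = df["Age"]  # a single column is a Series

# A Series has one axis (the index); a DataFrame has two (index, columns).
print(age.axes)
print(df.axes)

# ndim and size differ accordingly.
print("Series:   ", age.ndim, age.size, age.empty)
print("DataFrame:", df.ndim, df.size, df.empty)

# dtype is a Series attribute; a DataFrame reports one dtype per column.
print(age.dtype)
print(df.dtypes)
```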

axes

In [22]:

df.axes

Out[22]:

[RangeIndex(start=0, stop=6, step=1),
 Index(['Name', 'Age', 'Rating'], dtype='object')]

dtype

In [23]:

df['Age'].dtype

Out[23]:

dtype('int64')

empty

In [24]:

df.empty

Out[24]:

False

ndim

In [25]:

df.ndim

Out[25]:

2

size

In [26]:

df.size

Out[26]:

18

values

In [27]:

df.values

Out[27]:

array([['Akshay', 25, 4.23],
       ['Rajat', 26, 2.35],
       ['Robin', 29, 1.56],
       ['Kapil', 27, 3.2],
       ['James', 23, 4.62],
       ['Cyril', 21, 3.99]], dtype=object)
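
The array above has `dtype=object` because the DataFrame mixes strings and numbers. As a sketch, selecting only the numeric columns first gives a regular float array; this example uses `to_numpy()`, which the pandas documentation recommends over `.values`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Akshay", "Rajat", "Robin", "Kapil", "James", "Cyril"],
    "Age": [25, 26, 29, 27, 23, 21],
    "Rating": [4.23, 2.35, 1.56, 3.20, 4.62, 3.99],
})

# Mixed column types fall back to a generic object array.
mixed = df.to_numpy()

# Numeric columns only: int64 and float64 are combined into float64.
numeric = df[["Age", "Rating"]].to_numpy()

print(mixed.dtype, mixed.shape)
print(numeric.dtype, numeric.shape)
```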

head()

In [28]:

df.head()

Out[28]:

	Name	Age	Rating
0	Akshay	25	4.23
1	Rajat	26	2.35
2	Robin	29	1.56
3	Kapil	27	3.20
4	James	23	4.62

In [29]:

df.head(2)

Out[29]:

	Name	Age	Rating
0	Akshay	25	4.23
1	Rajat	26	2.35

tail()

In [30]:

df.tail()

Out[30]:

	Name	Age	Rating
1	Rajat	26	2.35
2	Robin	29	1.56
3	Kapil	27	3.20
4	James	23	4.62
5	Cyril	21	3.99

In [31]:

df.tail(2)

Out[31]:

	Name	Age	Rating
4	James	23	4.62
5	Cyril	21	3.99
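
Both methods also accept a negative n. As a brief sketch on the same sample data, `head(-2)` returns every row except the last two, and `tail(-2)` returns every row except the first two:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Akshay", "Rajat", "Robin", "Kapil", "James", "Cyril"],
    "Age": [25, 26, 29, 27, 23, 21],
    "Rating": [4.23, 2.35, 1.56, 3.20, 4.62, 3.99],
})

all_but_last_two = df.head(-2)   # rows 0 to 3
all_but_first_two = df.tail(-2)  # rows 2 to 5

print(all_but_last_two)
print(all_but_first_two)
```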