Missing Data In Pandas – Data Science Tutorials

Machine Learning July 31, 2022 Data Science, Pandas Leave a comment 3,128 Views

10- Missing Data

Missing Data In Pandas¶

Real-world data is dirty. It is important to preprocess the data before analyzing the data.

In [1]:

#importing libraries
import pandas as pd
import numpy as np

Note: Missing data in the Pandas is represented by the value NaN (Not a Number)

1. Handling Missing data in Series¶

Create a series which contain NaN values¶

In [2]:

s=pd.Series(["A",np.nan,"C","B",np.nan,"E",np.nan,np.nan,"D"])
s

Out[2]:

0      A
1    NaN
2      C
3      B
4    NaN
5      E
6    NaN
7    NaN
8      D
dtype: object

To check the missing values in the data we use isnull() method.¶

In [3]:

s.isnull()

Out[3]:

0    False
1     True
2    False
3    False
4     True
5    False
6     True
7     True
8    False
dtype: bool

In [4]:

s[s.isnull()]

Out[4]:

1    NaN
4    NaN
6    NaN
7    NaN
dtype: object

Similarly we can also check the data which is not null using notnull() method.¶

In [5]:

s.notnull()

Out[5]:

0     True
1    False
2     True
3     True
4    False
5     True
6    False
7    False
8     True
dtype: bool

In [6]:

s[s.notnull()]

Out[6]:

0    A
2    C
3    B
5    E
8    D
dtype: object

If you want to remove the missing data, you can use the dropna method.¶

In [7]:

s.dropna()

Out[7]:

0    A
2    C
3    B
5    E
8    D
dtype: object

2. Handling Missing data in DataFrame¶

In [8]:

df=pd.DataFrame([[12,32,13,55,78,54],[42,np.nan,54,np.nan,55,855],[np.nan,64,np.nan,45,np.nan,78],[55,447,552,np.nan,85,889]])
df

Out[8]:

	0	1	2	3	4	5
0	12.0	32.0	13.0	55.0	78.0	54
1	42.0	NaN	54.0	NaN	55.0	855
2	NaN	64.0	NaN	45.0	NaN	78
3	55.0	447.0	552.0	NaN	85.0	889

By default, the dropna method removes rows with missing data.¶

In [9]:

df.dropna()

Out[9]:

	0	1	2	3	4	5
0	12.0	32.0	13.0	55.0	78.0	54

Now, Let’s remove the columns with missing data.¶

In [10]:

df.dropna(axis=1)

Out[10]:

	5
0	54
1	855
2	78
3	889

If you want to assign another value instead of missing data, you can use the fillna method.¶

In [11]:

df.fillna(0)

Out[11]:

	0	1	2	3	4	5
0	12.0	32.0	13.0	55.0	78.0	54
1	42.0	0.0	54.0	0.0	55.0	855
2	0.0	64.0	0.0	45.0	0.0	78
3	55.0	447.0	552.0	0.0	85.0	889

Using the dictionary structure, you can assign the missing data in each column to a different value with the fillna method. Let me show that.¶

In [12]:

df.fillna({0:15,1:25,2:35,3:48,4:66,5:56})

Out[12]:

	0	1	2	3	4	5
0	12.0	32.0	13.0	55.0	78.0	54
1	42.0	25.0	54.0	48.0	55.0	855
2	15.0	64.0	35.0	45.0	66.0	78
3	55.0	447.0	552.0	48.0	85.0	889

If you want to assign the value in the upper row to the missing data, you can use the method = “ffill” or “bfill”¶

Forward Fill¶

In [13]:

df.fillna(method="ffill")

Out[13]:

	0	1	2	3	4	5
0	12.0	32.0	13.0	55.0	78.0	54
1	42.0	32.0	54.0	55.0	55.0	855
2	42.0	64.0	54.0	45.0	55.0	78
3	55.0	447.0	552.0	45.0	85.0	889

Backward Fill¶

In [14]:

df.fillna(method="bfill")

Out[14]:

	0	1	2	3	4	5
0	12.0	32.0	13.0	55.0	78.0	54
1	42.0	64.0	54.0	45.0	55.0	855
2	55.0	64.0	552.0	45.0	85.0	78
3	55.0	447.0	552.0	NaN	85.0	889

Machine Learning Tutorials, Courses and Certifications

Missing Data In Pandas – Data Science Tutorials

Related Articles

Missing Data In Pandas¶

1. Handling Missing data in Series¶

Create a series which contain NaN values¶

To check the missing values in the data we use isnull() method.¶

Similarly we can also check the data which is not null using notnull() method.¶

If you want to remove the missing data, you can use the dropna method.¶

2. Handling Missing data in DataFrame¶

By default, the dropna method removes rows with missing data.¶

Now, Let’s remove the columns with missing data.¶

If you want to assign another value instead of missing data, you can use the fillna method.¶

Using the dictionary structure, you can assign the missing data in each column to a different value with the fillna method. Let me show that.¶

If you want to assign the value in the upper row to the missing data, you can use the method = “ffill” or “bfill”¶

Forward Fill¶

Backward Fill¶

Related

About Machine Learning

Check Also

Groupby in Pandas – Data Science Tutorials

Leave a Reply Cancel reply

OpenCV Python Project for Bus Detection from an Image

Multiple Linear Regression:

Microsoft AI Classroom Series Assessment Answers

Polynomial Regression

Support Vector Regression

DataOps Methodology Cognitive Class Exam Answers:-

Python MYSQL Create Database

Web Application Firewall Quiz Answers NSE 2 Information Security Awareness Fortinet

Python MySQL Order By

FUNDAMENTALS OF DIGITAL MARKETING: MODULE 20 Quiz Answers

OpenCV Python Project for Bus Detection from an Image

OpenCV Python Project for Vehicle Detection From an Image

OpenCV Python Project for Vehicle Detection in a Video frame

Airline Quality Service

Airport Quality Service