Pandas DataFrame – Data Science Tutorials

Machine Learning | July 31, 2022 | Data Science, Pandas

03- DataFrame

DataFrames¶

A DataFrame is a two-dimensional data structure: data is arranged in a table of rows and columns. Features of a DataFrame:

Columns can be of different types
Size is mutable
Labeled axes (rows and columns)
Arithmetic operations can be performed on rows and columns

A pandas DataFrame can be created using the following constructor:

pandas.DataFrame(data, index, columns, dtype, copy)
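As a rough illustration of what the constructor arguments do, the sketch below passes data, index, columns and dtype by keyword. The values and labels ('r1', 'x', and so on) are hypothetical and chosen only for this example; copy is left at its default.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
df = pd.DataFrame(
    data=[[1, 2], [3, 4]],   # the values, row by row
    index=['r1', 'r2'],      # row labels
    columns=['x', 'y'],      # column labels
    dtype='float64',         # force every column to this type
)
print(df)
print(df.dtypes)
```

Every argument is optional, which is why pd.DataFrame() with no arguments produces an empty DataFrame.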

Create a DataFrame: a pandas DataFrame can be created from various inputs, such as:

Lists, 2D lists, and lists of tuples
dict
Series
NumPy ndarrays

Empty DataFrame¶

In [1]:

import pandas as pd
df=pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []

1. Create a DataFrame from List¶

The DataFrame can be created using a single list or a list of lists.

Series vs DataFrame¶

Using Series¶

In [2]:

import pandas as pd
data= [1,2,5,4,6]
Ser=pd.Series(data)
Ser

Out[2]:

0    1
1    2
2    5
3    4
4    6
dtype: int64

Using DataFrame¶

In [3]:

import pandas as pd
data= [1,2,5,4,6]
df=pd.DataFrame(data)
df

Out[3]:

	0
0	1
1	2
2	5
3	4
4	6

Note: As you can see, a Series has no column name, whereas a DataFrame gets a default column label starting from zero (0).
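The difference can also be checked programmatically. The following is a minimal sketch using the same list as above; the variable names ser, frame and col are only illustrative.

```python
import pandas as pd

data = [1, 2, 5, 4, 6]
ser = pd.Series(data)
frame = pd.DataFrame(data)

print(ser.shape)            # one-dimensional
print(frame.shape)          # two-dimensional: 5 rows, 1 column
print(list(frame.columns))  # the default column label

# Selecting the single column of the DataFrame gives back a Series.
col = frame[0]
print(type(col).__name__)
```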

Create a DataFrame From 2D List¶

In [4]:

data = [['Robin',26,45.34],['Karan',25,78.5],['Priya',23,87.67],['Varun',22,56],['Keisha',23,97]]
print(data)

[['Robin', 26, 45.34], ['Karan', 25, 78.5], ['Priya', 23, 87.67], ['Varun', 22, 56], ['Keisha', 23, 97]]

In [5]:

df=pd.DataFrame(data)
df

Out[5]:

	0	1	2
0	Robin	26	45.34
1	Karan	25	78.50
2	Priya	23	87.67
3	Varun	22	56.00
4	Keisha	23	97.00

Adding column names¶

In [6]:

df=pd.DataFrame(data,columns=['Name','Age','Marks'])
df

Out[6]:

	Name	Age	Marks
0	Robin	26	45.34
1	Karan	25	78.50
2	Priya	23	87.67
3	Varun	22	56.00
4	Keisha	23	97.00

Create a DataFrame from a List of Tuples¶

In [7]:

data = [('Robin',26,45.34),('Karan',25,78.5),('Priya',23,87.67),('Varun',22,56),('Keisha',23,97)]
df=pd.DataFrame(data,columns=['Name','Age','Marks'])
df

Out[7]:

	Name	Age	Marks
0	Robin	26	45.34
1	Karan	25	78.50
2	Priya	23	87.67
3	Varun	22	56.00
4	Keisha	23	97.00

2. Create a DataFrame from Dict¶

When the dict values are lists or ndarrays, they must all be the same length. If an index is passed, its length must equal the length of the arrays.

If no index is passed, the index defaults to range(n), where n is the array length.

In [8]:

import pandas as pd
data = {'Name':['Ayush', 'Priya', 'Kapil', 'Rohit'],'Age':[28,21,29,42]}
df = pd.DataFrame(data)
df

Out[8]:

	Name	Age
0	Ayush	28
1	Priya	21
2	Kapil	29
3	Rohit	42

Adding an index¶

In [9]:

df = pd.DataFrame(data, index=['i1','i2','i3','i4'])
print(df)

     Name  Age
i1  Ayush   28
i2  Priya   21
i3  Kapil   29
i4  Rohit   42
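The two rules stated at the start of this section (a default index of range(n), and equal lengths for all arrays) can be checked with a short sketch. The dict named bad is a hypothetical input with deliberately mismatched lengths.

```python
import pandas as pd

data = {'Name': ['Ayush', 'Priya', 'Kapil', 'Rohit'], 'Age': [28, 21, 29, 42]}

# Default index: 0..n-1, where n is the length of the lists.
df = pd.DataFrame(data)
default_index = list(df.index)
print(default_index)

# Hypothetical input whose lists have different lengths: pandas rejects it.
bad = {'Name': ['Ayush', 'Priya'], 'Age': [28, 21, 29]}
try:
    pd.DataFrame(bad)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
    print('ValueError:', exc)
```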

Create a DataFrame from List of Dicts¶

A list of dictionaries can be passed as input data to create a DataFrame. By default, the dictionary keys are used as column names.

In [10]:

import pandas as pd
data = [{'a': 12, 'b': 32},{'a': 15, 'b': 50, 'c': 23},{'a': 65, 'b': 45, 'c': 19}]
df = pd.DataFrame(data)
df

Out[10]:

	a	b	c
0	12	32	NaN
1	15	50	23.0
2	65	45	19.0

Note: NaN (Not a Number) is filled in where a value is missing, which is why column c is displayed as floats.¶

In [11]:

df = pd.DataFrame(data, index=['First', 'Second','Third'])
df

Out[11]:

	a	b	c
First	12	32	NaN
Second	15	50	23.0
Third	65	45	19.0

With two column labels that match the dictionary keys¶

In [12]:

df1 = pd.DataFrame(data, index=['First', 'Second','Third'], columns=['a', 'b'])
df1

Out[12]:

	a	b
First	12	32
Second	15	50
Third	65	45
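For contrast, here is a small sketch of what happens when a requested column label does not match any dictionary key. The label 'b1' is hypothetical and used only to illustrate the behaviour: the column is created and filled with NaN.

```python
import pandas as pd

data = [{'a': 12, 'b': 32}, {'a': 15, 'b': 50, 'c': 23}, {'a': 65, 'b': 45, 'c': 19}]

# 'b1' is a hypothetical label that is not a key in any of the dictionaries.
df2 = pd.DataFrame(data, index=['First', 'Second', 'Third'], columns=['a', 'b1'])
print(df2)
```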

3. Create a DataFrame from Dict of Series¶

A DataFrame can be created by passing a dictionary of Series. The resulting index is the union of the indexes of all the Series passed.

In [13]:

import pandas as pd
data={'Col1': pd.Series([1,5,2,5,6],index=['a','b','c','d','e']), 'Col2': pd.Series([25,87,52,65,89],index=['a','b','c','d','e']) }
df=pd.DataFrame(data)
df

Out[13]:

	Col1	Col2
a	1	25
b	5	87
c	2	52
d	5	65
e	6	89
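In the example above both Series share the same index, so the union is not visible. The sketch below uses two Series with different indexes (shortened, hypothetical variants of the data above) to show the union and the NaN that fills the gap.

```python
import pandas as pd

data = {
    'Col1': pd.Series([1, 5, 2], index=['a', 'b', 'c']),
    'Col2': pd.Series([25, 87, 52, 65], index=['a', 'b', 'c', 'd']),
}
df = pd.DataFrame(data)

# Row 'd' exists only in Col2, so Col1 is NaN there.
print(df)
```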

4. Create a DataFrame from NumPy Array¶

In [14]:

data = [['Robin',26,45.34],['Karan',25,78.5],['Priya',23,87.67],['Varun',22,56],['Keisha',23,97]]
print(data)

[['Robin', 26, 45.34], ['Karan', 25, 78.5], ['Priya', 23, 87.67], ['Varun', 22, 56], ['Keisha', 23, 97]]

In [15]:

import numpy as np
Arr = np.array(data)
Arr

Out[15]:

array([['Robin', '26', '45.34'],
       ['Karan', '25', '78.5'],
       ['Priya', '23', '87.67'],
       ['Varun', '22', '56'],
       ['Keisha', '23', '97']], dtype='<U32')

In [16]:

df = pd.DataFrame(Arr)
df

Out[16]:

	0	1	2
0	Robin	26	45.34
1	Karan	25	78.5
2	Priya	23	87.67
3	Varun	22	56
4	Keisha	23	97

In [17]:

df = pd.DataFrame(Arr,columns = ['Name','Age','Marks'])
df

Out[17]:

	Name	Age	Marks
0	Robin	26	45.34
1	Karan	25	78.5
2	Priya	23	87.67
3	Varun	22	56
4	Keisha	23	97
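Note that a NumPy array holds a single dtype, so the mixed list was stored as strings (see the quoted values in Out[15]), and the Age and Marks columns above are therefore text rather than numbers. One possible way to restore numeric types, sketched here on a shortened copy of the data, is astype with a per-column mapping.

```python
import numpy as np
import pandas as pd

data = [['Robin', 26, 45.34], ['Karan', 25, 78.5], ['Priya', 23, 87.67]]
arr = np.array(data)   # mixed values are stored as strings
df = pd.DataFrame(arr, columns=['Name', 'Age', 'Marks'])

# Convert the numeric columns back from strings.
df = df.astype({'Age': int, 'Marks': float})
print(df.dtypes)
```

If the data starts out as a Python list, passing the list straight to pd.DataFrame avoids the string conversion altogether.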

5. Column Selection, Addition & Deletion¶

Selection¶

In [18]:

data = [['Robin',26,45.34],['Karan',25,78.5],['Priya',23,87.67],['Varun',22,56],['Keisha',23,97]]
print(data)

[['Robin', 26, 45.34], ['Karan', 25, 78.5], ['Priya', 23, 87.67], ['Varun', 22, 56], ['Keisha', 23, 97]]

In [19]:

df = pd.DataFrame(data,columns = ['Name','Age','Marks1'])
df

Out[19]:

	Name	Age	Marks1
0	Robin	26	45.34
1	Karan	25	78.50
2	Priya	23	87.67
3	Varun	22	56.00
4	Keisha	23	97.00

Select a Single Column¶

In [20]:

df['Name']

Out[20]:

0     Robin
1     Karan
2     Priya
3     Varun
4    Keisha
Name: Name, dtype: object

Or, using attribute access¶

In [21]:

df.Name

Out[21]:

0     Robin
1     Karan
2     Priya
3     Varun
4    Keisha
Name: Name, dtype: object
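Several columns can be selected at once by passing a list of labels. The sketch below uses a shortened copy of the data; the name subset is only illustrative.

```python
import pandas as pd

data = [['Robin', 26, 45.34], ['Karan', 25, 78.5], ['Priya', 23, 87.67]]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Marks1'])

# A list of labels inside the brackets selects several columns
# and returns a DataFrame rather than a Series.
subset = df[['Name', 'Marks1']]
print(subset)
```

Attribute access such as df.Name only works when the column label is a valid Python identifier; a label containing a space, such as 'Roll No', has to be selected with brackets.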

Addition¶

In [22]:

df

Out[22]:

	Name	Age	Marks1
0	Robin	26	45.34
1	Karan	25	78.50
2	Priya	23	87.67
3	Varun	22	56.00
4	Keisha	23	97.00

In [23]:

df['Marks2'] = [78,56,98,45,66]

In [24]:

df['Roll No'] = [10,11,12,13,14]

In [25]:

df

Out[25]:

	Name	Age	Marks1	Marks2	Roll No
0	Robin	26	45.34	78	10
1	Karan	25	78.50	56	11
2	Priya	23	87.67	98	12
3	Varun	22	56.00	45	13
4	Keisha	23	97.00	66	14

Adding a new column as the sum of the Marks1 and Marks2 columns¶

In [26]:

df['Total Marks']=df['Marks1']+df['Marks2']
df

Out[26]:

	Name	Age	Marks1	Marks2	Roll No	Total Marks
0	Robin	26	45.34	78	10	123.34
1	Karan	25	78.50	56	11	134.50
2	Priya	23	87.67	98	12	185.67
3	Varun	22	56.00	45	13	101.00
4	Keisha	23	97.00	66	14	163.00
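Assigning with df['Total Marks'] = ... modifies df in place. As an alternative, a sketch using the assign() method is shown below; it returns a new DataFrame and leaves the original untouched. The column name Total and the shortened data are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Robin', 'Karan', 'Priya'],
    'Marks1': [45.34, 78.5, 87.67],
    'Marks2': [78, 56, 98],
})

# assign() returns a new DataFrame and leaves df unchanged.
with_total = df.assign(Total=df['Marks1'] + df['Marks2'])
print(with_total)
```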

Deletion¶

Deleting a column using the del statement.

In [27]:

del df['Roll No']
df

Out[27]:

	Name	Age	Marks1	Marks2	Total Marks
0	Robin	26	45.34	78	123.34
1	Karan	25	78.50	56	134.50
2	Priya	23	87.67	98	185.67
3	Varun	22	56.00	45	101.00
4	Keisha	23	97.00	66	163.00

Deleting a column using the pop() method.¶

In [28]:

df.pop('Age')
df

Out[28]:

	Name	Marks1	Marks2	Total Marks
0	Robin	45.34	78	123.34
1	Karan	78.50	56	134.50
2	Priya	87.67	98	185.67
3	Varun	56.00	45	101.00
4	Keisha	97.00	66	163.00
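Unlike del, pop() also returns the removed column as a Series, which is useful when the values are still needed. A minimal sketch on a shortened copy of the data (the name ages is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Robin', 'Karan'], 'Age': [26, 25], 'Marks1': [45.34, 78.5]})

ages = df.pop('Age')   # removes the column from df and returns it
print(ages)
print(list(df.columns))
```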

Rows can be deleted using the drop() method.¶

In [29]:

df

Out[29]:

	Name	Marks1	Marks2	Total Marks
0	Robin	45.34	78	123.34
1	Karan	78.50	56	134.50
2	Priya	87.67	98	185.67
3	Varun	56.00	45	101.00
4	Keisha	97.00	66	163.00

In [30]:

df = df.drop(0)
df

Out[30]:

	Name	Marks1	Marks2	Total Marks
1	Karan	78.50	56	134.50
2	Priya	87.67	98	185.67
3	Varun	56.00	45	101.00
4	Keisha	97.00	66	163.00
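drop() returns a new DataFrame and keeps the original index labels, which is why the result above starts at 1. The sketch below, on a shortened copy of the data with illustrative variable names, shows a few related uses: dropping several rows, dropping a column, and renumbering the rows afterwards.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Robin', 'Karan', 'Priya', 'Varun'],
    'Marks1': [45.34, 78.5, 87.67, 56.0],
    'Marks2': [78, 56, 98, 45],
})

# Drop several rows at once by passing a list of index labels.
fewer_rows = df.drop([0, 2])

# Drop a column with the columns= keyword.
fewer_cols = df.drop(columns=['Marks2'])

# Renumber the remaining rows from 0; drop=True discards the old labels.
renumbered = fewer_rows.reset_index(drop=True)
print(renumbered)
```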