DataFrames
A DataFrame is a two-dimensional data structure: data is arranged in a table of rows and columns. Features of a DataFrame:
- Columns can hold different types
- Size is mutable
- Labeled axes (rows and columns)
- Arithmetic operations can be performed on rows and columns
A pandas DataFrame can be created using the following constructor:
- pandas.DataFrame(data, index, columns, dtype, copy)
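As a minimal sketch of how the constructor parameters fit together, the call below passes each one by keyword. The values and labels (`r1`, `x`, and so on) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample values, used only to illustrate each parameter.
df = pd.DataFrame(
    data=[[1, 2], [3, 4]],   # the values, here a 2D list
    index=["r1", "r2"],      # row labels
    columns=["x", "y"],      # column labels
    dtype="float64",         # force a single dtype for all columns
    copy=True,               # copy the input data instead of reusing it
)
print(df)
print(df.dtypes)
```

All parameters are optional; when `index` or `columns` is left out, pandas numbers the axis from 0.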
Create DataFrame: a pandas DataFrame can be created from various inputs, such as:
- Lists, 2D lists, and lists of tuples
- dict
- Series
- NumPy ndarrays
Empty DataFrame
import pandas as pd
df=pd.DataFrame()
print(df)
Empty DataFrame
Columns: []
Index: []
1. Create a DataFrame from a List
The DataFrame can be created using a single list or a list of lists.
Series vs DataFrame
Using Series
import pandas as pd
data= [1,2,5,4,6]
Ser=pd.Series(data)
Ser
0    1
1    2
2    5
3    4
4    6
dtype: int64
Using DataFrame
import pandas as pd
data= [1,2,5,4,6]
Ser=pd.DataFrame(data)
Ser
 | 0 |
---|---|
0 | 1 |
1 | 2 |
2 | 5 |
3 | 4 |
4 | 6 |
Note: As you can see, a Series has no column name, whereas a DataFrame built from the same list gets a default column label, starting from zero (0).
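The difference can also be checked programmatically. The sketch below reuses the same list and compares dimensions, shape, and labels; the variable names `ser` and `frame` are only illustrative.

```python
import pandas as pd

data = [1, 2, 5, 4, 6]
ser = pd.Series(data)       # one-dimensional
frame = pd.DataFrame(data)  # two-dimensional, with a single column

print(ser.ndim, ser.shape)      # 1 (5,)
print(frame.ndim, frame.shape)  # 2 (5, 1)
print(ser.name)                 # None: the Series has no name
print(list(frame.columns))      # [0]: the default column label
```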
Create a DataFrame from a 2D List
data = [['Robin',26,45.34],['Karan',25,78.5],['Priya',23,87.67],['Varun',22,56],['Keisha',23,97]]
print(data)
[['Robin', 26, 45.34], ['Karan', 25, 78.5], ['Priya', 23, 87.67], ['Varun', 22, 56], ['Keisha', 23, 97]]
df=pd.DataFrame(data)
df
 | 0 | 1 | 2 |
---|---|---|---|
0 | Robin | 26 | 45.34 |
1 | Karan | 25 | 78.50 |
2 | Priya | 23 | 87.67 |
3 | Varun | 22 | 56.00 |
4 | Keisha | 23 | 97.00 |
Adding column names
df=pd.DataFrame(data,columns=['Name','Age','Marks'])
df
 | Name | Age | Marks |
---|---|---|---|
0 | Robin | 26 | 45.34 |
1 | Karan | 25 | 78.50 |
2 | Priya | 23 | 87.67 |
3 | Varun | 22 | 56.00 |
4 | Keisha | 23 | 97.00 |
Create a DataFrame from a List of Tuples
data = [('Robin',26,45.34),('Karan',25,78.5),('Priya',23,87.67),('Varun',22,56),('Keisha',23,97)]
df=pd.DataFrame(data,columns=['Name','Age','Marks'])
df
 | Name | Age | Marks |
---|---|---|---|
0 | Robin | 26 | 45.34 |
1 | Karan | 25 | 78.50 |
2 | Priya | 23 | 87.67 |
3 | Varun | 22 | 56.00 |
4 | Keisha | 23 | 97.00 |
2. Create a DataFrame from a Dict
All the lists or ndarrays used as dictionary values must have the same length. If an index is passed, its length must equal the length of the arrays.
If no index is passed, the index defaults to range(n), where n is the array length.
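A small sketch of the length rules described above. The "bad" inputs are hypothetical and exist only to trigger the error; in both cases pandas raises a `ValueError`.

```python
import pandas as pd

# Equal-length lists: the default index is 0..n-1.
df = pd.DataFrame({'Name': ['Ayush', 'Priya'], 'Age': [28, 21]})
print(list(df.index))  # [0, 1]

errors = []

# Values of unequal length are rejected.
try:
    pd.DataFrame({'Name': ['Ayush', 'Priya'], 'Age': [28]})
except ValueError as exc:
    errors.append(str(exc))

# An index whose length differs from the arrays is rejected too.
try:
    pd.DataFrame({'Name': ['Ayush', 'Priya'], 'Age': [28, 21]}, index=['i1'])
except ValueError as exc:
    errors.append(str(exc))

for message in errors:
    print("ValueError:", message)
```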
import pandas as pd
data = {'Name':['Ayush', 'Priya', 'Kapil', 'Rohit'],'Age':[28,21,29,42]}
df = pd.DataFrame(data)
df
 | Name | Age |
---|---|---|
0 | Ayush | 28 |
1 | Priya | 21 |
2 | Kapil | 29 |
3 | Rohit | 42 |
Adding an index
df = pd.DataFrame(data, index=['i1','i2','i3','i4'])
print(df)
     Name  Age
i1  Ayush   28
i2  Priya   21
i3  Kapil   29
i4  Rohit   42
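Once custom labels are set, rows can be looked up by label with `.loc` and by position with `.iloc`. The sketch below rebuilds the same DataFrame so that it runs on its own.

```python
import pandas as pd

data = {'Name': ['Ayush', 'Priya', 'Kapil', 'Rohit'], 'Age': [28, 21, 29, 42]}
df = pd.DataFrame(data, index=['i1', 'i2', 'i3', 'i4'])

row = df.loc['i2']         # one row, selected by label, returned as a Series
age = df.loc['i3', 'Age']  # a single cell: row label, then column label
first = df.iloc[0]         # position-based access still works

print(row['Name'])    # Priya
print(age)            # 29
print(first['Name'])  # Ayush
```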
Create a DataFrame from a List of Dicts
A list of dictionaries can be passed as input data to create a DataFrame. Each dictionary becomes one row, and the dictionary keys are used as column names by default.
import pandas as pd
data = [{'a': 12, 'b': 32},{'a': 15, 'b': 50, 'c': 23},{'a': 65, 'b': 45, 'c': 19}]
df = pd.DataFrame(data)
df
 | a | b | c |
---|---|---|---|
0 | 12 | 32 | NaN |
1 | 15 | 50 | 23.0 |
2 | 65 | 45 | 19.0 |
Note: NaN (Not a Number) is filled in wherever a dictionary has no value for a column. Because NaN is a float, column c is displayed as float (23.0, 19.0).
df = pd.DataFrame(data, index=['First', 'Second','Third'])
df
 | a | b | c |
---|---|---|---|
First | 12 | 32 | NaN |
Second | 15 | 50 | 23.0 |
Third | 65 | 45 | 19.0 |
With two column labels, both matching dictionary keys
df1 = pd.DataFrame(data, index=['First', 'Second','Third'], columns=['a', 'b'])
df1
 | a | b |
---|---|---|
First | 12 | 32 |
Second | 15 | 50 |
Third | 65 | 45 |
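The missing-value behavior in the list-of-dicts examples above can be inspected directly. This is a minimal sketch; the column label `'z'` is hypothetical and appears in none of the dictionaries, so it comes back entirely as NaN.

```python
import pandas as pd

data = [{'a': 12, 'b': 32}, {'a': 15, 'b': 50, 'c': 23}, {'a': 65, 'b': 45, 'c': 19}]
df = pd.DataFrame(data)

print(df.dtypes)       # a and b stay integer; c becomes float64 because of NaN
print(df['c'].isna())  # True only for the first row

# Replace the missing values with a default.
filled = df.fillna(0)
print(filled['c'].tolist())  # [0.0, 23.0, 19.0]

# Asking for a column that no dictionary contains yields a column of NaN.
extra = pd.DataFrame(data, columns=['a', 'z'])
print(extra)
```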
3. Create a DataFrame from a Dict of Series
A DataFrame can be created by passing a dictionary of Series. The resulting index is the union of the indexes of all the Series passed.
import pandas as pd
data={'Col1': pd.Series([1,5,2,5,6],index=['a','b','c','d','e']), 'Col2': pd.Series([25,87,52,65,89],index=['a','b','c','d','e']) }
df=pd.DataFrame(data)
df
 | Col1 | Col2 |
---|---|---|
a | 1 | 25 |
b | 5 | 87 |
c | 2 | 52 |
d | 5 | 65 |
e | 6 | 89 |
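In the example above both Series share the same index, so the union is not visible. The sketch below uses two Series with different (hypothetical) indexes to show how the union is formed and where NaN appears.

```python
import pandas as pd

data = {
    'Col1': pd.Series([1, 5, 2], index=['a', 'b', 'c']),
    'Col2': pd.Series([25, 87, 52, 65], index=['a', 'b', 'c', 'd']),
}
df = pd.DataFrame(data)

print(list(df.index))  # ['a', 'b', 'c', 'd']: the union of both indexes
print(df)              # Col1 has no value for 'd', so that cell is NaN
```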
4. Create a DataFrame from a NumPy Array
data = [['Robin',26,45.34],['Karan',25,78.5],['Priya',23,87.67],['Varun',22,56],['Keisha',23,97]]
print(data)
[['Robin', 26, 45.34], ['Karan', 25, 78.5], ['Priya', 23, 87.67], ['Varun', 22, 56], ['Keisha', 23, 97]]
import numpy as np
Arr = np.array(data)
Arr
array([['Robin', '26', '45.34'],
       ['Karan', '25', '78.5'],
       ['Priya', '23', '87.67'],
       ['Varun', '22', '56'],
       ['Keisha', '23', '97']], dtype='<U32')
df = pd.DataFrame(Arr)
df
 | 0 | 1 | 2 |
---|---|---|---|
0 | Robin | 26 | 45.34 |
1 | Karan | 25 | 78.5 |
2 | Priya | 23 | 87.67 |
3 | Varun | 22 | 56 |
4 | Keisha | 23 | 97 |
df = pd.DataFrame(Arr,columns = ['Name','Age','Marks'])
df
 | Name | Age | Marks |
---|---|---|---|
0 | Robin | 26 | 45.34 |
1 | Karan | 25 | 78.5 |
2 | Priya | 23 | 87.67 |
3 | Varun | 22 | 56 |
4 | Keisha | 23 | 97 |
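Note that the array output above has a string dtype: a NumPy array holds a single type, so the mixed list was converted to strings, and the Age and Marks columns of the resulting DataFrame hold text rather than numbers. One possible way to restore numeric types, sketched here on a shortened copy of the data, is `astype`:

```python
import numpy as np
import pandas as pd

data = [['Robin', 26, 45.34], ['Karan', 25, 78.5]]
arr = np.array(data)   # mixed values are coerced to one string dtype
print(arr.dtype.kind)  # U (unicode string)

df = pd.DataFrame(arr, columns=['Name', 'Age', 'Marks'])

# Convert the text columns back to numbers.
df = df.astype({'Age': 'int64', 'Marks': 'float64'})
print(df.dtypes)
```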
5. Column Selection, Addition & Deletion
Selection
data = [['Robin',26,45.34],['Karan',25,78.5],['Priya',23,87.67],['Varun',22,56],['Keisha',23,97]]
print(data)
[['Robin', 26, 45.34], ['Karan', 25, 78.5], ['Priya', 23, 87.67], ['Varun', 22, 56], ['Keisha', 23, 97]]
df = pd.DataFrame(data,columns = ['Name','Age','Marks1'])
df
 | Name | Age | Marks1 |
---|---|---|---|
0 | Robin | 26 | 45.34 |
1 | Karan | 25 | 78.50 |
2 | Priya | 23 | 87.67 |
3 | Varun | 22 | 56.00 |
4 | Keisha | 23 | 97.00 |
Select a Single Column
df['Name']
0     Robin
1     Karan
2     Priya
3     Varun
4    Keisha
Name: Name, dtype: object
or
df.Name
0     Robin
1     Karan
2     Priya
3     Varun
4    Keisha
Name: Name, dtype: object
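Two related points, sketched on a shortened copy of the data: passing a list of labels selects several columns at once and returns a DataFrame, and attribute access such as `df.Name` only works when the label is a valid Python identifier, so a label like `'Roll No'` needs brackets.

```python
import pandas as pd

data = [['Robin', 26, 45.34], ['Karan', 25, 78.5]]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Marks1'])

one = df['Name']               # single label: returns a Series
many = df[['Name', 'Marks1']]  # list of labels: returns a DataFrame

df['Roll No'] = [10, 11]
roll = df['Roll No']           # brackets are required: the label contains a space

print(type(one).__name__)   # Series
print(type(many).__name__)  # DataFrame
print(roll.tolist())        # [10, 11]
```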
Addition
df
 | Name | Age | Marks1 |
---|---|---|---|
0 | Robin | 26 | 45.34 |
1 | Karan | 25 | 78.50 |
2 | Priya | 23 | 87.67 |
3 | Varun | 22 | 56.00 |
4 | Keisha | 23 | 97.00 |
df['Marks2'] = [78,56,98,45,66]
df['Roll No'] = [10,11,12,13,14]
df
 | Name | Age | Marks1 | Marks2 | Roll No |
---|---|---|---|---|---|
0 | Robin | 26 | 45.34 | 78 | 10 |
1 | Karan | 25 | 78.50 | 56 | 11 |
2 | Priya | 23 | 87.67 | 98 | 12 |
3 | Varun | 22 | 56.00 | 45 | 13 |
4 | Keisha | 23 | 97.00 | 66 | 14 |
Adding a new column as the sum of the Marks1 and Marks2 columns
df['Total Marks']=df['Marks1']+df['Marks2']
df
 | Name | Age | Marks1 | Marks2 | Roll No | Total Marks |
---|---|---|---|---|---|---|
0 | Robin | 26 | 45.34 | 78 | 10 | 123.34 |
1 | Karan | 25 | 78.50 | 56 | 11 | 134.50 |
2 | Priya | 23 | 87.67 | 98 | 12 | 185.67 |
3 | Varun | 22 | 56.00 | 45 | 13 | 101.00 |
4 | Keisha | 23 | 97.00 | 66 | 14 | 163.00 |
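Besides bracket assignment, pandas offers `assign()`, which returns a new DataFrame and leaves the original untouched, and `insert()`, which adds a column at a chosen position in place. A minimal sketch on a shortened copy of the data (the column name `Total` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Robin', 'Karan'],
    'Marks1': [45.34, 78.5],
    'Marks2': [78, 56],
})

# assign() returns a new DataFrame; df itself is not modified.
with_total = df.assign(Total=df['Marks1'] + df['Marks2'])
print(list(with_total.columns))  # ['Name', 'Marks1', 'Marks2', 'Total']
print(list(df.columns))          # ['Name', 'Marks1', 'Marks2']

# insert() modifies df in place and puts the column at position 0.
df.insert(0, 'Roll No', [10, 11])
print(list(df.columns))          # ['Roll No', 'Name', 'Marks1', 'Marks2']
```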
Deletion
- Deleting a column using the del statement.
del df['Roll No']
df
 | Name | Age | Marks1 | Marks2 | Total Marks |
---|---|---|---|---|---|
0 | Robin | 26 | 45.34 | 78 | 123.34 |
1 | Karan | 25 | 78.50 | 56 | 134.50 |
2 | Priya | 23 | 87.67 | 98 | 185.67 |
3 | Varun | 22 | 56.00 | 45 | 101.00 |
4 | Keisha | 23 | 97.00 | 66 | 163.00 |
Deleting a column using the pop() method
df.pop('Age')
df
 | Name | Marks1 | Marks2 | Total Marks |
---|---|---|---|---|
0 | Robin | 45.34 | 78 | 123.34 |
1 | Karan | 78.50 | 56 | 134.50 |
2 | Priya | 87.67 | 98 | 185.67 |
3 | Varun | 56.00 | 45 | 101.00 |
4 | Keisha | 97.00 | 66 | 163.00 |
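Unlike `del`, `pop()` also returns the removed column as a Series, which is useful when the values are still needed. As an alternative, `drop(columns=...)` returns a new DataFrame without modifying the original. A short sketch on a reduced copy of the data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Robin', 'Karan'],
    'Age': [26, 25],
    'Marks1': [45.34, 78.5],
})

ages = df.pop('Age')     # removes 'Age' from df and returns it
print(ages.tolist())     # [26, 25]
print(list(df.columns))  # ['Name', 'Marks1']

trimmed = df.drop(columns=['Marks1'])  # new DataFrame; df keeps 'Marks1'
print(list(trimmed.columns))           # ['Name']
print(list(df.columns))                # ['Name', 'Marks1']
```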
Rows can be deleted using the drop() method.
df
 | Name | Marks1 | Marks2 | Total Marks |
---|---|---|---|---|
0 | Robin | 45.34 | 78 | 123.34 |
1 | Karan | 78.50 | 56 | 134.50 |
2 | Priya | 87.67 | 98 | 185.67 |
3 | Varun | 56.00 | 45 | 101.00 |
4 | Keisha | 97.00 | 66 | 163.00 |
df = df.drop(0)
df
 | Name | Marks1 | Marks2 | Total Marks |
---|---|---|---|---|
1 | Karan | 78.50 | 56 | 134.50 |
2 | Priya | 87.67 | 98 | 185.67 |
3 | Varun | 56.00 | 45 | 101.00 |
4 | Keisha | 97.00 | 66 | 163.00 |
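As the output shows, the remaining rows keep their original labels (1 to 4) after the drop. The sketch below, on a shortened copy of the data, shows one way to renumber the rows with `reset_index(drop=True)` and how to drop several rows at once by passing a list of labels.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Robin', 'Karan', 'Priya', 'Varun'],
    'Marks1': [45.34, 78.5, 87.67, 56.0],
})

dropped = df.drop(0)        # removes the row labeled 0
print(list(dropped.index))  # [1, 2, 3]

renumbered = dropped.reset_index(drop=True)  # relabel the rows from 0
print(list(renumbered.index))                # [0, 1, 2]

several = df.drop([1, 3])        # drop more than one row label
print(several['Name'].tolist())  # ['Robin', 'Priya']
```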