Hierarchical Indexing
Up to this point we’ve been focused primarily on one-dimensional and two-dimensional data, stored in Pandas Series and DataFrame objects, respectively. Often it is useful to go beyond this and store higher-dimensional data, that is, data indexed by more than one or two keys. Older versions of Pandas provided Panel and Panel4D objects that natively handled three-dimensional and four-dimensional data, but both have since been removed from the library. A far more common pattern in practice is to use hierarchical indexing (also known as multi-indexing) to incorporate multiple index levels within a single index.
#Importing libraries
import pandas as pd
import numpy as np
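Let’s build a Series of eight random draws from the standard normal distribution, indexed by two lists of labels: an outer level of letters and an inner level of integers. Because the values are random, your numbers will differ from the ones shown here.
data = pd.Series(
    np.random.randn(8),
    index=[["a", "a", "a", "b", "b", "b", "c", "c"], [1, 2, 3, 1, 2, 3, 1, 2]],
)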
data
a  1   -0.799629
   2    1.449937
   3    1.772006
b  1    0.703102
   2    0.631890
   3   -1.971740
c  1   -1.493603
   2    0.419737
dtype: float64
What is MultiIndex?
A MultiIndex lets a single axis carry more than one level of labels, so each row (or column) is identified by a tuple of keys rather than by a single key. To understand MultiIndex, let’s look at the index of the data.
data.index
MultiIndex([('a', 1),
            ('a', 2),
            ('a', 3),
            ('b', 1),
            ('b', 2),
            ('b', 3),
            ('c', 1),
            ('c', 2)],
           )
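The structure of such an index can also be inspected programmatically. The following is a minimal sketch, assuming a stand-in Series named `s` with the same two-level layout and placeholder values (the names `s`, `outer` and `inner` are illustrative, not part of the original notebook).

```python
import numpy as np
import pandas as pd

# Same two-level layout as the Series above; the values are arbitrary
# placeholders, since only the index matters here.
outer = ["a", "a", "a", "b", "b", "b", "c", "c"]
inner = [1, 2, 3, 1, 2, 3, 1, 2]
s = pd.Series(np.arange(8.0), index=[outer, inner])

idx = s.index
n_levels = idx.nlevels               # number of index levels
outer_labels = list(idx.levels[0])   # unique labels of the outer level
inner_labels = list(idx.levels[1])   # unique labels of the inner level
first_key = idx[0]                   # each entry is a tuple of keys

print(n_levels, outer_labels, inner_labels, first_key)
```

Each position in the index is a tuple such as `('a', 1)`, while `levels` holds only the distinct labels of each level.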
MultiIndex is an indexing structure, available for both Series and DataFrames, that stores several levels of labels in one index. Our dataset has two levels: the letters form the outer level and the integers form the inner level. You can obtain subsets of the data using these labels. For example, let’s take a look at the values under the outer label a.
data["a"]
1   -0.799629
2    1.449937
3    1.772006
dtype: float64
# Slicing also works on the outer level of a MultiIndex
data["b":"c"]
b  1    0.703102
   2    0.631890
   3   -1.971740
c  1   -1.493603
   2    0.419737
dtype: float64
# We can also select more than one outer label at once
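Label slicing of this kind relies on the index being lexically sorted, which holds for the Series above. The sketch below uses a deliberately unsorted, hypothetical Series named `unsorted` to show what can go wrong and how `sort_index` addresses it. It is an illustration under those assumptions, not part of the original notebook.

```python
import pandas as pd

# A small Series whose outer level is NOT in sorted order (hypothetical data).
unsorted = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=[["c", "a", "b", "a"], [1, 1, 1, 2]],
)

# Slicing by label on an unsorted MultiIndex raises UnsortedIndexError,
# which is a subclass of KeyError.
try:
    unsorted.loc["a":"b"]
    raised = False
except KeyError:
    raised = True

# Sorting the index first makes the same slice well defined.
ordered = unsorted.sort_index()
sliced = ordered.loc["a":"b"]

print(raised)
print(sliced)
```

Sorting once after building a MultiIndex is usually enough to keep later slices working.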
data.loc[["a","c"]]
a  1   -0.799629
   2    1.449937
   3    1.772006
c  1   -1.493603
   2    0.419737
dtype: float64
You can also select by the inner level. Let’s take a look at the values whose inner label is 1, across all outer labels.
data.loc[:,1]
a   -0.799629
b    0.703102
c   -1.493603
dtype: float64
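The inner-level selection above can be written in more than one way. As a hedged sketch, the following compares `loc` with the `xs` (cross-section) method on a stand-in Series `s` with placeholder values; `s` is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

outer = ["a", "a", "a", "b", "b", "b", "c", "c"]
inner = [1, 2, 3, 1, 2, 3, 1, 2]
s = pd.Series(np.arange(8.0), index=[outer, inner])

by_loc = s.loc[:, 1]        # inner label 1 under every outer label
by_xs = s.xs(1, level=1)    # same selection, naming the level explicitly
single = s.loc[("b", 2)]    # one value, addressed by a full (outer, inner) key

print(by_loc)
print(by_xs)
print(single)
```

`xs` can be easier to read when the index has many levels, because the level is named rather than implied by position.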
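What is unstack?
The stack method turns column names into index values, and the unstack method turns index values into column names. By default, unstack moves the innermost index level into the columns, so you can view the data as a table. Combinations that do not exist in the index, such as (c, 3) here, are filled with NaN.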
data.unstack()
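| | 1 | 2 | 3 |
|---|---|---|---|
| a | -0.799629 | 1.449937 | 1.772006 |
| b | 0.703102 | 0.631890 | -1.971740 |
| c | -1.493603 | 0.419737 | NaN |
To restore the original Series, you can use the stack method. The result shown here no longer contains the NaN that unstack filled in for the missing (c, 3) entry.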
data.unstack().stack()
a  1   -0.799629
   2    1.449937
   3    1.772006
b  1    0.703102
   2    0.631890
   3   -1.971740
c  1   -1.493603
   2    0.419737
dtype: float64
Hierarchical Indexing in a DataFrame
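The `level` argument controls which index level `unstack` moves into the columns. The sketch below, again using a stand-in Series `s` with placeholder values, unstacks each level in turn and then checks the round trip. The explicit `dropna()` is an assumption made so that the result does not depend on how a particular pandas version treats missing values in `stack`.

```python
import numpy as np
import pandas as pd

outer = ["a", "a", "a", "b", "b", "b", "c", "c"]
inner = [1, 2, 3, 1, 2, 3, 1, 2]
s = pd.Series(np.arange(8.0), index=[outer, inner])

wide_inner = s.unstack()          # inner level -> columns (the default)
wide_outer = s.unstack(level=0)   # outer level -> columns instead
missing = wide_inner.loc["c", 3]  # ("c", 3) was never in s, so it is NaN

# Drop NaN explicitly so the round trip does not rely on stack() doing it.
restored = wide_inner.stack().dropna()

print(wide_outer)
print(restored)
```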
You can move the DataFrame’s columns to the row index. To show this, let’s create a dataset.
data = pd.DataFrame({
    "x": range(8),
    "y": range(8, 0, -1),
    "a": ["one", "one", "one", "one", "two", "two", "two", "two"],
    "b": [0, 1, 2, 3, 0, 1, 2, 3],
})
data
| | x | y | a | b |
|---|---|---|---|---|
| 0 | 0 | 8 | one | 0 |
| 1 | 1 | 7 | one | 1 |
| 2 | 2 | 6 | one | 2 |
| 3 | 3 | 5 | one | 3 |
| 4 | 4 | 4 | two | 0 |
| 5 | 5 | 3 | two | 1 |
| 6 | 6 | 2 | two | 2 |
| 7 | 7 | 1 | two | 3 |
Let’s transform columns a and b of this dataset into a row index.
data2=data.set_index(["a","b"])
data2
| a | b | x | y |
|---|---|---|---|
| one | 0 | 0 | 8 |
| | 1 | 1 | 7 |
| | 2 | 2 | 6 |
| | 3 | 3 | 5 |
| two | 0 | 4 | 4 |
| | 1 | 5 | 3 |
| | 2 | 6 | 2 |
| | 3 | 7 | 1 |
By default, set_index removes the columns it moves into the row index. Pass drop=False to keep them as ordinary columns as well.
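Once columns a and b form the row index, rows can be addressed by their labels. The following is a minimal sketch on a copy of the same dataset; the names `df`, `indexed`, `block`, `row` and `cell` are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "x": range(8),
    "y": range(8, 0, -1),
    "a": ["one"] * 4 + ["two"] * 4,
    "b": [0, 1, 2, 3] * 2,
})
indexed = df.set_index(["a", "b"])

block = indexed.loc["one"]            # all rows under outer label "one"
row = indexed.loc[("two", 1), :]      # one row, returned as a Series
cell = indexed.loc[("two", 1), "x"]   # a single scalar value

print(block)
print(row)
print(cell)
```

Passing the full key as a tuple, together with an explicit column selector, avoids ambiguity between row and column labels.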
data3=data.set_index(["a","b"],drop=False)
data3
| a | b | x | y | a | b |
|---|---|---|---|---|---|
| one | 0 | 0 | 8 | one | 0 |
| | 1 | 1 | 7 | one | 1 |
| | 2 | 2 | 6 | one | 2 |
| | 3 | 3 | 5 | one | 3 |
| two | 0 | 4 | 4 | two | 0 |
| | 1 | 5 | 3 | two | 1 |
| | 2 | 6 | 2 | two | 2 |
| | 3 | 7 | 1 | two | 3 |
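One consequence of drop=False is worth keeping in mind: the labels now exist both in the index and in the columns. The sketch below, using illustrative names `df`, `kept` and `flat`, shows that a plain `reset_index` then fails because the columns already exist, while `reset_index(drop=True)` simply discards the index.

```python
import pandas as pd

df = pd.DataFrame({
    "x": range(8),
    "y": range(8, 0, -1),
    "a": ["one"] * 4 + ["two"] * 4,
    "b": [0, 1, 2, 3] * 2,
})
kept = df.set_index(["a", "b"], drop=False)

# Moving the index back would duplicate columns "a" and "b",
# so pandas raises a ValueError by default.
try:
    kept.reset_index()
    clash = False
except ValueError:
    clash = True

# Discarding the index instead leaves the kept columns untouched.
flat = kept.reset_index(drop=True)

print(clash)
print(flat)
```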
data2
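| a | b | x | y |
|---|---|---|---|
| one | 0 | 0 | 8 |
| | 1 | 1 | 7 |
| | 2 | 2 | 6 |
| | 3 | 3 | 5 |
| two | 0 | 4 | 4 |
| | 1 | 5 | 3 |
| | 2 | 6 | 2 |
| | 3 | 7 | 1 |
You can use the reset_index method to move the index levels back into ordinary columns. Note that the restored columns a and b come first, so the column order differs from the original dataset.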
data2.reset_index()
| | a | b | x | y |
|---|---|---|---|---|
| 0 | one | 0 | 0 | 8 |
| 1 | one | 1 | 1 | 7 |
| 2 | one | 2 | 2 | 6 |
| 3 | one | 3 | 3 | 5 |
| 4 | two | 0 | 4 | 4 |
| 5 | two | 1 | 5 | 3 |
| 6 | two | 2 | 6 | 2 |
| 7 | two | 3 | 7 | 1 |
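To confirm that set_index followed by reset_index loses no data, the columns can be put back in their original order and compared. This is a sketch with illustrative names (`df`, `indexed`, `flat`, `partial`); it also shows the `level` argument, which resets only one level of the index.

```python
import pandas as pd

df = pd.DataFrame({
    "x": range(8),
    "y": range(8, 0, -1),
    "a": ["one"] * 4 + ["two"] * 4,
    "b": [0, 1, 2, 3] * 2,
})
indexed = df.set_index(["a", "b"])

flat = indexed.reset_index()          # columns come back as a, b, x, y
same_order = flat[df.columns]         # reorder to x, y, a, b
round_trip_ok = same_order.equals(df)

# Reset only the inner level; "a" stays in the index.
partial = indexed.reset_index(level="b")

print(list(flat.columns))
print(round_trip_ok)
print(partial.head())
```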