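Pandas String Operations¶
Pandas provides a set of string functions, available through the .str accessor of a Series or Index, that make it easy to operate on string data. Most importantly, these functions skip missing/NaN values: a missing entry stays missing in the result instead of raising an error.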
- lower(): Converts strings in the Series/Index to lower case.
- upper(): Converts strings in the Series/Index to upper case.
- len(): Computes the length of each string.
- strip(): Strips whitespace (including newlines) from both sides of each string in the Series/Index.
- split(' '): Splits each string on the given pattern.
- cat(sep=' '): Concatenates the Series/Index elements with the given separator.
- get_dummies(): Returns a DataFrame with one-hot encoded values.
- contains(pattern): Returns True for each element that contains the pattern, else False.
- replace(a,b): Replaces occurrences of a with b in each string.
- repeat(value): Repeats each element the specified number of times.
- count(pattern): Returns the number of occurrences of the pattern in each element.
- startswith(pattern): Returns True if the element in the Series/Index starts with the pattern.
- endswith(pattern): Returns True if the element in the Series/Index ends with the pattern.
- find(pattern): Returns the index of the first occurrence of the pattern, or -1 if it is not found.
- findall(pattern): Returns a list of all occurrences of the pattern.
- swapcase(): Swaps lower case and upper case.
- islower(): Checks whether all characters in each string in the Series/Index are lower case. Returns Boolean.
- isupper(): Checks whether all characters in each string in the Series/Index in upper case or not. Returns Boolean.
- isnumeric(): Checks whether all characters in each string in the Series/Index are numeric. Returns Boolean.
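The missing-value behaviour described above can be checked with a short sketch. The Series named names below is a made-up example, not part of the data used in the rest of this notebook; it assumes a reasonably recent pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical Series containing one missing value
names = pd.Series(['Karan', np.nan, 'ishita'])

# String methods are reached through the .str accessor;
# the missing entry stays missing instead of raising an error
upper = names.str.upper()
lengths = names.str.len()

print(upper)
print(lengths)
```

Because a missing value is carried into the result, the length column can no longer be a plain integer column when the input has gaps.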
In [1]:
import pandas as pd
import numpy as np
s = pd.Series(['Karan', 'Priya', 'Keish@', 'Atharv','12345','Varun','Chetna','MANAV','ishita'])
s
Out[1]:
0     Karan
1     Priya
2    Keish@
3    Atharv
4     12345
5     Varun
6    Chetna
7     MANAV
8    ishita
dtype: object
String Lowercase¶
In [2]:
s.str.lower()
Out[2]:
0     karan
1     priya
2    keish@
3    atharv
4     12345
5     varun
6    chetna
7     manav
8    ishita
dtype: object
String Uppercase¶
In [3]:
s.str.upper()
Out[3]:
0     KARAN
1     PRIYA
2    KEISH@
3    ATHARV
4     12345
5     VARUN
6    CHETNA
7     MANAV
8    ISHITA
dtype: object
String Length¶
In [4]:
s.str.len()
Out[4]:
0    5
1    5
2    6
3    6
4    5
5    5
6    6
7    5
8    6
dtype: int64
String Concatenation¶
In [5]:
s.str.cat(sep='_')
Out[5]:
'Karan_Priya_Keish@_Atharv_12345_Varun_Chetna_MANAV_ishita'
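Called without another Series, cat() joins all elements into one string, as above. As a sketch of the other common use, passing a second Series joins the two element by element. The Series last and its values are hypothetical and only serve the illustration.

```python
import pandas as pd

first = pd.Series(['Karan', 'Priya'])
last = pd.Series(['Shah', 'Rao'])  # hypothetical surnames, for illustration only

# With a second Series, cat() concatenates row by row using the separator
full = first.str.cat(last, sep=' ')
print(full)
```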
String Contains¶
In [6]:
s[s.str.contains('n')]
Out[6]:
0     Karan
5     Varun
6    Chetna
dtype: object
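The match above is case-sensitive, which is why 'MANAV' is not selected. A minimal sketch of two optional parameters, case and na, follows; the Series named names is a made-up example that includes a missing value.

```python
import numpy as np
import pandas as pd

names = pd.Series(['Karan', 'MANAV', np.nan, 'Priya'])

# case=False makes the match case-insensitive;
# na=False treats missing entries as "no match", so the mask is purely Boolean
mask = names.str.contains('n', case=False, na=False)
matched = names[mask]
print(matched)
```

Setting na explicitly matters when the result is used for indexing, since a mask containing missing values cannot be used to filter a Series.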
String Replace¶
In [7]:
s.str.replace('a','@')
Out[7]:
0     K@r@n
1     Priy@
2    Keish@
3    Ath@rv
4     12345
5     V@run
6    Chetn@
7     MANAV
8    ishit@
dtype: object
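replace() can also work with regular expressions. The sketch below uses a made-up Series and passes regex=True explicitly; the default value of this parameter has differed between pandas versions, so spelling it out is the safer choice.

```python
import pandas as pd

names = pd.Series(['Karan', 'Keish@', '12345'])

# regex=True treats the first argument as a regular expression;
# here every digit is replaced with '#'
masked = names.str.replace(r'\d', '#', regex=True)
print(masked)
```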
String Repeat¶
In [8]:
s.str.repeat(2)
Out[8]:
0      KaranKaran
1      PriyaPriya
2    Keish@Keish@
3    AtharvAtharv
4      1234512345
5      VarunVarun
6    ChetnaChetna
7      MANAVMANAV
8    ishitaishita
dtype: object
String Count¶
In [9]:
s.str.count('a')
Out[9]:
0    2
1    1
2    0
3    1
4    0
5    1
6    1
7    0
8    1
dtype: int64
String Startswith¶
In [10]:
s[s.str.startswith('K')]
Out[10]:
0     Karan
2    Keish@
dtype: object
String Endswith¶
In [11]:
s[s.str.endswith('a')]
Out[11]:
1     Priya
6    Chetna
8    ishita
dtype: object
String Find¶
In [12]:
s.str.find('a')
Out[12]:
0     1
1     4
2    -1
3     3
4    -1
5     1
6     5
7    -1
8     5
dtype: int64
String Findall¶
In [13]:
s.str.findall('a')
Out[13]:
0    [a, a]
1       [a]
2        []
3       [a]
4        []
5       [a]
6       [a]
7        []
8       [a]
dtype: object
String Swapcase¶
In [14]:
s.str.swapcase()
Out[14]:
0     kARAN
1     pRIYA
2    kEISH@
3    aTHARV
4     12345
5     vARUN
6    cHETNA
7     manav
8    ISHITA
dtype: object
String Islower¶
In [15]:
s[s.str.islower()]
Out[15]:
8    ishita
dtype: object
String Isupper¶
In [16]:
s[s.str.isupper()]
Out[16]:
7    MANAV
dtype: object
String Isnumeric¶
In [17]:
s[s.str.isnumeric()]
Out[17]:
4    12345
dtype: object
String Strip¶
In [18]:
S = pd.Series([' Welcome to Python Tutorials '])
S
Out[18]:
0     Welcome to Python Tutorials
dtype: object
In [19]:
S.str.strip()
Out[19]:
0    Welcome to Python Tutorials
dtype: object
In [20]:
S.str.strip(' ')
Out[20]:
0    Welcome to Python Tutorials
dtype: object
In [21]:
S = pd.Series(['Welcome-to-Python-Tutorials'])
S
Out[21]:
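0    Welcome-to-Python-Tutorials
dtype: object
In [22]:
# strip('-') removes only leading and trailing hyphens;
# this string has none at its ends, so it is returned unchanged
S.str.strip('-')
Out[22]:
0    Welcome-to-Python-Tutorials
dtype: object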
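Because strip() only touches the ends of a string, the effect of passing a character is easier to see on a value that actually begins and ends with it. The sketch below uses a made-up Series named padded and also shows the one-sided variants lstrip() and rstrip().

```python
import pandas as pd

# Hypothetical value with hyphens at both ends as well as in the middle
padded = pd.Series(['--Welcome-to-Python-Tutorials--'])

both = padded.str.strip('-')    # removes hyphens from both ends
left = padded.str.lstrip('-')   # removes hyphens from the start only
right = padded.str.rstrip('-')  # removes hyphens from the end only

print(both[0])
print(left[0])
print(right[0])
```

The hyphens between the words are left alone in all three cases.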
String Split¶
In [23]:
S = pd.Series(['Welcome to Python Tutorials'])
S
Out[23]:
0    Welcome to Python Tutorials
dtype: object
In [24]:
S.str.split()
Out[24]:
0    [Welcome, to, Python, Tutorials]
dtype: object
In [25]:
S.str.split(' ')
Out[25]:
0    [Welcome, to, Python, Tutorials]
dtype: object
In [26]:
S = pd.Series(['Welcome-to-Python-Tutorials'])
S
Out[26]:
0    Welcome-to-Python-Tutorials
dtype: object
In [27]:
S.str.split('-')
Out[27]:
0    [Welcome, to, Python, Tutorials]
dtype: object
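split() returns a list per element, as shown above. As a sketch of two common follow-ups, expand=True spreads the pieces into DataFrame columns, and chaining .str[0] picks one piece from each list. The Series named titles is a made-up example.

```python
import pandas as pd

titles = pd.Series(['Welcome-to-Python-Tutorials', 'Pandas-String-Operations'])

# expand=True returns a DataFrame with one column per piece;
# shorter rows are padded with missing values
parts = titles.str.split('-', expand=True)

# .str[0] selects the first piece of each list
first_word = titles.str.split('-').str[0]

print(parts)
print(first_word)
```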
String Dummies¶
In [28]:
s = pd.Series(list('abca'))
s
Out[28]:
0    a
1    b
2    c
3    a
dtype: object
In [29]:
pd.get_dummies(s)
Out[29]:
| | a | b | c |
|---|---|---|---|
| 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 2 | 0 | 0 | 1 |
| 3 | 1 | 0 | 0 |
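The cell above uses the top-level pd.get_dummies() function. The string accessor has its own get_dummies() method, which is useful when a single element holds several labels joined by a separator. The sketch below assumes a made-up Series named tags with '|' as the separator.

```python
import pandas as pd

# Hypothetical Series where each element may carry several labels
tags = pd.Series(['a|b', 'b', 'a|c'])

# Each distinct label becomes a column; a row is marked where the label is present
dummies = tags.str.get_dummies(sep='|')
print(dummies)
```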