A stemming algorithm reduces the words “chocolates”, “chocolatey”, “choco” to the root word, “chocolate” and “retrieval”, “retrieved”, “retrieves” reduce to the stem “retrieve”.¶
Applications of stemming are:¶
- Stemming is used in information retrieval systems like search engines.
- It is used to determine domain vocabularies in domain analysis.
In [1]:
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
port=PorterStemmer()
words = ["program", "programs", "programer", "programing", "programers"]
for w in words:
print(port.stem(w))
text="It is very important to be pythonly while you are pythoning with python. All pythoners have pythoned poorly at least once."
words=word_tokenize(text)
for w in words:
print(port.stem(w))
program program program program program It is veri import to be pythonli while you are python with python . all python have python poorli at least onc .
In [10]:
#Stemming
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
port=PorterStemmer()
sample="He eats what he was eating yesterday at the eatery"
for word in word_tokenize(sample):
print(port.stem(word))
He eat what he wa eat yesterday at the eateri
In [11]:
from nltk.stem import LancasterStemmer
lstemmer=LancasterStemmer()
lstemmer.stem('beautiful')
Out[11]:
'beauty'
In [12]:
from nltk.stem import RegexpStemmer
rstemmer = RegexpStemmer('ing')
rstemmer.stem('skipping')
Out[12]:
'skipp'
In [14]:
rstemmer = RegexpStemmer('ing')
print(rstemmer.stem('King'))
print(lstemmer.stem('King'))
print(port.stem('King'))
K king king
In [ ]:
In [ ]:
In [ ]: