Flipkart Reviews Sentiment Analysis using Python

Flipkart is one of the most popular Indian companies. It is an e-commerce platform that competes with platforms like Amazon. One of the most popular use cases of data science is sentiment analysis of reviews of products sold on e-commerce platforms.

In [2]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

data = pd.read_csv("flipkart_reviews.csv")
data.head()
Out[2]:
Product_name Review Rating
0 Lenovo Ideapad Gaming 3 Ryzen 5 Hexa Core 5600... Best under 60k Great performanceI got it for a... 5
1 Lenovo Ideapad Gaming 3 Ryzen 5 Hexa Core 5600... Good perfomence... 5
2 Lenovo Ideapad Gaming 3 Ryzen 5 Hexa Core 5600... Great performance but usually it has also that... 5
3 DELL Inspiron Athlon Dual Core 3050U - (4 GB/2... My wife is so happy and best product 👌🏻😘 5
4 DELL Inspiron Athlon Dual Core 3050U - (4 GB/2... Light weight laptop with new amazing features,... 5

This dataset contains only three columns. Let's check whether any of these columns contains missing values:

In [3]:
data.isnull().sum()
Out[3]:
Product_name    0
Review          0
Rating          0
dtype: int64

So the dataset does not have any null values. As the task is sentiment analysis of Flipkart reviews, I will clean and prepare the Review column before scoring sentiment:

In [4]:
import re
import string

import nltk
from nltk.corpus import stopwords

# Run once if the stopword corpus is not installed: nltk.download("stopwords")
stemmer = nltk.SnowballStemmer("english")
stopword = set(stopwords.words('english'))

def clean(text):
    text = str(text).lower()
    # Raw strings keep the regex escapes from being read as string escapes
    text = re.sub(r'\[.*?\]', '', text)                 # text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # URLs
    text = re.sub(r'<.*?>+', '', text)                  # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub(r'\n', '', text)                      # newlines
    text = re.sub(r'\w*\d\w*', '', text)                # tokens containing digits
    text = [word for word in text.split(' ') if word not in stopword]
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]
    text = " ".join(text)
    return text

data["Review"] = data["Review"].apply(clean)
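
To see what the regex steps in clean() do to a single review, here is a minimal sketch that applies the same substitutions to one made-up string. It skips the stemmer and uses a tiny hypothetical stopword set (SAMPLE_STOPWORDS) instead of NLTK's English list, so it runs without any NLTK data. The sample text and the function name clean_without_stemming are assumptions for illustration only.

```python
import re
import string

# Hypothetical stopword set for illustration; the tutorial uses NLTK's full list
SAMPLE_STOPWORDS = {"i", "it", "for", "a", "the", "is", "under"}

def clean_without_stemming(text):
    """Apply the same regex steps as clean(), but skip the stemmer."""
    text = str(text).lower()
    text = re.sub(r"\[.*?\]", "", text)                 # text in square brackets
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # URLs
    text = re.sub(r"<.*?>+", "", text)                  # HTML tags
    text = re.sub("[%s]" % re.escape(string.punctuation), "", text)  # punctuation
    text = re.sub(r"\n", "", text)                      # newlines
    text = re.sub(r"\w*\d\w*", "", text)                # tokens containing digits
    words = [w for w in text.split(" ") if w not in SAMPLE_STOPWORDS]
    return " ".join(words)

# Made-up review text, not taken from the dataset
sample = "Best under 60k! Visit https://example.com <b>Great</b> performance [edited]"
cleaned = clean_without_stemming(sample)

# split() without arguments also discards the empty strings left by removed tokens
print(cleaned.split())
```

Because the function splits on a single space, removed tokens leave empty strings behind, so the cleaned text can contain repeated spaces. This is harmless for the word cloud and the sentiment scores, but worth knowing when inspecting the cleaned column.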

The Rating column contains the rating given by each reviewer. So let's have a look at how most people rate the products they buy from Flipkart:

In [5]:
import plotly.express as px

ratings = data["Rating"].value_counts()
numbers = ratings.index
quantity = ratings.values

# values and names are passed as arrays, so no data frame argument is needed
figure = px.pie(values=quantity, names=numbers, hole=0.5)
figure.show()
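
The pie chart shows each rating's share visually. The same percentages can also be read as numbers with value_counts(normalize=True). The sketch below uses a small hypothetical Rating column rather than the Flipkart data, so the figures it prints are illustrative only.

```python
import pandas as pd

# Hypothetical ratings for illustration; not taken from flipkart_reviews.csv
sample = pd.DataFrame({"Rating": [5, 5, 4, 5, 3, 5, 1, 4]})

# normalize=True returns fractions of the total instead of raw counts
shares = sample["Rating"].value_counts(normalize=True).mul(100).round(1)
print(shares)
```

Running the same two lines on data["Rating"] would give the percentage behind each slice of the chart.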

So 60% of the reviewers have given a rating of 5 out of 5 to the products they buy from Flipkart. Now let's have a look at the kind of reviews people leave. For this, I will use a word cloud to visualize the most frequently used words in the Review column:

In [6]:
text = " ".join(i for i in data.Review)
# Named wc_stopwords so it does not shadow nltk.corpus.stopwords imported earlier
wc_stopwords = set(STOPWORDS)
wordcloud = WordCloud(stopwords=wc_stopwords,
                      background_color="white").generate(text)
plt.figure(figsize=(15, 10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
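
A word cloud sizes words roughly by how often they occur. As a rough cross-check, the sketch below counts word frequencies directly with collections.Counter. The three reviews are hypothetical strings written in the style of the cleaned Review column, and the counts will not match the word cloud exactly, since WordCloud applies its own tokenization and stopword handling.

```python
from collections import Counter

# Hypothetical cleaned reviews, in the same style as the Review column
reviews = [
    "good laptop good batteri",
    "great perform good price",
    "batteri drain fast",
]

# Join everything into one string, as in the word cloud cell, then count words
text = " ".join(reviews)
word_counts = Counter(text.split())

# Show the most frequent words with their counts
for word, count in word_counts.most_common(3):
    print(word, count)
```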

Now I will analyze the sentiment of the Flipkart reviews by calculating sentiment scores for each review and adding them to the dataset as three columns: Positive, Negative, and Neutral:

In [7]:
# Run once if the VADER lexicon is not installed: nltk.download("vader_lexicon")
sentiments = SentimentIntensityAnalyzer()
# Score each review once and reuse the result for all three columns
scores = [sentiments.polarity_scores(i) for i in data["Review"]]
data["Positive"] = [s["pos"] for s in scores]
data["Negative"] = [s["neg"] for s in scores]
data["Neutral"] = [s["neu"] for s in scores]
data = data[["Review", "Positive", "Negative", "Neutral"]]
data.head()
Out[7]:
Review Positive Negative Neutral
0 best great performancei got around backup bi... 0.395 0.101 0.504
1 good perfom 0.744 0.000 0.256
2 great perform usual also game laptop issu batt... 0.277 0.000 0.723
3 wife happi best product 👌🏻😘 0.512 0.000 0.488
4 light weight laptop new amaz featur batteri li... 0.000 0.000 1.000

Now let's see what most of the reviewers think about the products and services of Flipkart:

In [8]:
x = sum(data["Positive"])
y = sum(data["Negative"])
z = sum(data["Neutral"])

def sentiment_score(a, b, c):
    if (a>b) and (a>c):
        print("Positive 😊 ")
    elif (b>a) and (b>c):
        print("Negative 😠 ")
    else:
        print("Neutral 🙂 ")
sentiment_score(x, y, z)
Neutral 🙂 

So the Neutral score has the largest total, which means that most of the wording in the reviews is scored as neutral. Let's have a look at the totals of the Positive, Negative, and Neutral sentiment scores to draw a conclusion about Flipkart reviews:

In [9]:
print("Positive: ", x)
print("Negative: ", y)
print("Neutral: ", z)
Positive:  923.5529999999985
Negative:  96.77500000000013
Neutral:  1283.6880000000006
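
The three totals are easier to compare as shares of their sum. The sketch below does that arithmetic with the values printed above, rounded to three decimals. The variable names are my own and not part of the original notebook.

```python
# Totals copied from the output above, rounded to three decimals
totals = {"Positive": 923.553, "Negative": 96.775, "Neutral": 1283.688}

# Express each total as a fraction of the combined score
grand_total = sum(totals.values())
shares = {label: value / grand_total for label, value in totals.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")

# How many times larger the Positive total is than the Negative total
pos_to_neg = totals["Positive"] / totals["Negative"]
print(f"Positive / Negative: {pos_to_neg:.1f}")
```

This gives roughly 40.1% Positive, 4.2% Negative, and 55.7% Neutral, with the Positive total about 9.5 times the Negative total.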

So the Neutral total is the largest, and the Negative total is small: the Positive total (about 923.6) is roughly 9.5 times the Negative total (about 96.8). So we can say that people are mostly satisfied with Flipkart products and services.
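
Summing scores across the dataset describes the overall wording, not how many individual reviews lean each way. One way to count reviews instead is to label each review by its largest score. The sketch below is a minimal version of that idea, using only the five rows shown in Out[7]. The function name dominant_label and the rule that ties fall back to Neutral are my assumptions, mirroring sentiment_score() above.

```python
from collections import Counter

# Scores for the five rows shown in Out[7]: (Positive, Negative, Neutral)
rows = [
    (0.395, 0.101, 0.504),
    (0.744, 0.000, 0.256),
    (0.277, 0.000, 0.723),
    (0.512, 0.000, 0.488),
    (0.000, 0.000, 1.000),
]

def dominant_label(pos, neg, neu):
    """Return the label of the largest score; ties fall back to Neutral."""
    if pos > neg and pos > neu:
        return "Positive"
    if neg > pos and neg > neu:
        return "Negative"
    return "Neutral"

# Count how many of the sample reviews fall under each label
label_counts = Counter(dominant_label(*row) for row in rows)
print(label_counts)
```

For these five rows, three reviews are labelled Neutral and two Positive. Applying the same function to every row of the dataset would show how many reviews, rather than how much score, fall under each label.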