import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

!pip install pydrive

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
 in 
      1 from pydrive.auth import GoogleAuth
      2 from pydrive.drive import GoogleDrive
----> 3 from google.colab import auth
      4 from oauth2client.client import GoogleCredentials
      5 # Authenticate and create the PyDrive client.

ModuleNotFoundError: No module named 'google.colab'

link='https://drive.google.com/open?id=1tmzZKQKEvxt61TxjHchFfJkpqklVgdzP'
# Everything after '=' is the Drive file id; use file_id so the built-in id() is not shadowed
fluff,file_id=link.split('=')
downloaded = drive.CreateFile({'id':file_id})
downloaded.GetContentFile('airline.csv')
airline_data = pd.read_csv('airline.csv')
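
The split on '=' above works because this link contains exactly one '='. A slightly more defensive sketch reads the `id` query parameter with the standard library instead. The helper name `drive_file_id` is hypothetical and is not part of PyDrive.

```python
from urllib.parse import urlparse, parse_qs

def drive_file_id(link):
    """Return the value of the 'id' query parameter of a Drive share link."""
    query = parse_qs(urlparse(link).query)
    # parse_qs maps each parameter name to a list of values
    return query['id'][0]

link = 'https://drive.google.com/open?id=1tmzZKQKEvxt61TxjHchFfJkpqklVgdzP'
file_id = drive_file_id(link)
```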

airline_data.head(1)

round((airline_data.isna().sum())/len(airline_data),2)

airline_name                     0.00
link                             0.00
title                            0.00
author                           0.00
author_country                   0.04
date                             0.00
content                          0.00
aircraft                         0.97
type_traveller                   0.94
cabin_flown                      0.07
route                            0.94
overall_rating                   0.11
seat_comfort_rating              0.19
cabin_staff_rating               0.19
food_beverages_rating            0.20
inflight_entertainment_rating    0.25
ground_service_rating            0.95
wifi_connectivity_rating         0.99
value_money_rating               0.04
recommended                      0.00
dtype: float64

airline_data=airline_data[airline_data['overall_rating'].notnull()]

round((airline_data.isna().sum())/len(airline_data),2)

airline_name                     0.00
link                             0.00
title                            0.00
author                           0.00
author_country                   0.02
date                             0.00
content                          0.00
aircraft                         0.97
type_traveller                   0.94
cabin_flown                      0.05
route                            0.94
overall_rating                   0.00
seat_comfort_rating              0.17
cabin_staff_rating               0.17
food_beverages_rating            0.18
inflight_entertainment_rating    0.23
ground_service_rating            0.94
wifi_connectivity_rating         0.98
value_money_rating               0.03
recommended                      0.00
dtype: float64

airline_names=airline_data.airline_name.unique()
print('Total airlines Considered for Analysis : ',len(airline_names))
total_reviews_each_airline=[]
for i in airline_names:
  temp=airline_data[airline_data.airline_name==i]
  total_reviews_each_airline.append(len(temp))
result=list(zip(airline_names,total_reviews_each_airline))
print('Total Reviews Analysed : ',sum(total_reviews_each_airline))

Total airlines Considered for Analysis :  357
Total Reviews Analysed :  36861

df=pd.DataFrame(result,columns=['Airline_Name','Total_Reviews'])
df=df.sort_values(by='Total_Reviews',ascending=False)
df.head()

plt.style.use('seaborn')
plt.xlabel('Airlines')
plt.ylabel('Reviews')
plt.bar(df.Airline_Name[:5],df.Total_Reviews[:5],label='Top 5 Airlines')
plt.legend()

from textblob import TextBlob

result1=[]
num=1
numlist=[]
for review in airline_data.content:
    analysis=TextBlob(review)
    result1.append(analysis.polarity)
    numlist.append(num)
    num=num+1
result1=np.array(result1)

plt.style.use('seaborn')
plt.scatter(numlist,result1,label='Polarity')
plt.xlabel('Reviews')
plt.ylabel('Polarity')
plt.legend()

from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

# Join all reviews into one space-separated string for the word cloud
text=' '.join(airline_data.content)

wordcloud = WordCloud().generate(text)

# Display the generated image:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

from PIL import Image
import requests
from io import BytesIO
response = requests.get("http://www.pngmart.com/files/7/Modern-Plane-PNG-HD.png")

mask = np.array(Image.open(BytesIO(response.content)))
wordcloud_fra = WordCloud(background_color="white", mode="RGBA", max_words=1000, mask=mask).generate(text)

# create coloring from image
image_colors = ImageColorGenerator(mask)
plt.figure(figsize=[16,16])
plt.imshow(wordcloud_fra.recolor(color_func=image_colors), interpolation="bilinear")
plt.axis("off")

(-0.5, 2718.5, 944.5, -0.5)

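# Rating-based labels: 0 when overall_rating is between 0 and 5, 1 otherwise
sentiment=[]
for i in airline_data.overall_rating:
  if 0<=i<=5:
    sentiment.append(0)
  else:
    sentiment.append(1)

# Polarity-based labels from TextBlob: 1 positive, -1 negative, 0 neutral.
# This reassigns `sentiment`, so the rating-based labels above are discarded.
sentiment=[]
for i in result1:
  if i>0:
    sentiment.append(1)
  elif i<0:
    sentiment.append(-1)
  else:
    sentiment.append(0)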
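
The polarity thresholds used above can be wrapped in a small function, which makes the labelling rule easy to check on its own. This is only a sketch; the name `polarity_to_label` and the example values are assumptions for illustration.

```python
def polarity_to_label(polarity):
    """Map a TextBlob-style polarity in [-1, 1] to 1, -1 or 0."""
    if polarity > 0:
        return 1
    if polarity < 0:
        return -1
    return 0

# Made-up polarity values, not taken from the airline reviews
example_polarities = [0.35, -0.2, 0.0]
example_labels = [polarity_to_label(p) for p in example_polarities]
```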

new_airline_data=airline_data.copy()

new_airline_data['sentiment']=sentiment

### Shuffling the Dataset for Training ###

from sklearn.utils import shuffle
new_airline_data=shuffle(new_airline_data)

positive_sentiment_count=new_airline_data[new_airline_data['sentiment']==1]
negative_sentiment_count=new_airline_data[new_airline_data['sentiment']==-1]
neutral_sentiment_count=new_airline_data[new_airline_data['sentiment']==0]

temp_array=[len(positive_sentiment_count),len(negative_sentiment_count),len(neutral_sentiment_count)]
x_axis_labels=['Positive_sentiment','Negative_sentiment','Neutral_sentiment']
plt.bar(x_axis_labels,temp_array,color=('green','red','blue'),width=(0.2,0.2,0.2))

from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import CountVectorizer

model=LogisticRegression()
vectorizer=CountVectorizer(ngram_range=(1,2))
x_l=vectorizer.fit_transform(new_airline_data.content.values)

model.fit(x_l[:29488],new_airline_data.sentiment[:29488].values)

/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/logistic.py:433: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)
/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/logistic.py:460: FutureWarning: Default multi_class will be changed to 'auto' in 0.22. Specify the multi_class option to silence this warning.
  "this warning.", FutureWarning)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False)

model.score(x_l[29488:],new_airline_data.sentiment[29488:].values)

0.9178082191780822

new_airline_data.content[0]

"Outbound flight FRA/PRN A319. 2 hours 10 min flight. I thought drinks/snacks for sale but sandwich soft drinks were served complimentary. Inbound flights SKP/LJU/FRA CRJ900. each 1 hour 30 min flight. Skyshop menu was in a seat pocket and drinks/snacks were for sale. All flight crews were friendly. Security check at the Ljubljana airport for transit passengers was chaos however it's possible to go to a gate within 30min."

y_predict=model.predict(x_l[29488:])

x_axis=[]
n=1
for i in range(0,len(new_airline_data[29488:].values)):
  x_axis.append(n)
  n=n+1

# The stored sentiment column holds the actual labels; y_predict holds the model output
plt.scatter(x_axis,new_airline_data.sentiment[29488:],color='red',label='actual')
plt.scatter(x_axis,y_predict,label='predicted')
plt.legend(loc='best')
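
A scatter plot of three discrete label values hides how often each class is confused with another. As a rough sketch, counting (actual, predicted) pairs gives a small confusion table. The helper `confusion_counts` and the toy labels are assumptions for illustration; in the notebook the inputs would be the held-out sentiment values and `y_predict`.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

# Toy labels for illustration only (not taken from the airline data)
actual = [1, 1, -1, 0, -1, 1]
predicted = [1, -1, -1, 0, -1, 1]

counts = confusion_counts(actual, predicted)
# Accuracy is the share of pairs where both labels agree
accuracy = sum(n for (a, p), n in counts.items() if a == p) / len(actual)
```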

from sklearn.svm import LinearSVC

clf = LinearSVC(random_state=42, tol=1e-5)

clf.fit(x_l[:29488],new_airline_data.sentiment[:29488])

/usr/local/lib/python3.6/dist-packages/sklearn/svm/base.py:931: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  "the number of iterations.", ConvergenceWarning)

LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=42, tol=1e-05,
     verbose=0)

clf.score(x_l[29488:],new_airline_data.sentiment[29488:])

0.9152312491523125

y_predict1=clf.predict(x_l[29488:])

# The stored sentiment column holds the actual labels; y_predict1 holds the model output
plt.scatter(x_axis,new_airline_data.sentiment[29488:],color='red',label='actual')
plt.scatter(x_axis,y_predict1,label='predicted')
plt.legend()

t=new_airline_data.copy()

pos_sentiment=[]
neg_sentiment=[]
net_sentiment=[]
avg_rating=[]
for i in airline_names:
  tempdf=t[t['airline_name']==i]
  pos=len(tempdf[tempdf['sentiment']==1])
  neg=len(tempdf[tempdf['sentiment']==-1])
  net=len(tempdf[tempdf['sentiment']==0])
  pos_sentiment.append(pos)
  neg_sentiment.append(neg)
  net_sentiment.append(net)
  avg_rate=tempdf.overall_rating.mean()
  avg_rating.append(avg_rate)
clustered_data=pd.DataFrame(list(zip(airline_names,avg_rating,pos_sentiment,net_sentiment,neg_sentiment)),columns=['airline_name','average_rating','pos_sentiment','net_sentiment','neg_sentiment'])

clustered_data=clustered_data.sort_values(by=['pos_sentiment','average_rating'],kind='mergesort',ascending=False)

clustered_data.head()

plt.figure(figsize=(26,26))
plt.bar(clustered_data.airline_name.head(100).values,clustered_data.pos_sentiment.head(100).values,label='Positive Sentiment',color='green')
plt.bar(clustered_data.airline_name.head(100).values,clustered_data.neg_sentiment.head(100).values,label='Negative Sentiment',color='red')
plt.bar(clustered_data.airline_name.head(100).values,clustered_data.net_sentiment.head(100).values,label='Neutral Sentiment',color='black')
plt.xticks(rotation=90)
plt.legend()

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize
from nltk.stem import WordNetLemmatizer
import string
from nltk.tokenize import word_tokenize

nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.

True

nltk.download('punkt')
nltk.download('wordnet')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.

True

stop = set(stopwords.words("english"))

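# Build the lemmatizer once instead of on every call
le = WordNetLemmatizer()

def preprocessing(text):
  # Tokenize, drop English stop words, then lemmatize each remaining token
  words = word_tokenize(text)
  words = [x for x in words if x not in stop]
  words = [le.lemmatize(x) for x in words]
  return " ".join(words)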

reviews2 = [preprocessing(x) for x in new_airline_data.content]

new_airline_data['reviews_processed']=reviews2

logic_model=LogisticRegression()
vectorizer_p=CountVectorizer(ngram_range=(1,2))
x_l_1=vectorizer_p.fit_transform(new_airline_data.reviews_processed.values)

logic_model.fit(x_l_1[:29488],new_airline_data.sentiment[:29488].values)

/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/logistic.py:433: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)
/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/logistic.py:460: FutureWarning: Default multi_class will be changed to 'auto' in 0.22. Specify the multi_class option to silence this warning.
  "this warning.", FutureWarning)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False)

logic_model.score(x_l_1[29488:],new_airline_data.sentiment[29488:].values)

0.9115692391156924

clf1 = LinearSVC(random_state=42, tol=1e-5)

clf1.fit(x_l_1[:29488],new_airline_data.sentiment[:29488])

/usr/local/lib/python3.6/dist-packages/sklearn/svm/base.py:931: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  "the number of iterations.", ConvergenceWarning)

LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=42, tol=1e-05,
     verbose=0)

clf1.score(x_l_1[29488:],new_airline_data.sentiment[29488:])

0.9092635290926353

new_airline_data.head(1)

reviews_array=np.array(new_airline_data.content)

sentiment_array=np.array(new_airline_data.sentiment)

from keras.datasets import imdb
from keras.layers import Dense,Conv1D,MaxPool1D,Embedding,Flatten,Dropout,GRU,LSTM
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras.preprocessing.sequence import pad_sequences
from sklearn.preprocessing import OneHotEncoder
from keras.optimizers import Adam

Using TensorFlow backend.

token=Tokenizer()
token.fit_on_texts(reviews_array) 
vocab_size=len(token.word_index) +1
print(vocab_size)

37678

l = 0
for i in reviews_array:
  l += len(i)
  
avg_length = l/len(reviews_array)
review_training = [x[:int(avg_length)] for x in reviews_array]

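# Encode each review as a sequence of integer word indices
encoded = token.texts_to_sequences(reviews_array)
# Optional check: average number of tokens per review
# lengths = [len(seq) for seq in encoded]
# print(sum(lengths)/len(lengths))

# Pad with zeros at the end; with the default truncating='pre',
# reviews longer than 38 tokens keep only their last 38 tokens
padded_docs = pad_sequences(encoded, maxlen=38, padding='post')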
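
To make the effect of `maxlen=38, padding='post'` concrete, here is a minimal pure-Python sketch of the same rule for a single sequence, assuming the Keras default `truncating='pre'`. The function name `pad_post_truncate_pre` is hypothetical and is not a Keras API.

```python
def pad_post_truncate_pre(seq, maxlen, value=0):
    """Mimic pad_sequences(padding='post') with truncating='pre' for one sequence."""
    # Keep only the last maxlen items when the sequence is too long
    seq = list(seq)[-maxlen:]
    # Fill the remaining positions at the end with the padding value
    return seq + [value] * (maxlen - len(seq))

short_example = pad_post_truncate_pre([1, 2, 3], 5)
long_example = pad_post_truncate_pre([1, 2, 3, 4, 5, 6], 4)
```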

n_model1 = Sequential()
n_model1.add(Embedding(37678,64,input_length=38))
n_model1.add(LSTM(64, activation='tanh', recurrent_activation='hard_sigmoid', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', unit_forget_bias=True, kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, recurrent_dropout=0.0, implementation=1, return_sequences=True, return_state=False, go_backwards=False, stateful=False, unroll=False))
n_model1.add(Dropout(0.5))
n_model1.add(LSTM(64,return_sequences=False))
n_model1.add(Dropout(0.5))
n_model1.add(Dense(1,activation="sigmoid"))
n_model1.summary()
n_model1.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_2 (Embedding)      (None, 38, 64)            2411392   
_________________________________________________________________
lstm_3 (LSTM)                (None, 38, 64)            33024     
_________________________________________________________________
dropout_3 (Dropout)          (None, 38, 64)            0         
_________________________________________________________________
lstm_4 (LSTM)                (None, 64)                33024     
_________________________________________________________________
dropout_4 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 65        
=================================================================
Total params: 2,477,505
Trainable params: 2,477,505
Non-trainable params: 0
_________________________________________________________________

hist = n_model1.fit(padded_docs,sentiment_array,epochs=5,batch_size=100,validation_split=0.2)

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Train on 29488 samples, validate on 7373 samples
Epoch 1/5
29488/29488 [==============================] - 40s 1ms/step - loss: 0.4716 - acc: 0.7725 - val_loss: 0.4018 - val_acc: 0.8283
Epoch 2/5
29488/29488 [==============================] - 38s 1ms/step - loss: 0.3562 - acc: 0.8549 - val_loss: 0.4038 - val_acc: 0.8204
Epoch 3/5
29488/29488 [==============================] - 37s 1ms/step - loss: 0.2973 - acc: 0.8836 - val_loss: 0.4231 - val_acc: 0.8142
Epoch 4/5
29488/29488 [==============================] - 37s 1ms/step - loss: 0.2439 - acc: 0.9078 - val_loss: 0.4939 - val_acc: 0.8013
Epoch 5/5
29488/29488 [==============================] - 36s 1ms/step - loss: 0.1928 - acc: 0.9301 - val_loss: 0.5099 - val_acc: 0.7991
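
In the log above, the training loss keeps falling while val_loss rises after the first epoch, which suggests the model is overfitting. A minimal sketch for picking the epoch with the lowest validation loss follows. `best_epoch` is a hypothetical helper; in the notebook the values would be available as `hist.history['val_loss']`. Keras also provides an `EarlyStopping` callback that can stop training based on a monitored quantity.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1

# val_loss values copied from the five-epoch training log above
val_losses = [0.4018, 0.4038, 0.4231, 0.4939, 0.5099]
epoch = best_epoch(val_losses)
```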

n_model2 = Sequential()
n_model2.add(Embedding(37678,100,input_length=38))
n_model2.add(Conv1D(filters=64,kernel_size=3))
n_model2.add(MaxPool1D(pool_size=3))
n_model2.add(Flatten())
n_model2.add(Dense(64,activation="relu"))
n_model2.add(Dropout(rate = 0.2))
n_model2.add(Dense(1,activation="sigmoid"))
n_model2.summary()
adm = Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
n_model2.compile(loss='binary_crossentropy', optimizer=adm, metrics=['accuracy'])

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_5 (Embedding)      (None, 38, 100)           3767800   
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 36, 64)            19264     
_________________________________________________________________
max_pooling1d_3 (MaxPooling1 (None, 12, 64)            0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 768)               0         
_________________________________________________________________
dense_7 (Dense)              (None, 64)                49216     
_________________________________________________________________
dropout_7 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 65        
=================================================================
Total params: 3,836,345
Trainable params: 3,836,345
Non-trainable params: 0
_________________________________________________________________

hist = n_model2.fit(padded_docs,sentiment_array,epochs=100,batch_size=100,validation_split=0.1)

Train on 33174 samples, validate on 3687 samples
Epoch 1/100
33174/33174 [==============================] - 23s 688us/step - loss: 0.6859 - acc: 0.5677 - val_loss: 0.6800 - val_acc: 0.5837
Epoch 2/100
33174/33174 [==============================] - 22s 654us/step - loss: 0.6750 - acc: 0.5908 - val_loss: 0.6750 - val_acc: 0.5837
Epoch 3/100
33174/33174 [==============================] - 22s 660us/step - loss: 0.6706 - acc: 0.5908 - val_loss: 0.6711 - val_acc: 0.5837
Epoch 4/100
33174/33174 [==============================] - 22s 666us/step - loss: 0.6650 - acc: 0.5908 - val_loss: 0.6655 - val_acc: 0.5837
Epoch 5/100
33174/33174 [==============================] - 22s 650us/step - loss: 0.6575 - acc: 0.5908 - val_loss: 0.6567 - val_acc: 0.5837
Epoch 6/100
33174/33174 [==============================] - 22s 650us/step - loss: 0.6460 - acc: 0.5915 - val_loss: 0.6425 - val_acc: 0.5891
Epoch 7/100
33174/33174 [==============================] - 23s 680us/step - loss: 0.6284 - acc: 0.6121 - val_loss: 0.6218 - val_acc: 0.6222
Epoch 8/100
33174/33174 [==============================] - 21s 643us/step - loss: 0.6046 - acc: 0.6722 - val_loss: 0.5949 - val_acc: 0.6813
Epoch 9/100
33174/33174 [==============================] - 22s 653us/step - loss: 0.5754 - acc: 0.7240 - val_loss: 0.5653 - val_acc: 0.7266
Epoch 10/100
33174/33174 [==============================] - 22s 653us/step - loss: 0.5463 - acc: 0.7511 - val_loss: 0.5368 - val_acc: 0.7513
Epoch 11/100
33174/33174 [==============================] - 22s 668us/step - loss: 0.5190 - acc: 0.7679 - val_loss: 0.5122 - val_acc: 0.7624
Epoch 12/100
33174/33174 [==============================] - 22s 654us/step - loss: 0.4961 - acc: 0.7791 - val_loss: 0.4915 - val_acc: 0.7711
Epoch 13/100
33174/33174 [==============================] - 22s 650us/step - loss: 0.4740 - acc: 0.7915 - val_loss: 0.4739 - val_acc: 0.7792
Epoch 14/100
33174/33174 [==============================] - 23s 684us/step - loss: 0.4568 - acc: 0.7988 - val_loss: 0.4589 - val_acc: 0.7920
Epoch 15/100
33174/33174 [==============================] - 22s 662us/step - loss: 0.4407 - acc: 0.8070 - val_loss: 0.4464 - val_acc: 0.7979
Epoch 16/100
33174/33174 [==============================] - 22s 652us/step - loss: 0.4279 - acc: 0.8146 - val_loss: 0.4363 - val_acc: 0.8042
Epoch 17/100
33174/33174 [==============================] - 22s 651us/step - loss: 0.4152 - acc: 0.8216 - val_loss: 0.4278 - val_acc: 0.8082
Epoch 18/100
33174/33174 [==============================] - 22s 671us/step - loss: 0.4042 - acc: 0.8270 - val_loss: 0.4205 - val_acc: 0.8142
Epoch 19/100
33174/33174 [==============================] - 21s 645us/step - loss: 0.3949 - acc: 0.8335 - val_loss: 0.4149 - val_acc: 0.8150
Epoch 20/100
33174/33174 [==============================] - 22s 667us/step - loss: 0.3867 - acc: 0.8370 - val_loss: 0.4102 - val_acc: 0.8205
Epoch 21/100
33174/33174 [==============================] - 23s 692us/step - loss: 0.3798 - acc: 0.8412 - val_loss: 0.4065 - val_acc: 0.8183
Epoch 22/100
33174/33174 [==============================] - 23s 685us/step - loss: 0.3734 - acc: 0.8447 - val_loss: 0.4034 - val_acc: 0.8205
Epoch 23/100
33174/33174 [==============================] - 22s 677us/step - loss: 0.3662 - acc: 0.8477 - val_loss: 0.4008 - val_acc: 0.8221
Epoch 24/100
33174/33174 [==============================] - 23s 681us/step - loss: 0.3606 - acc: 0.8503 - val_loss: 0.3987 - val_acc: 0.8226
Epoch 25/100
33174/33174 [==============================] - 23s 684us/step - loss: 0.3556 - acc: 0.8527 - val_loss: 0.3968 - val_acc: 0.8223
Epoch 26/100
33174/33174 [==============================] - 23s 683us/step - loss: 0.3503 - acc: 0.8571 - val_loss: 0.3956 - val_acc: 0.8237
Epoch 27/100
33174/33174 [==============================] - 22s 663us/step - loss: 0.3447 - acc: 0.8585 - val_loss: 0.3943 - val_acc: 0.8245
Epoch 28/100
33174/33174 [==============================] - 22s 675us/step - loss: 0.3403 - acc: 0.8598 - val_loss: 0.3934 - val_acc: 0.8253
Epoch 29/100
33174/33174 [==============================] - 22s 672us/step - loss: 0.3354 - acc: 0.8632 - val_loss: 0.3929 - val_acc: 0.8270
Epoch 30/100
33174/33174 [==============================] - 22s 656us/step - loss: 0.3318 - acc: 0.8653 - val_loss: 0.3922 - val_acc: 0.8248
Epoch 31/100
33174/33174 [==============================] - 22s 654us/step - loss: 0.3279 - acc: 0.8658 - val_loss: 0.3919 - val_acc: 0.8242
Epoch 32/100
33174/33174 [==============================] - 22s 668us/step - loss: 0.3234 - acc: 0.8707 - val_loss: 0.3922 - val_acc: 0.8256
Epoch 33/100
33174/33174 [==============================] - 22s 669us/step - loss: 0.3186 - acc: 0.8708 - val_loss: 0.3918 - val_acc: 0.8248
Epoch 34/100
33174/33174 [==============================] - 22s 657us/step - loss: 0.3145 - acc: 0.8739 - val_loss: 0.3919 - val_acc: 0.8242
Epoch 35/100
33174/33174 [==============================] - 21s 647us/step - loss: 0.3104 - acc: 0.8756 - val_loss: 0.3921 - val_acc: 0.8248
Epoch 36/100
33174/33174 [==============================] - 22s 670us/step - loss: 0.3074 - acc: 0.8781 - val_loss: 0.3926 - val_acc: 0.8256
Epoch 37/100
33174/33174 [==============================] - 22s 667us/step - loss: 0.3032 - acc: 0.8789 - val_loss: 0.3929 - val_acc: 0.8256
Epoch 38/100
33174/33174 [==============================] - 22s 666us/step - loss: 0.2990 - acc: 0.8813 - val_loss: 0.3934 - val_acc: 0.8229
Epoch 39/100
33174/33174 [==============================] - 22s 661us/step - loss: 0.2967 - acc: 0.8813 - val_loss: 0.3940 - val_acc: 0.8240
Epoch 40/100
33174/33174 [==============================] - 23s 684us/step - loss: 0.2921 - acc: 0.8849 - val_loss: 0.3948 - val_acc: 0.8223
Epoch 41/100
33174/33174 [==============================] - 22s 664us/step - loss: 0.2881 - acc: 0.8864 - val_loss: 0.3957 - val_acc: 0.8226
Epoch 42/100
33174/33174 [==============================] - 23s 684us/step - loss: 0.2840 - acc: 0.8885 - val_loss: 0.3967 - val_acc: 0.8215
Epoch 43/100
33174/33174 [==============================] - 22s 670us/step - loss: 0.2811 - acc: 0.8896 - val_loss: 0.3976 - val_acc: 0.8221
Epoch 44/100
33174/33174 [==============================] - 23s 679us/step - loss: 0.2775 - acc: 0.8914 - val_loss: 0.3989 - val_acc: 0.8199
Epoch 45/100
33174/33174 [==============================] - 22s 657us/step - loss: 0.2734 - acc: 0.8927 - val_loss: 0.4007 - val_acc: 0.8186
Epoch 46/100
33174/33174 [==============================] - 22s 652us/step - loss: 0.2703 - acc: 0.8943 - val_loss: 0.4015 - val_acc: 0.8169
Epoch 47/100
33174/33174 [==============================] - 22s 663us/step - loss: 0.2668 - acc: 0.8964 - val_loss: 0.4026 - val_acc: 0.8183
Epoch 48/100
33174/33174 [==============================] - 22s 659us/step - loss: 0.2632 - acc: 0.8978 - val_loss: 0.4041 - val_acc: 0.8183
Epoch 49/100
33174/33174 [==============================] - 21s 645us/step - loss: 0.2603 - acc: 0.8988 - val_loss: 0.4057 - val_acc: 0.8172
Epoch 50/100
33174/33174 [==============================] - 21s 640us/step - loss: 0.2564 - acc: 0.9022 - val_loss: 0.4075 - val_acc: 0.8161
Epoch 51/100
33174/33174 [==============================] - 22s 674us/step - loss: 0.2532 - acc: 0.9034 - val_loss: 0.4093 - val_acc: 0.8158
Epoch 52/100
33174/33174 [==============================] - 22s 673us/step - loss: 0.2501 - acc: 0.9053 - val_loss: 0.4108 - val_acc: 0.8172
Epoch 53/100
33174/33174 [==============================] - 22s 664us/step - loss: 0.2465 - acc: 0.9067 - val_loss: 0.4134 - val_acc: 0.8150
Epoch 54/100
33174/33174 [==============================] - 22s 665us/step - loss: 0.2426 - acc: 0.9076 - val_loss: 0.4148 - val_acc: 0.8164
Epoch 55/100
33174/33174 [==============================] - 23s 683us/step - loss: 0.2399 - acc: 0.9095 - val_loss: 0.4167 - val_acc: 0.8164
Epoch 56/100
33174/33174 [==============================] - 23s 690us/step - loss: 0.2368 - acc: 0.9101 - val_loss: 0.4192 - val_acc: 0.8153
Epoch 57/100
33174/33174 [==============================] - 22s 664us/step - loss: 0.2327 - acc: 0.9132 - val_loss: 0.4212 - val_acc: 0.8153
Epoch 58/100
33174/33174 [==============================] - 22s 676us/step - loss: 0.2298 - acc: 0.9140 - val_loss: 0.4236 - val_acc: 0.8156
Epoch 59/100
33174/33174 [==============================] - 22s 658us/step - loss: 0.2268 - acc: 0.9156 - val_loss: 0.4259 - val_acc: 0.8153
Epoch 60/100
33174/33174 [==============================] - 22s 664us/step - loss: 0.2242 - acc: 0.9170 - val_loss: 0.4283 - val_acc: 0.8142
Epoch 61/100
33174/33174 [==============================] - 22s 668us/step - loss: 0.2204 - acc: 0.9185 - val_loss: 0.4307 - val_acc: 0.8145
Epoch 62/100
33174/33174 [==============================] - 23s 690us/step - loss: 0.2169 - acc: 0.9201 - val_loss: 0.4336 - val_acc: 0.8148
Epoch 63/100
33174/33174 [==============================] - 22s 678us/step - loss: 0.2136 - acc: 0.9219 - val_loss: 0.4358 - val_acc: 0.8120
Epoch 64/100
33174/33174 [==============================] - 22s 671us/step - loss: 0.2109 - acc: 0.9223 - val_loss: 0.4385 - val_acc: 0.8126
Epoch 65/100
33174/33174 [==============================] - 23s 681us/step - loss: 0.2073 - acc: 0.9254 - val_loss: 0.4413 - val_acc: 0.8126
Epoch 66/100
33174/33174 [==============================] - 22s 664us/step - loss: 0.2051 - acc: 0.9250 - val_loss: 0.4445 - val_acc: 0.8134
Epoch 67/100
33174/33174 [==============================] - 22s 652us/step - loss: 0.2010 - acc: 0.9280 - val_loss: 0.4474 - val_acc: 0.8126
Epoch 68/100
33174/33174 [==============================] - 21s 646us/step - loss: 0.1981 - acc: 0.9274 - val_loss: 0.4504 - val_acc: 0.8118
Epoch 69/100
33174/33174 [==============================] - 22s 662us/step - loss: 0.1952 - acc: 0.9291 - val_loss: 0.4537 - val_acc: 0.8112
Epoch 70/100
33174/33174 [==============================] - 22s 668us/step - loss: 0.1919 - acc: 0.9320 - val_loss: 0.4570 - val_acc: 0.8104
Epoch 71/100
33174/33174 [==============================] - 21s 635us/step - loss: 0.1896 - acc: 0.9337 - val_loss: 0.4603 - val_acc: 0.8110
Epoch 72/100
33174/33174 [==============================] - 21s 638us/step - loss: 0.1858 - acc: 0.9354 - val_loss: 0.4644 - val_acc: 0.8115
Epoch 73/100
33174/33174 [==============================] - 22s 662us/step - loss: 0.1834 - acc: 0.9362 - val_loss: 0.4682 - val_acc: 0.8101
Epoch 74/100
33174/33174 [==============================] - 21s 640us/step - loss: 0.1808 - acc: 0.9371 - val_loss: 0.4719 - val_acc: 0.8080
Epoch 75/100
33174/33174 [==============================] - 21s 636us/step - loss: 0.1780 - acc: 0.9371 - val_loss: 0.4741 - val_acc: 0.8099
Epoch 76/100
33174/33174 [==============================] - 22s 667us/step - loss: 0.1748 - acc: 0.9397 - val_loss: 0.4778 - val_acc: 0.8088
Epoch 77/100
33174/33174 [==============================] - 22s 662us/step - loss: 0.1718 - acc: 0.9406 - val_loss: 0.4817 - val_acc: 0.8074
Epoch 78/100
33174/33174 [==============================] - 21s 648us/step - loss: 0.1689 - acc: 0.9425 - val_loss: 0.4855 - val_acc: 0.8069
Epoch 79/100
33174/33174 [==============================] - 22s 653us/step - loss: 0.1665 - acc: 0.9440 - val_loss: 0.4891 - val_acc: 0.8061
Epoch 80/100
33174/33174 [==============================] - 22s 677us/step - loss: 0.1640 - acc: 0.9447 - val_loss: 0.4935 - val_acc: 0.8050
Epoch 81/100
33174/33174 [==============================] - 22s 666us/step - loss: 0.1610 - acc: 0.9461 - val_loss: 0.4976 - val_acc: 0.8034
Epoch 82/100
33174/33174 [==============================] - 22s 654us/step - loss: 0.1590 - acc: 0.9472 - val_loss: 0.5014 - val_acc: 0.8031
Epoch 83/100
33174/33174 [==============================] - 22s 665us/step - loss: 0.1558 - acc: 0.9482 - val_loss: 0.5052 - val_acc: 0.8028
Epoch 84/100
33174/33174 [==============================] - 23s 688us/step - loss: 0.1535 - acc: 0.9489 - val_loss: 0.5103 - val_acc: 0.8007
Epoch 85/100
33174/33174 [==============================] - 21s 647us/step - loss: 0.1508 - acc: 0.9511 - val_loss: 0.5143 - val_acc: 0.8009
Epoch 86/100
33174/33174 [==============================] - 22s 652us/step - loss: 0.1480 - acc: 0.9518 - val_loss: 0.5182 - val_acc: 0.8007
Epoch 87/100
33174/33174 [==============================] - 22s 672us/step - loss: 0.1459 - acc: 0.9529 - val_loss: 0.5224 - val_acc: 0.7996
Epoch 88/100
33174/33174 [==============================] - 22s 669us/step - loss: 0.1430 - acc: 0.9542 - val_loss: 0.5272 - val_acc: 0.7977
Epoch 89/100
33174/33174 [==============================] - 22s 659us/step - loss: 0.1406 - acc: 0.9550 - val_loss: 0.5316 - val_acc: 0.7979
Epoch 90/100
33174/33174 [==============================] - 22s 672us/step - loss: 0.1379 - acc: 0.9561 - val_loss: 0.5375 - val_acc: 0.7979
Epoch 91/100
33174/33174 [==============================] - 23s 680us/step - loss: 0.1356 - acc: 0.9570 - val_loss: 0.5415 - val_acc: 0.7960
Epoch 92/100
33174/33174 [==============================] - 22s 663us/step - loss: 0.1328 - acc: 0.9592 - val_loss: 0.5453 - val_acc: 0.7947
Epoch 93/100
33174/33174 [==============================] - 22s 667us/step - loss: 0.1304 - acc: 0.9597 - val_loss: 0.5511 - val_acc: 0.7947
Epoch 94/100
33174/33174 [==============================] - 22s 665us/step - loss: 0.1282 - acc: 0.9607 - val_loss: 0.5572 - val_acc: 0.7917
Epoch 95/100
33174/33174 [==============================] - 22s 673us/step - loss: 0.1261 - acc: 0.9614 - val_loss: 0.5609 - val_acc: 0.7917
Epoch 96/100
33174/33174 [==============================] - 21s 647us/step - loss: 0.1240 - acc: 0.9620 - val_loss: 0.5661 - val_acc: 0.7917
Epoch 97/100
33174/33174 [==============================] - 22s 669us/step - loss: 0.1207 - acc: 0.9638 - val_loss: 0.5713 - val_acc: 0.7909
Epoch 98/100
33174/33174 [==============================] - 23s 687us/step - loss: 0.1189 - acc: 0.9644 - val_loss: 0.5763 - val_acc: 0.7890
Epoch 99/100
33174/33174 [==============================] - 22s 659us/step - loss: 0.1165 - acc: 0.9647 - val_loss: 0.5817 - val_acc: 0.7890
Epoch 100/100
33174/33174 [==============================] - 22s 661us/step - loss: 0.1144 - acc: 0.9665 - val_loss: 0.5870 - val_acc: 0.7879

plt.plot(hist.history["acc"],label="acc")
plt.plot(hist.history["val_acc"],label="val")
plt.legend()

model_conv = Sequential()
model_conv.add(Embedding(vocab_size, 100, input_length=38))
model_conv.add(Dropout(0.2))
model_conv.add(Conv1D(64, 5, activation='relu'))
model_conv.add(MaxPool1D(pool_size=4))
model_conv.add(LSTM(100))
model_conv.add(Dense(1, activation='sigmoid'))
adm = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
model_conv.compile(loss='binary_crossentropy', optimizer=adm,    metrics=['accuracy'])
model_conv.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_7 (Embedding)      (None, 38, 100)           3767800   
_________________________________________________________________
dropout_9 (Dropout)          (None, 38, 100)           0         
_________________________________________________________________
conv1d_5 (Conv1D)            (None, 34, 64)            32064     
_________________________________________________________________
max_pooling1d_5 (MaxPooling1 (None, 8, 64)             0         
_________________________________________________________________
lstm_6 (LSTM)                (None, 100)               66000     
_________________________________________________________________
dense_10 (Dense)             (None, 1)                 101       
=================================================================
Total params: 3,865,965
Trainable params: 3,865,965
Non-trainable params: 0
_________________________________________________________________
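
The shapes and parameter counts in the summary above can be reproduced by hand. A minimal sketch, assuming a vocabulary size of 37,678 (inferred from 3,767,800 / 100, since `vocab_size` itself is not printed in this section) and the usual Keras parameter formulas for these layer types:

```python
# Reproduce the layer shapes and parameter counts from model_conv.summary().
# vocab_size is an assumption inferred from the embedding's parameter count.
vocab_size = 37678
embed_dim, seq_len = 100, 38
filters, kernel_size, pool_size, lstm_units = 64, 5, 4, 100

embedding_params = vocab_size * embed_dim
conv_params = kernel_size * embed_dim * filters + filters  # weights + biases
conv_len = seq_len - kernel_size + 1                        # 'valid' padding
pool_len = conv_len // pool_size
# An LSTM has 4 gates, each with input weights, recurrent weights and a bias.
lstm_params = 4 * ((filters + lstm_units) * lstm_units + lstm_units)
dense_params = lstm_units * 1 + 1

total_params = embedding_params + conv_params + lstm_params + dense_params
print(conv_len, pool_len, total_params)
```

The embedding layer accounts for almost all of the trainable weights, which is worth keeping in mind when reading the training log that follows.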

model_conv.fit(padded_docs,sentiment_array, validation_split=0.2, epochs = 10)

Train on 29488 samples, validate on 7373 samples
Epoch 1/10
29488/29488 [==============================] - 73s 2ms/step - loss: 0.5655 - acc: 0.6930 - val_loss: 0.4516 - val_acc: 0.7951
Epoch 2/10
29488/29488 [==============================] - 71s 2ms/step - loss: 0.4169 - acc: 0.8122 - val_loss: 0.4189 - val_acc: 0.8088
Epoch 3/10
29488/29488 [==============================] - 72s 2ms/step - loss: 0.3694 - acc: 0.8420 - val_loss: 0.4149 - val_acc: 0.8150
Epoch 4/10
29488/29488 [==============================] - 70s 2ms/step - loss: 0.3337 - acc: 0.8600 - val_loss: 0.4201 - val_acc: 0.8107
Epoch 5/10
29488/29488 [==============================] - 71s 2ms/step - loss: 0.3034 - acc: 0.8761 - val_loss: 0.4287 - val_acc: 0.8085
Epoch 6/10
29488/29488 [==============================] - 71s 2ms/step - loss: 0.2734 - acc: 0.8903 - val_loss: 0.4478 - val_acc: 0.8056
Epoch 7/10
29488/29488 [==============================] - 70s 2ms/step - loss: 0.2396 - acc: 0.9064 - val_loss: 0.4779 - val_acc: 0.8016
Epoch 8/10
29488/29488 [==============================] - 69s 2ms/step - loss: 0.2024 - acc: 0.9228 - val_loss: 0.5058 - val_acc: 0.7978
Epoch 9/10
29488/29488 [==============================] - 69s 2ms/step - loss: 0.1579 - acc: 0.9423 - val_loss: 0.5606 - val_acc: 0.7922
Epoch 10/10
29488/29488 [==============================] - 71s 2ms/step - loss: 0.1177 - acc: 0.9579 - val_loss: 0.6562 - val_acc: 0.7854
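
In the log above, `val_loss` bottoms out at epoch 3 and climbs afterward while the training loss keeps falling, which suggests overfitting. A minimal sketch of a patience-based stopping rule, applied to the `val_loss` values copied from the log. The function name `early_stop_epoch` and the patience of 2 are illustrative assumptions, not part of the original notebook:

```python
# val_loss per epoch, copied from the training log above.
val_losses = [0.4516, 0.4189, 0.4149, 0.4201, 0.4287,
              0.4478, 0.4779, 0.5058, 0.5606, 0.6562]

def early_stop_epoch(losses, patience=2):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once the loss has failed to improve on its best
    value for `patience` consecutive epochs.
    """
    best_loss, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(losses)

best_epoch, stop_epoch = early_stop_epoch(val_losses, patience=2)
print(best_epoch, stop_epoch)
```

Keras provides an `EarlyStopping` callback built on the same idea; it can be passed to `fit` through the `callbacks` argument.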

lower_reviews=new_airline_data.content.str.lower()

# Review topics to look for (matched as literal substrings of the lower-cased text)
features=['security','check-in','facilities','people','passport','arrival','waiting','access']
x=np.arange(len(features))  # one group of bars per feature

new_airline_data['lower_reviews']=lower_reviews

# Despite the avg_ prefix, these lists hold review counts per feature
avg_pos=[]
avg_neg=[]
avg_net=[]
for feature in features:
  temp=new_airline_data[new_airline_data.lower_reviews.str.contains(feature,regex=False,na=False)]
  avg_pos.append(len(temp[temp.sentiment==1]))
  avg_net.append(len(temp[temp.sentiment==0]))
  avg_neg.append(len(temp[temp.sentiment==-1]))

w=0.2
fig,ax=plt.subplots()
ax.bar(x-w,avg_pos,color='green',width=w,label='positive')
ax.bar(x,avg_neg,color='red',width=w,label='negative')
ax.bar(x+w,avg_net,color='black',width=w,label='neutral')
ax.set_xticks(x)
ax.set_xticklabels(features,rotation=45)
ax.legend(loc='best')

import pandas as pd

#Suppress Warnings
import warnings
warnings.filterwarnings('ignore')

airport_data=pd.read_csv('airport.csv')
print(airport_data.shape)
airport_data.head(1)

(17721, 20)

airport_data.describe()

airport_data.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17721 entries, 0 to 17720
Data columns (total 20 columns):
airport_name                   17721 non-null object
link                           17721 non-null object
title                          17721 non-null object
author                         17721 non-null object
author_country                 12777 non-null object
date                           17721 non-null object
content                        17721 non-null object
experience_airport             647 non-null object
date_visit                     593 non-null object
type_traveller                 646 non-null object
overall_rating                 13796 non-null float64
queuing_rating                 12813 non-null float64
terminal_cleanliness_rating    12815 non-null float64
terminal_seating_rating        587 non-null float64
terminal_signs_rating          27 non-null float64
food_beverages_rating          630 non-null float64
airport_shopping_rating        12676 non-null float64
wifi_connectivity_rating       412 non-null float64
airport_staff_rating           26 non-null float64
recommended                    17721 non-null int64
dtypes: float64(9), int64(1), object(10)
memory usage: 2.7+ MB

#Percentage Null Values
round((airport_data.isna().sum()/len(airport_data)*100),2)

airport_name                    0.00
link                            0.00
title                           0.00
author                          0.00
author_country                 27.90
date                            0.00
content                         0.00
experience_airport             96.35
date_visit                     96.65
type_traveller                 96.35
overall_rating                 22.15
queuing_rating                 27.70
terminal_cleanliness_rating    27.68
terminal_seating_rating        96.69
terminal_signs_rating          99.85
food_beverages_rating          96.44
airport_shopping_rating        28.47
wifi_connectivity_rating       97.68
airport_staff_rating           99.85
recommended                     0.00
dtype: float64

filtered_airport_data=airport_data.drop(['experience_airport','date_visit','type_traveller','terminal_seating_rating','terminal_signs_rating','food_beverages_rating','wifi_connectivity_rating','airport_staff_rating'],axis=1)

round((filtered_airport_data.isna().sum()/len(filtered_airport_data)*100),2)

airport_name                    0.00
link                            0.00
title                           0.00
author                          0.00
author_country                 27.90
date                            0.00
content                         0.00
overall_rating                 22.15
queuing_rating                 27.70
terminal_cleanliness_rating    27.68
airport_shopping_rating        28.47
recommended                     0.00
dtype: float64
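
The eight dropped columns are the ones that are more than 96% empty. Rather than listing them by hand, the same selection can be derived from the null percentages. A minimal sketch on a small made-up frame; the helper name `drop_sparse_columns` and the 90% threshold are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df, max_null_pct=90.0):
    """Drop columns whose share of missing values exceeds max_null_pct."""
    null_pct = df.isna().mean() * 100
    return df.loc[:, null_pct <= max_null_pct]

# Hypothetical toy data: 'b' is entirely missing, 'c' is half missing.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [np.nan, np.nan, np.nan, np.nan],
    "c": [1.0, np.nan, 3.0, np.nan],
})
filtered = drop_sparse_columns(toy, max_null_pct=90.0)
print(list(filtered.columns))
```

A threshold keeps the filter in step with the data if the CSV is refreshed, though the right cut-off remains a judgment call.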

airport_name=filtered_airport_data.airport_name.unique()
print("Total Airports Considered for Analysis:",len(airport_name))

Total Airports Considered for Analysis: 741

# Recommendation Count
recommendation_count=[]
not_recommendation_count=[]
total_reviews=[]
for i in airport_name:
    temp_df=filtered_airport_data[filtered_airport_data['airport_name']==i]
    rec=(temp_df['recommended']==1).sum()
    total_reviews.append(len(temp_df))
    recommendation_count.append(rec)
    not_recommendation_count.append(len(temp_df)-rec)

result=zip(airport_name,total_reviews,recommendation_count,not_recommendation_count)

recommendation_df=pd.DataFrame(result,columns=['airport_name','total_reviews','recommendation_count','not_recommended'])

recommendation_df.sort_values(by='total_reviews',ascending=False)
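
The per-airport loop above can also be written as a single `groupby` aggregation. A minimal sketch on made-up rows, assuming the same `airport_name` and `recommended` columns and that `recommended` only takes the values 0 and 1:

```python
import pandas as pd

# Hypothetical sample with the same two columns as filtered_airport_data.
reviews = pd.DataFrame({
    "airport_name": ["aaa", "aaa", "aaa", "bbb", "bbb"],
    "recommended":  [1, 0, 1, 0, 0],
})

summary = (
    reviews.groupby("airport_name")["recommended"]
    .agg(total_reviews="size", recommendation_count="sum")
    .reset_index()
)
summary["not_recommended"] = (
    summary["total_reviews"] - summary["recommendation_count"]
)
summary = summary.sort_values(by="total_reviews", ascending=False)
print(summary)
```

This avoids filtering the full frame once per airport, which matters more as the number of airports grows.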

import numpy as np
import matplotlib.pyplot as plt
from textblob import TextBlob

# Polarity of each review, from -1 (negative) to +1 (positive)
result=np.array([TextBlob(review).polarity for review in filtered_airport_data.content])
# 1-based review numbers, used as the x-axis when plotting
numlist=np.arange(1,len(result)+1)
result

array([0.24735294, 0.09988529, 0.33133333, ..., 0.15833333, 0.48240741,
       0.19375   ])

plt.figure(figsize=(16,12))
plt.style.use('seaborn')
plt.scatter(numlist,result,label='Polarity')
plt.xlabel('Reviews')
plt.ylabel('Polarity')
plt.legend()

average_polarity=result.sum()/len(result)
print('Average Polarity : {}'.format(str(average_polarity)))

Average Polarity : 0.08175742657371485

plt.style.use('seaborn')
plt.figure(figsize=(16,12))
plt.plot(numlist,result,label='Polarity')
plt.xlabel('Reviews')
plt.ylabel('Polarity')
plt.legend()

import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

data = pd.read_csv("bestsellers with categories.csv")
data.head()

r,c = data.shape
print(f"The dataset has {r} rows and {c} columns.")

The dataset has 550 rows and 7 columns.

data.dtypes

Name            object
Author          object
User Rating    float64
Reviews          int64
Price            int64
Year             int64
Genre           object
dtype: object

data.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 550 entries, 0 to 549
Data columns (total 7 columns):
Name           550 non-null object
Author         550 non-null object
User Rating    550 non-null float64
Reviews        550 non-null int64
Price          550 non-null int64
Year           550 non-null int64
Genre          550 non-null object
dtypes: float64(1), int64(3), object(3)
memory usage: 30.2+ KB

data.nunique()

Name           351
Author         248
User Rating     14
Reviews        346
Price           40
Year            11
Genre            2
dtype: int64

data.describe()

data.isnull().sum()

Name           0
Author         0
User Rating    0
Reviews        0
Price          0
Year           0
Genre          0
dtype: int64

data.isnull().sum(axis=1).sort_values(ascending=False)

549    0
180    0
186    0
185    0
184    0
      ..
372    0
373    0
374    0
375    0
0      0
Length: 550, dtype: int64

data[data.duplicated()]

len(data[data.duplicated()])

0

len(data.Name.unique())

351

data[['Name','Author']].duplicated().sum()

199
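
The 199 repeated (Name, Author) pairs are consistent with the same title appearing in several yearly lists: 550 rows minus 199 repeats leaves 351, which matches the number of unique names. A minimal sketch, on made-up rows, of collapsing such data to one row per book; keeping the most recent year is an assumption, not something the notebook does:

```python
import pandas as pd

# Hypothetical rows: 'Book A' appears in two yearly lists.
books = pd.DataFrame({
    "Name":   ["Book A", "Book A", "Book B"],
    "Author": ["X", "X", "Y"],
    "Year":   [2018, 2019, 2019],
})

n_repeats = books[["Name", "Author"]].duplicated().sum()
# Keep the most recent appearance of each book.
unique_books = (
    books.sort_values("Year")
    .drop_duplicates(subset=["Name", "Author"], keep="last")
)
print(n_repeats, len(unique_books))
```

Whether to de-duplicate depends on the question: per-year totals need every row, while per-book rankings usually do not.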

data['Profit'] = data['Reviews'] * data['Price']
data

Top10 = data.sort_values(by='Profit',ascending=False).head(10)
Top10

Top10 = data.groupby('Name')[['Profit']].max().sort_values(by='Profit',ascending=False).head(10)
Top10

sns.barplot(x=Top10.Profit,y=Top10.index.values)
plt.xlabel("Earned(millions)")
plt.ylabel("Books")
plt.title('Earning by Books')

Text(0.5, 1.0, 'Earning by Books')

data.Genre.unique()

array(['Non Fiction', 'Fiction'], dtype=object)

data.Genre.value_counts()

Non Fiction    310
Fiction        240
Name: Genre, dtype: int64

sns.countplot(x="Genre", data=data)
plt.show()

# Take the labels from the counts themselves so they cannot get out of order
genre_counts=data.Genre.value_counts()
plt.pie(genre_counts,labels=genre_counts.index,autopct='%.0f%%');

data.groupby('Genre')['User Rating'].mean()

Genre
Fiction        4.648333
Non Fiction    4.595161
Name: User Rating, dtype: float64

sns.histplot(data=data['User Rating'],bins=10)
plt.xlabel("Ratings")

Text(0.5, 0, 'Ratings')

sns.lineplot(y=data['User Rating'],x=data['Year'],hue=data['Genre']);
plt.ylabel("Ratings")
plt.xlabel("Years");

sns.lmplot(y='User Rating',x='Price',data=data)
plt.ylabel('Ratings')
plt.xlabel('Price');

data.groupby('Year').Profit.sum().sort_values(ascending=False)

Year
2014    10625500
2012     8929419
2019     8336955
2013     8321579
2016     7951530
2015     7745165
2018     7183575
2017     6669195
2011     5548689
2010     3620509
2009     3567282
Name: Profit, dtype: int64

data.groupby('Year').Profit.sum().plot(kind='bar')
plt.show()

data.groupby(['Year','Genre']).Profit.sum().sort_values(ascending=False)

Year  Genre      
2014  Fiction        6858148
2013  Fiction        5444489
2012  Fiction        5098394
2018  Non Fiction    4713219
2019  Non Fiction    4500705
2015  Fiction        4364798
2016  Fiction        4064705
      Non Fiction    3886825
2019  Fiction        3836250
2012  Non Fiction    3831025
2014  Non Fiction    3767352
2017  Non Fiction    3568679
2015  Non Fiction    3380367
2017  Fiction        3100516
2011  Non Fiction    3082561
2013  Non Fiction    2877090
2018  Fiction        2470356
2011  Fiction        2466128
2009  Fiction        2058643
2010  Non Fiction    1930069
      Fiction        1690440
2009  Non Fiction    1508639
Name: Profit, dtype: int64
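
A long Year/Genre series like the one above is easier to compare side by side once the genre level is moved into columns. A minimal sketch with made-up numbers, assuming the same `Year`, `Genre` and `Profit` column names:

```python
import pandas as pd

# Hypothetical rows with the same three columns as the bestsellers data.
sales = pd.DataFrame({
    "Year":   [2009, 2009, 2010, 2010, 2010],
    "Genre":  ["Fiction", "Non Fiction", "Fiction", "Fiction", "Non Fiction"],
    "Profit": [100, 50, 30, 20, 80],
})

# Sum per (Year, Genre), then pivot Genre into columns.
by_year_genre = (
    sales.groupby(["Year", "Genre"])["Profit"].sum().unstack("Genre")
)
print(by_year_genre)
```

The resulting frame has one row per year and one column per genre, so it can be passed straight to `.plot(kind='bar')`.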

# estimator=sum plots yearly totals; the default estimator is the mean per book
sns.barplot(x=data['Year'],y=data['Profit'],hue=data['Genre'],estimator=sum)
plt.xlabel('Years')
plt.ylabel("Earned(millions)")
plt.title('Money earned each year');

genre_average=data.groupby(['Genre'])['Profit'].mean()
genre_average

Genre
Fiction        172720.279167
Non Fiction    119504.938710
Name: Profit, dtype: float64

sns.barplot(x=genre_average.index,y=genre_average);

data.groupby('Year')['Profit'].max()

Year
2009     394680
2010     474768
2011     508470
2012     661710
2013     701295
2014    1396161
2015    1430028
2016     700492
2017     458730
2018     672463
2019    1317615
Name: Profit, dtype: int64

data.groupby('Year')['Profit'].transform(max)

0       700492
1       508470
2       672463
3       458730
4      1317615
        ...   
545    1317615
546     700492
547     458730
548     672463
549    1317615
Name: Profit, Length: 550, dtype: int64

data[data.groupby('Year')['Profit'].transform(max) == data['Profit']]

most_earning_book_per_year=data[data.groupby('Year')['Profit'].transform(max) == data['Profit']]
most_earning_book_per_year

most_earning_book_per_year=most_earning_book_per_year.sort_values('Year').set_index('Year')
most_earning_book_per_year
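
The boolean mask built with `transform(max)` keeps every row that ties for the yearly maximum. If exactly one row per year is wanted, `idxmax` is an alternative. A minimal sketch on made-up rows, assuming the same `Year` and `Profit` column names:

```python
import pandas as pd

# Hypothetical rows: two books per year.
books = pd.DataFrame({
    "Name":   ["A", "B", "C", "D"],
    "Year":   [2018, 2018, 2019, 2019],
    "Profit": [10, 40, 25, 5],
})

# idxmax returns the index label of the first maximum within each year.
top_idx = books.groupby("Year")["Profit"].idxmax()
top_per_year = books.loc[top_idx].set_index("Year")
print(top_per_year)
```

The result is already ordered by year, because `groupby` sorts its keys by default.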

genres_per_year_mean=data.groupby(['Year','Genre'])[['Profit']].mean().round(2)
genres_per_year_mean

Earning_Graph=data.groupby('Year')['Profit'].sum()
Earning_Graph

Year
2009     3567282
2010     3620509
2011     5548689
2012     8929419
2013     8321579
2014    10625500
2015     7745165
2016     7951530
2017     6669195
2018     7183575
2019     8336955
Name: Profit, dtype: int64

plt.figure(figsize=(12,12))  # create the figure before plotting so the size applies
sns.lineplot(data=Earning_Graph)
plt.xlabel('Year')
plt.ylabel("Earned")
plt.title("EARNING PER YEAR");

authors=data.groupby('Author')['Profit'].sum().sort_values(ascending=False).head(10)
authors

Author
American Psychological Association    3946800
Suzanne Collins                       3368646
E L James                             2517303
John Green                            2381609
Laura Hillenbrand                     2284821
Paula Hawkins                         1986150
Gillian Flynn                         1660859
Gary Chapman                          1516167
Dr. Seuss                             1423598
American Psychiatric Association      1402590
Name: Profit, dtype: int64

sns.barplot(y=authors.index,x=authors)
plt.title('The Money Makers ')

Text(0.5, 1.0, 'The Money Makers ')

!pip install xgboost

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xgboost as xgb
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
cancer

{'data': array([[1.799e+01, 1.038e+01, 1.228e+02, ..., 2.654e-01, 4.601e-01,
         1.189e-01],
        [2.057e+01, 1.777e+01, 1.329e+02, ..., 1.860e-01, 2.750e-01,
         8.902e-02],
        [1.969e+01, 2.125e+01, 1.300e+02, ..., 2.430e-01, 3.613e-01,
         8.758e-02],
        ...,
        [1.660e+01, 2.808e+01, 1.083e+02, ..., 1.418e-01, 2.218e-01,
         7.820e-02],
        [2.060e+01, 2.933e+01, 1.401e+02, ..., 2.650e-01, 4.087e-01,
         1.240e-01],
        [7.760e+00, 2.454e+01, 4.792e+01, ..., 0.000e+00, 2.871e-01,
         7.039e-02]]),
 'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
        1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
        1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
        1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
        0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,
        1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
        0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
        1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
        1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0,
        0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0,
        0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0,
        1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1,
        1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1,
        1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
        1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
        1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1,
        1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]),
 'target_names': array(['malignant', 'benign'], dtype='<U9'),
 ...}

cancer['feature_names']

array(['mean radius', 'mean texture', 'mean perimeter', 'mean area',
       'mean smoothness', 'mean compactness', 'mean concavity',
       'mean concave points', 'mean symmetry', 'mean fractal dimension',
       'radius error', 'texture error', 'perimeter error', 'area error',
       'smoothness error', 'compactness error', 'concavity error',
       'concave points error', 'symmetry error',
       'fractal dimension error', 'worst radius', 'worst texture',
       'worst perimeter', 'worst area', 'worst smoothness',
       'worst compactness', 'worst concavity', 'worst concave points',
       'worst symmetry', 'worst fractal dimension'], dtype='<U23')

cancer['target_names']

array(['malignant', 'benign'], dtype='<U9')

X = cancer.data
y = cancer.target

xgb_model = xgb.XGBClassifier(objective="binary:logistic", random_state=42)
xgb_model.fit(X,y)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.300000012, max_delta_step=0, max_depth=6,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=100, n_jobs=0, num_parallel_tree=1,
              objective='binary:logistic', random_state=42, reg_alpha=0,
              reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

y_pred = xgb_model.predict(X)
y_pred

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
       0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,
       1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0,
       0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0,
       1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1,
       1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1,
       1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
       1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1,
       1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1])

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y, y_pred)
cm

array([[212,   0],
       [  0, 357]], dtype=int64)

from sklearn.metrics import accuracy_score
accuracy_score(y,y_pred) * 100

100.0

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('datasets/iris.csv')
df

df = df.iloc[:,1:]
df

X = df.iloc[:,:4].values
y = df.iloc[:,4].values

from sklearn.preprocessing import LabelEncoder
Ly = LabelEncoder()
y = Ly.fit_transform(y)
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

Ly.classes_

array(['setosa', 'versicolor', 'virginica'], dtype=object)

Ly.inverse_transform([0])

array(['setosa'], dtype=object)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=0)
X_train

array([[6.4, 3.1, 5.5, 1.8],
       [5.4, 3. , 4.5, 1.5],
       [5.2, 3.5, 1.5, 0.2],
       [6.1, 3. , 4.9, 1.8],
       [6.4, 2.8, 5.6, 2.2],
       [5.2, 2.7, 3.9, 1.4],
       [5.7, 3.8, 1.7, 0.3],
       [6. , 2.7, 5.1, 1.6],
       [5.9, 3. , 4.2, 1.5],
       [5.8, 2.6, 4. , 1.2],
       [6.8, 3. , 5.5, 2.1],
       [4.7, 3.2, 1.3, 0.2],
       [6.9, 3.1, 5.1, 2.3],
       [5. , 3.5, 1.6, 0.6],
       [5.4, 3.7, 1.5, 0.2],
       [5. , 2. , 3.5, 1. ],
       [6.5, 3. , 5.5, 1.8],
       [6.7, 3.3, 5.7, 2.5],
       [6. , 2.2, 5. , 1.5],
       [6.7, 2.5, 5.8, 1.8],
       [5.6, 2.5, 3.9, 1.1],
       [7.7, 3. , 6.1, 2.3],
       [6.3, 3.3, 4.7, 1.6],
       [5.5, 2.4, 3.8, 1.1],
       [6.3, 2.7, 4.9, 1.8],
       [6.3, 2.8, 5.1, 1.5],
       [4.9, 2.5, 4.5, 1.7],
       [6.3, 2.5, 5. , 1.9],
       [7. , 3.2, 4.7, 1.4],
       [6.5, 3. , 5.2, 2. ],
       [6. , 3.4, 4.5, 1.6],
       [4.8, 3.1, 1.6, 0.2],
       [5.8, 2.7, 5.1, 1.9],
       [5.6, 2.7, 4.2, 1.3],
       [5.6, 2.9, 3.6, 1.3],
       [5.5, 2.5, 4. , 1.3],
       [6.1, 3. , 4.6, 1.4],
       [7.2, 3.2, 6. , 1.8],
       [5.3, 3.7, 1.5, 0.2],
       [4.3, 3. , 1.1, 0.1],
       [6.4, 2.7, 5.3, 1.9],
       [5.7, 3. , 4.2, 1.2],
       [5.4, 3.4, 1.7, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [6.9, 3.1, 4.9, 1.5],
       [4.6, 3.1, 1.5, 0.2],
       [5.9, 3. , 5.1, 1.8],
       [5.1, 2.5, 3. , 1.1],
       [4.6, 3.4, 1.4, 0.3],
       [6.2, 2.2, 4.5, 1.5],
       [7.2, 3.6, 6.1, 2.5],
       [5.7, 2.9, 4.2, 1.3],
       [4.8, 3. , 1.4, 0.1],
       [7.1, 3. , 5.9, 2.1],
       [6.9, 3.2, 5.7, 2.3],
       [6.5, 3. , 5.8, 2.2],
       [6.4, 2.8, 5.6, 2.1],
       [5.1, 3.8, 1.6, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [6.5, 3.2, 5.1, 2. ],
       [6.7, 3.3, 5.7, 2.1],
       [4.5, 2.3, 1.3, 0.3],
       [6.2, 3.4, 5.4, 2.3],
       [4.9, 3. , 1.4, 0.2],
       [5.7, 2.5, 5. , 2. ],
       [6.9, 3.1, 5.4, 2.1],
       [4.4, 3.2, 1.3, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [7.2, 3. , 5.8, 1.6],
       [5.1, 3.5, 1.4, 0.3],
       [4.4, 3. , 1.3, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [5.5, 2.3, 4. , 1.3],
       [6.8, 3.2, 5.9, 2.3],
       [7.6, 3. , 6.6, 2.1],
       [5.1, 3.5, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [5.7, 2.8, 4.5, 1.3],
       [6.6, 3. , 4.4, 1.4],
       [5. , 3.2, 1.2, 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [6.4, 2.9, 4.3, 1.3],
       [5.4, 3.4, 1.5, 0.4],
       [7.7, 2.6, 6.9, 2.3],
       [4.9, 2.4, 3.3, 1. ],
       [7.9, 3.8, 6.4, 2. ],
       [6.7, 3.1, 4.4, 1.4],
       [5.2, 4.1, 1.5, 0.1],
       [6. , 3. , 4.8, 1.8],
       [5.8, 4. , 1.2, 0.2],
       [7.7, 2.8, 6.7, 2. ],
       [5.1, 3.8, 1.5, 0.3],
       [4.7, 3.2, 1.6, 0.2],
       [7.4, 2.8, 6.1, 1.9],
       [5. , 3.3, 1.4, 0.2],
       [6.3, 3.4, 5.6, 2.4],
       [5.7, 2.8, 4.1, 1.3],
       [5.8, 2.7, 3.9, 1.2],
       [5.7, 2.6, 3.5, 1. ],
       [6.4, 3.2, 5.3, 2.3],
       [6.7, 3. , 5.2, 2.3],
       [6.3, 2.5, 4.9, 1.5],
       [6.7, 3. , 5. , 1.7],
       [5. , 3. , 1.6, 0.2],
       [5.5, 2.4, 3.7, 1. ],
       [6.7, 3.1, 5.6, 2.4],
       [5.8, 2.7, 5.1, 1.9],
       [5.1, 3.4, 1.5, 0.2],
       [6.6, 2.9, 4.6, 1.3],
       [5.6, 3. , 4.1, 1.3],
       [5.9, 3.2, 4.8, 1.8],
       [6.3, 2.3, 4.4, 1.3],
       [5.5, 3.5, 1.3, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.9, 3.1, 1.5, 0.1],
       [6.3, 2.9, 5.6, 1.8],
       [5.8, 2.7, 4.1, 1. ],
       [7.7, 3.8, 6.7, 2.2],
       [4.6, 3.2, 1.4, 0.2]])

import xgboost as xgb
xgb_model = xgb.XGBClassifier(random_state=0)
xgb_model.fit(X_train,y_train)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.300000012, max_delta_step=0, max_depth=6,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=100, n_jobs=0, num_parallel_tree=1,
              objective='multi:softprob', random_state=0, reg_alpha=0,
              reg_lambda=1, scale_pos_weight=None, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

Yp = xgb_model.predict(X_test)
Yp

array([2, 1, 0, 2, 0, 2, 0, 1, 1, 1, 2, 1, 1, 1, 1, 0, 1, 1, 0, 0, 2, 1,
       0, 0, 2, 0, 0, 1, 1, 0])

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, Yp)
cm

array([[11,  0,  0],
       [ 0, 13,  0],
       [ 0,  0,  6]], dtype=int64)

from sklearn.metrics import accuracy_score
accuracy_score(y_test,Yp) * 100

100.0
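
The confusion matrix above shows an uneven test set (11, 13 and 6 samples per class) because the split was not stratified, and with only 30 test rows a perfect score is weak evidence. A minimal sketch of a stratified split, using scikit-learn's bundled copy of the iris data as a stand-in for `datasets/iris.csv` (an assumption):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps each class's share of the data the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
test_counts = np.bincount(y_test)
print(test_counts)
```

With 50 samples per class and a 20% test size, each class contributes the same number of test rows.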

# Import numpy, pandas, matplotlib and seaborn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Read the given CSV file, and view some sample records

advertising = pd.read_csv("datasets/ads.csv")
advertising.head()

advertising.shape

(200, 4)

advertising.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 4 columns):
TV           200 non-null float64
Radio        200 non-null float64
Newspaper    200 non-null float64
Sales        200 non-null float64
dtypes: float64(4)
memory usage: 6.4 KB

advertising.isnull().sum()

TV           0
Radio        0
Newspaper    0
Sales        0
dtype: int64

advertising.describe()

sns.regplot(x='TV',y='Sales',data=advertising)
plt.show()

sns.regplot(x='Radio',y='Sales',data=advertising)
plt.show()

sns.regplot(x='Newspaper',y='Sales',data=advertising)
plt.show()

sns.pairplot(advertising, x_vars=['TV', 'Newspaper', 'Radio'], y_vars='Sales',height=4, aspect=1, kind='scatter')
plt.show()

advertising.corr()

sns.heatmap(advertising.corr(), cmap="YlGnBu", annot = True)
plt.show()
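
As a rough illustration of what each cell of the correlation matrix measures, the sketch below computes the Pearson coefficient by hand. It uses only the five TV and Sales values from the `advertising.head()` output, so the number it prints will differ from the full-data figure reported by `advertising.corr()`. The function name `pearson` is an illustrative helper, not a pandas API.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# First five rows of the advertising data (TV spend and Sales).
tv = [230.1, 44.5, 17.2, 151.5, 180.8]
sales = [22.1, 10.4, 12.0, 16.5, 17.9]

r_tv_sales = pearson(tv, sales)
print(round(r_tv_sales, 3))
```

A value close to 1 means the two columns rise together almost linearly, which is why TV is chosen as the single predictor in the next step.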

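# Use TV spend as the single feature (reshaped to 2-D, as scikit-learn expects) and Sales as the target
X = advertising['TV'].values.reshape(-1, 1)
y = advertising['Sales'].values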

from sklearn.model_selection import train_test_split
# Hold out 20% of the rows for testing; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.80, test_size=0.20, random_state=42)

X_train

array([[116. ],
       [177. ],
       [ 43.1],
       [ 62.3],
       [224. ],
       [ 38.2],
       [ 70.6],
       [147.3],
       [104.6],
       [ 76.3],
       [ 78.2],
       [168.4],
       [  8.7],
       [  7.8],
       [ 76.4],
       [129.4],
       [ 73.4],
       [289.7],
       [ 19.6],
       [197.6],
       [284.3],
       [184.9],
       [112.9],
       [ 23.8],
       [290.7],
       [ 19.4],
       [293.6],
       [ 18.7],
       [134.3],
       [ 25.6],
       [100.4],
       [ 80.2],
       [188.4],
       [177. ],
       [125.7],
       [209.6],
       [142.9],
       [184.9],
       [222.4],
       [241.7],
       [ 17.2],
       [120.5],
       [ 89.7],
       [191.1],
       [ 75.5],
       [193.2],
       [ 85.7],
       [266.9],
       [ 39.5],
       [261.3],
       [ 13.2],
       [193.7],
       [296.4],
       [265.6],
       [214.7],
       [149.7],
       [131.7],
       [ 57.5],
       [240.1],
       [141.3],
       [180.8],
       [ 97.2],
       [220.5],
       [140.3],
       [255.4],
       [ 96.2],
       [ 66.1],
       [239.3],
       [175.7],
       [240.1],
       [ 17.9],
       [230.1],
       [283.6],
       [171.3],
       [199.1],
       [123.1],
       [131.1],
       [ 25.1],
       [163.5],
       [248.8],
       [202.5],
       [ 13.1],
       [  4.1],
       [ 93.9],
       [262.9],
       [228.3],
       [253.8],
       [243.2],
       [239.8],
       [228. ],
       [215.4],
       [239.9],
       [107.4],
       [187.8],
       [206.9],
       [ 43. ],
       [151.5],
       [137.9],
       [182.6],
       [219.8],
       [156.6],
       [276.7],
       [205. ],
       [ 66.9],
       [ 76.4],
       [ 95.7],
       [120.2],
       [225.8],
       [ 28.6],
       [ 68.4],
       [248.4],
       [218.5],
       [109.8],
       [  8.6],
       [ 97.5],
       [210.7],
       [164.5],
       [265.2],
       [281.4],
       [ 26.8],
       [276.9],
       [ 36.9],
       [206.8],
       [287.6],
       [102.7],
       [262.7],
       [ 90.4],
       [199.8],
       [ 94.2],
       [210.8],
       [227.2],
       [ 88.3],
       [237.4],
       [136.2],
       [172.5],
       [ 17.2],
       [ 59.6],
       [ 74.7],
       [149.8],
       [166.8],
       [ 44.5],
       [216.4],
       [ 44.7],
       [  0.7],
       [121. ],
       [187.9],
       [135.2],
       [139.2],
       [110.7],
       [213.4],
       [ 18.8],
       [232.1],
       [218.4],
       [286. ],
       [109.8],
       [ 25. ],
       [204.1],
       [217.7],
       [165.6],
       [280.2]])

y_train

array([11. , 14.8, 10.1,  9.7, 16.6,  7.6, 10.5, 14.6, 10.4, 12. , 14.6,
       16.7,  7.2,  6.6,  9.4, 11. , 10.9, 25.4,  7.6, 16.7, 20. , 20.5,
       11.9,  9.2, 17.8,  6.6, 20.7,  6.7, 14. ,  9.5, 10.7, 11.9, 19.9,
       17.1, 15.9, 20.9, 15. , 20.7, 16.7, 21.8, 12. , 14.2, 10.6, 17.3,
       11.9, 20.2, 13.3, 25.4, 10.8, 24.2,  5.6, 19.2, 23.8, 17.4, 17.4,
       17.3, 12.9, 11.8, 20.9, 15.5, 17.9, 13.2, 20.1, 10.3, 19.8, 12.3,
       12.6, 20.7, 17.1, 18.2,  8. , 22.1, 25.5, 16. , 18.3, 15.2, 16. ,
        8.5, 18. , 18.9, 16.6,  5.3,  3.2, 15.3, 17. , 20.5, 17.6, 25.4,
       17.3, 21.5, 17.1, 23.2, 11.5, 20.6, 17.9,  9.6, 16.5, 15. , 21.2,
       19.6, 15.5, 16.8, 22.6,  9.7, 11.8, 11.9, 13.2, 18.4,  7.3, 13.6,
       20.2, 17.2, 16.7,  4.8, 13.7, 18.4, 17.5, 17.7, 24.4,  8.8, 27. ,
       10.8, 17.2, 26.2, 14. , 20.2, 12. , 16.4, 14. , 23.8, 19.8, 12.9,
       17.5, 13.2, 16.4,  5.9,  9.7, 14.7, 10.1, 19.6, 10.4, 22.6, 10.1,
        1.6, 11.6, 19.7, 17.2, 12.2, 16. , 17. ,  7. , 18.4, 18. , 20.9,
       12.4,  7.2, 19. , 19.4, 17.6, 19.8])

import xgboost as xgb
xgb_model = xgb.XGBRegressor(objective="reg:squarederror")
xgb_model.fit(X_train,y_train)

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
             importance_type='gain', interaction_constraints='',
             learning_rate=0.300000012, max_delta_step=0, max_depth=6,
             min_child_weight=1, missing=nan, monotone_constraints='()',
             n_estimators=100, n_jobs=0, num_parallel_tree=1,
             objective='reg:squarederror', random_state=0, reg_alpha=0,
             reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',
             validate_parameters=1, verbosity=None)

Yp = xgb_model.predict(X_test)
Yp

array([17.880438 , 19.160408 , 20.711021 ,  5.403233 , 19.767073 ,
       11.973633 , 22.146915 , 10.172751 , 16.855145 , 16.913172 ,
        7.488937 , 11.585327 , 18.448017 ,  3.2186866, 12.286012 ,
       16.190283 ,  6.4905276, 17.020899 , 11.973633 , 17.717775 ,
       21.8418   , 13.286675 ,  7.768169 , 18.918385 , 13.286675 ,
       11.585327 , 17.239777 , 12.286012 , 12.900588 ,  4.96285  ,
       16.671322 , 13.286675 , 18.017462 ,  8.869742 , 19.99905  ,
       17.717775 , 10.172751 , 17.09226  , 11.441164 ,  8.816983 ],
      dtype=float32)

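# score() returns R² (the coefficient of determination); multiply by 100 to read it as a percentage
xgb_model.score(X_train, y_train) * 100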

98.97085078718804

# R² on the held-out test set, as a percentage
xgb_model.score(X_test, y_test) * 100

81.48673425766624
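
For a regressor, `score` reports R², so the two figures above are R² values scaled to percentages. The sketch below shows how R² and RMSE are computed by hand. The toy arrays are hypothetical values chosen for easy arithmetic, not taken from the advertising data, and the function names are illustrative helpers.

```python
import math


def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse_manual(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical toy values, used only to illustrate the formulas.
y_true_demo = [10.0, 12.0, 14.0, 16.0]
y_pred_demo = [11.0, 12.0, 13.0, 16.0]

r2_demo = r2_score_manual(y_true_demo, y_pred_demo)
rmse_demo = rmse_manual(y_true_demo, y_pred_demo)
print(round(r2_demo, 3))    # 0.9
print(round(rmse_demo, 3))  # 0.707
```

The gap between the training score (about 99) and the test score (about 81) suggests the default model may be overfitting; reporting RMSE alongside R² gives the error in sales units, which is often easier to interpret.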

	overall_rating	queuing_rating	terminal_cleanliness_rating	terminal_seating_rating	terminal_signs_rating	food_beverages_rating	airport_shopping_rating	wifi_connectivity_rating	airport_staff_rating	recommended
count	13796.000000	12813.000000	12815.000000	587.000000	27.000000	630.000000	12676.000000	412.000000	26.000000	17721.000000
mean	4.274355	2.747912	3.442450	2.580920	2.592593	2.169841	2.821631	2.405340	2.038462	0.221206
std	2.722765	1.572520	1.337508	1.403862	1.393923	1.534358	1.410575	1.579452	1.248384	0.415071
min	1.000000	0.000000	0.000000	0.000000	1.000000	0.000000	0.000000	0.000000	1.000000	0.000000
25%	2.000000	1.000000	3.000000	1.000000	1.000000	1.000000	2.000000	1.000000	1.000000	0.000000
50%	4.000000	3.000000	3.000000	2.000000	3.000000	2.000000	3.000000	2.000000	1.500000	0.000000
75%	6.000000	4.000000	5.000000	4.000000	4.000000	3.000000	4.000000	4.000000	3.000000	0.000000
max	10.000000	5.000000	5.000000	5.000000	5.000000	5.000000	5.000000	5.000000	4.000000	1.000000

	User Rating	Reviews	Price	Year
count	550.000000	550.000000	550.000000	550.000000
mean	4.618364	11953.281818	13.100000	2014.000000
std	0.226980	11731.132017	10.842262	3.165156
min	3.300000	37.000000	0.000000	2009.000000
25%	4.500000	4058.000000	7.000000	2011.000000
50%	4.700000	8580.000000	11.000000	2014.000000
75%	4.800000	17253.250000	16.000000	2017.000000
max	4.900000	87841.000000	105.000000	2019.000000

	Name	Author	User Rating	Reviews	Price	Year	Genre	Profit
0	10-Day Green Smoothie Cleanse	JJ Smith	4.7	17350	8	2016	Non Fiction	138800
1	11/22/63: A Novel	Stephen King	4.6	2052	22	2011	Fiction	45144
2	12 Rules for Life: An Antidote to Chaos	Jordan B. Peterson	4.7	18979	15	2018	Non Fiction	284685
3	1984 (Signet Classics)	George Orwell	4.7	21424	6	2017	Fiction	128544
4	5,000 Awesome Facts (About Everything!) (Natio…	National Geographic Kids	4.8	7665	12	2019	Non Fiction	91980
…	…	…	…	…	…	…	…	…
545	Wrecking Ball (Diary of a Wimpy Kid Book 14)	Jeff Kinney	4.9	9413	8	2019	Fiction	75304
546	You Are a Badass: How to Stop Doubting Your Gr…	Jen Sincero	4.7	14331	8	2016	Non Fiction	114648
547	You Are a Badass: How to Stop Doubting Your Gr…	Jen Sincero	4.7	14331	8	2017	Non Fiction	114648
548	You Are a Badass: How to Stop Doubting Your Gr…	Jen Sincero	4.7	14331	8	2018	Non Fiction	114648
549	You Are a Badass: How to Stop Doubting Your Gr…	Jen Sincero	4.7	14331	8	2019	Non Fiction	114648

	Profit
Name
The Girl on the Train	1430028
The Alchemist	1396161
Where the Crawdads Sing	1317615
Diagnostic and Statistical Manual of Mental Disorders, 5th Edition: DSM-5	701295
Harry Potter Paperback Box Set (Books 1-7)	700492
The Goldfinch: A Novel (Pulitzer Prize for Fiction)	676880
Becoming	672463
Fifty Shades of Grey: Book One of the Fifty Shades Trilogy (Fifty Shades of Grey Series)	661710
The Fault in Our Stars	656266
A Game of Thrones / A Clash of Kings / A Storm of Swords / A Feast of Crows / A Dance with Dragons	592050

	TV	Radio	Newspaper	Sales
count	200.000000	200.000000	200.000000	200.000000
mean	147.042500	23.264000	30.554000	15.130500
std	85.854236	14.846809	21.778621	5.283892
min	0.700000	0.000000	0.300000	1.600000
25%	74.375000	9.975000	12.750000	11.000000
50%	149.750000	22.900000	25.750000	16.000000
75%	218.825000	36.525000	45.100000	19.050000
max	296.400000	49.600000	114.000000	27.000000

	Airline_Name	Total_Reviews
295	spirit-airlines	966
97	british-airways	896
333	united-airlines	839
20	air-canada-rouge	715
138	emirates	690

	airline_name	average_rating	pos_sentiment	net_sentiment	neg_sentiment
97	british-airways	5.881696	703	2	191
138	emirates	6.246377	558	1	131
295	spirit-airlines	2.902692	556	15	395
333	united-airlines	3.356377	523	5	311
215	lufthansa	6.993333	511	1	88

	airport_name	total_reviews	recommendation_count	not_recommended
410	london-heathrow-airport	520	160	360
411	london-stansted-airport	402	41	361
437	manchester-airport	303	65	238
519	paris-cdg-airport	301	48	253
210	dubai-airport	279	63	216
420	luton-airport	275	28	247
409	london-gatwick-airport	252	89	163
60	bangkok-suvarnabhumi-airport	220	69	151
242	frankfurt-main-airport	218	43	175
414	los-angeles-lax-airport	199	34	165
455	miami-airport	191	21	170
488	new-york-jfk-airport	185	43	142
624	singapore-changi-airport	181	101	80
392	leeds-bradford-airport	166	26	140
675	toronto-pearson-airport	166	41	125
31	amsterdam-schiphol-airport	166	77	89
304	hong-kong-airport	162	78	84
366	klia-kuala-lumpur-airport	160	64	96
581	rome-fiumicino-airport	155	25	130
116	bristol-airport	154	32	122
450	melbourne-airport	149	26	123
324	istanbul-ataturk-airport	149	33	116
472	mumbai-airport	149	42	107
440	manila-ninoy-aquino-airport	138	25	113
308	houston-george-bush-intercontinental-airport	138	18	120
652	sydney-airport	132	41	91
195	delhi-airport	130	63	67
91	birmingham-airport	129	37	92
489	newark-airport	129	13	116
405	liverpool-airport	128	30	98
…	…	…	…	…
232	fairbanks-airport	1	0	1
536	pointe-a-pitre-airport	1	0	1
321	irkutsk-airport	1	0	1
145	changzhou-airport	1	0	1
1	aarhus-airport	1	0	1
550	prince-rupert-airport	1	0	1
50	baghdad-airport	1	0	1
49	bagdogra-airport	1	0	1
48	bacau-airport	1	0	1
666	thandwe-airport	1	0	1
47	ayers-rock-airport	1	0	1
523	pensacola-airport	1	0	1
541	port-harcourt-airport	1	0	1
648	sukhothai-airport	1	0	1
377	kuantan-airport	1	0	1
542	port-macquarie-airport	1	0	1
645	strasbourg-airport	1	0	1
66	barnaul-airport	1	0	1
279	hakodate-airport	1	0	1
642	stockholm-bromma-airport	1	0	1
67	barra-eoligarry-airport	1	0	1
544	port-vila-bauerfield-airport	1	0	1
69	bay-of-islands-kerikeri-airport	1	0	1
637	st-pete-clearwater-international-airport	1	0	1
278	hailar-dongshan-airport	1	1	0
382	la-rochelle-ile-de-re-airport	1	0	1
131	calcutta-airport	1	0	1
633	springfield-branson-airport	1	1	0
549	preveza-airport	1	0	1
370	kos-airport	1	0	1

	Name	Author	User Rating	Reviews	Price	Year	Genre	Profit
382	The Girl on the Train	Paula Hawkins	4.1	79446	18	2015	Fiction	1430028
338	The Alchemist	Paulo Coelho	4.7	35799	39	2014	Fiction	1396161
534	Where the Crawdads Sing	Delia Owens	4.8	87841	15	2019	Fiction	1317615
70	Diagnostic and Statistical Manual of Mental Di…	American Psychiatric Association	4.5	6679	105	2014	Non Fiction	701295
69	Diagnostic and Statistical Manual of Mental Di…	American Psychiatric Association	4.5	6679	105	2013	Non Fiction	701295
159	Harry Potter Paperback Box Set (Books 1-7)	J. K. Rowling	4.8	13471	52	2016	Fiction	700492
393	The Goldfinch: A Novel (Pulitzer Prize for Fic…	Donna Tartt	3.9	33844	20	2014	Fiction	676880
392	The Goldfinch: A Novel (Pulitzer Prize for Fic…	Donna Tartt	3.9	33844	20	2013	Fiction	676880
33	Becoming	Michelle Obama	4.8	61133	11	2019	Non Fiction	672463
32	Becoming	Michelle Obama	4.8	61133	11	2018	Non Fiction	672463

	Name	Author	User Rating	Reviews	Price	Genre	Profit
Year
2009	Publication Manual of the American Psychologic…	American Psychological Association	4.5	8580	46	Non Fiction	394680
2010	Unbroken: A World War II Story of Survival, Re…	Laura Hillenbrand	4.8	29673	16	Non Fiction	474768
2011	The Hunger Games Trilogy Boxed Set (1)	Suzanne Collins	4.8	16949	30	Fiction	508470
2012	Fifty Shades of Grey: Book One of the Fifty Sh…	E L James	3.8	47265	14	Fiction	661710
2013	Diagnostic and Statistical Manual of Mental Di…	American Psychiatric Association	4.5	6679	105	Non Fiction	701295
2014	The Alchemist	Paulo Coelho	4.7	35799	39	Fiction	1396161
2015	The Girl on the Train	Paula Hawkins	4.1	79446	18	Fiction	1430028
2016	Harry Potter Paperback Box Set (Books 1-7)	J. K. Rowling	4.8	13471	52	Fiction	700492
2017	Player’s Handbook (Dungeons & Dragons)	Wizards RPG Team	4.8	16990	27	Fiction	458730
2018	Becoming	Michelle Obama	4.8	61133	11	Non Fiction	672463
2019	Where the Crawdads Sing	Delia Owens	4.8	87841	15	Fiction	1317615

	Unnamed: 0	Sepal.Length	Sepal.Width	Petal.Length	Petal.Width	Species
0	1	5.1	3.5	1.4	0.2	setosa
1	2	4.9	3.0	1.4	0.2	setosa
2	3	4.7	3.2	1.3	0.2	setosa
3	4	4.6	3.1	1.5	0.2	setosa
4	5	5.0	3.6	1.4	0.2	setosa
…	…	…	…	…	…	…
145	146	6.7	3.0	5.2	2.3	virginica
146	147	6.3	2.5	5.0	1.9	virginica
147	148	6.5	3.0	5.2	2.0	virginica
148	149	6.2	3.4	5.4	2.3	virginica
149	150	5.9	3.0	5.1	1.8	virginica

	TV	Radio	Newspaper	Sales
0	230.1	37.8	69.2	22.1
1	44.5	39.3	45.1	10.4
2	17.2	45.9	69.3	12.0
3	151.5	41.3	58.5	16.5
4	180.8	10.8	58.4	17.9

	TV	Radio	Newspaper	Sales
TV	1.000000	0.054809	0.056648	0.901208
Radio	0.054809	1.000000	0.354104	0.349631
Newspaper	0.056648	0.354104	1.000000	0.157960
Sales	0.901208	0.349631	0.157960	1.000000

Machine Learning – Machine Learning Tutorials, Courses and Certifications

Airline Quality Service

Airline Quality Service Analysis

Analysing the Dataset

Calculating Total Reviews for Each Airline

Finding Polarity of Each Review

Plotting Polarity

Generating Word Cloud

Providing Sentiment Value to Each Review According to Rating

Sentiment from Polarity

Logistic Regression

Linear Support Vector Machine

Preprocessing the Reviews

Preprocessing Reviews and Removing Stop Words

Logistic Regression on Processed Data

Implementing Linear SVM

Working With Neural Networks

LSTM

CNN

Features Passengers Concerned About

Airport Quality Service

Analysing Airport Review Datasets

Counting Recommendations

Calculating the Polarity of Each Review

Plotting the Polarity

Amazon Best Selling Books Analysis

Data Science Project on Amazon Best Selling Books Analysis with the Python Programming Language

Task 1: Reading and Inspection

Subtask 1.1: Import and Read

Importing Libraries

Loading the Dataset

Subtask 1.2: Inspect the Dataframe

Check Shape

Check Datatype of Each Column

Detailed Information About the Dataset

Check the Total Unique Values in Each Column

Get an Overview of the Dataframe

Task 2: Cleaning the Data

Subtask 2.1: Inspect Null Values

Column-Wise Null Count

Row-Wise Null Count

Subtask 2.2: Check Whether the Dataset Has Duplicate Values

Check Total Number of Different Books

Check the Duplicate Books

Task 3: Data Analysis

Subtask 3.1: Find the Book with the Highest Profit

Subtask 3.2: Drop Duplicate Values

Subtask 3.3: Which Genre Has the Most Books in This Category, and How Are They Distributed?

Subtask 3.4: What Is the Average Rating of Each Genre? Plot a Histogram of Each Rating

Subtask 3.5: Find the Relationship of Ratings with Time

Subtask 3.6: Find the Relationship Between Ratings and Price

Plot Data and Regression Model Fits Across a FacetGrid

Subtask 3.6: Find How Much Books Have Earned Yearly

Plot Using Seaborn

Subtask 3.7: Average Profit Earned by Each Book Depending on Its Genre

Subtask 3.8: Find Books Which Earned the Most per Year (2009-19)

Subtask 3.9: What Is the Average Earning of Each Genre on a per-Year Basis

Subtask 3.10: Trend of Book Sales (Top Rated) over the Years

Subtask 3.11: Top 10 Authors Who Earned the Most

XGBoost Classifier – Practically

XGBoost Classifier – Breast Cancer Dataset

Install XGBoost Python Package

Import Libraries

Load Dataset

Train Machine Learning Model

IRIS Dataset

By default, the algorithm selects objective='multi:softprob' for multiclass classification

XGBoost Regression – Practically

Introduction to XGBoost Regression

1. Key Concepts of XGBoost Regression:

2. Parameters in XGBoost Regression:

3. How XGBoost Regression Works:

4. Evaluation Metrics for Regression:

5. Hyperparameter Tuning for XGBoost Regression:

6. Basic Workflow for XGBoost Regression:

7. Use Cases for XGBoost Regression:

Advantages of XGBoost Regression:
