
What is XGBoost?
XGBoost stands for eXtreme Gradient Boosting. It is one of the most widely used algorithms in machine learning, for both classification and regression problems.
Even in machine learning competitions and hackathons, XGBoost is among the first algorithms picked for structured data. It has proven itself in terms of speed and performance.
XGBoost is a software library that you can download and install on your machine, then access from a variety of interfaces (a minimal Python example follows the list below). Specifically, XGBoost offers the following main features:
- A wide range of applications: Can be used to solve regression, classification, ranking, and user-defined prediction problems.
- Portability: Runs smoothly on Windows, Linux, and OS X.
- Languages: Supports all major programming languages including C++, Python, R, Java, Scala, and Julia.
- Cloud Integration: Supports AWS, Azure, and YARN clusters, and works well with Flink, Spark, and other ecosystems.
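As a minimal sketch of the Python interface, the snippet below trains a default XGBoost classifier through its scikit-learn-compatible API. It assumes xgboost and scikit-learn are installed, and the dataset choice is purely illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Illustrative binary classification dataset.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train a gradient-boosted tree classifier with default settings.
model = XGBClassifier()
model.fit(X_train, y_train)

print("accuracy:", model.score(X_test, y_test))
```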
The two reasons to use XGBoost are also the two goals of the project:
- Execution Speed: achieved through parallelization (utilizing all cores of the processor), cache optimization, and out-of-core computation (handling datasets larger than RAM capacity); see the sketch after this list.
- Model Performance: consistently strong results, especially on structured data.
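As a hedged sketch of the speed-oriented settings, the snippet below sets the nthread and tree_method parameters of the native xgboost API; both parameter names come from the official parameter list, while the synthetic data is purely illustrative. Out-of-core training is configured separately through an external-memory DMatrix and is not shown here.

```python
import numpy as np
import xgboost as xgb

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
y = (X[:, 0] + rng.normal(size=10_000) > 0).astype(int)

dtrain = xgb.DMatrix(X, label=y)
params = {
    "objective": "binary:logistic",
    "nthread": 4,           # parallelize tree construction across 4 cores
    "tree_method": "hist",  # cache-friendly histogram-based split finding
}
booster = xgb.train(params, dtrain, num_boost_round=50)
```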
1. Decision Tree: Every hiring manager has a set of criteria, such as education level, number of years of experience, and interview performance. A decision tree is analogous to a hiring manager interviewing candidates based on his or her own criteria.
2. Bagging: Now imagine that instead of a single interviewer there is an interview panel where each interviewer has a vote. Bagging, or bootstrap aggregating, involves combining inputs from all interviewers for the final decision through a democratic voting process.
3. Random Forest: It is a bagging-based algorithm with a key difference: only a random subset of features is considered. In other words, every interviewer will only test the interviewee on certain randomly selected qualifications (e.g. a technical interview for testing programming skills and a behavioral interview for evaluating non-technical skills).
4. Boosting: This is an alternative approach where each interviewer alters the evaluation criteria based on feedback from the previous interviewer. This ‘boosts’ the efficiency of the interview process by deploying a more dynamic evaluation process.
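To make the contrast concrete, here is a short sketch that cross-validates one model of each kind on the same synthetic dataset. The specific estimators and settings are illustrative assumptions, not a benchmark.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "boosting (XGBoost)": XGBClassifier(),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```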
Boosting is an ensemble technique in which new models are added to correct the errors made by previous models. Models are added sequentially until no further improvement can be made. Gradient boosting is an approach where each new model is fit to the residual errors left by the previous models, and these corrections are added up to make the final prediction.
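The residual-fitting idea can be sketched in a few lines. The toy below uses scikit-learn regression trees and squared error; it illustrates the principle only, not XGBoost's actual implementation (which also uses second-order gradients and regularization).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression problem.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from a constant model
trees = []
for _ in range(100):
    residual = y - prediction            # error left by the ensemble so far
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += learning_rate * tree.predict(X)  # add the correction
    trees.append(tree)

print("train MSE:", np.mean((y - prediction) ** 2))
```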
A learning task parameter that defines the objective the booster optimizes.
objective [default=reg:linear]
- reg:linear – used for linear regression.
- binary:logistic – used for logistic regression for binary classification; returns the predicted probability.
- multi:softmax – used for multi-class classification with the softmax objective; returns predicted class labels and requires num_class to be set.
- multi:softprob – same as multi:softmax, but returns predicted class probabilities.
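As a minimal sketch, the snippet below trains a three-class model with multi:softprob using the native API; the synthetic data and the number of boosting rounds are illustrative.

```python
import numpy as np
import xgboost as xgb

# Synthetic three-class dataset, for illustration only.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = rng.integers(0, 3, size=300)

dtrain = xgb.DMatrix(X, label=y)

# multi:softprob returns a probability for each of the num_class classes.
params = {"objective": "multi:softprob", "num_class": 3}
booster = xgb.train(params, dtrain, num_boost_round=10)

proba = booster.predict(xgb.DMatrix(X[:5]))
print(proba.shape)  # (5, 3): one probability per class per row
```

Switching the objective to multi:softmax would instead make predict return a single class label per row.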