
What is XGBoost?
XGBoost stands for eXtreme Gradient Boosting. It is one of the most widely used algorithms in machine learning, for both classification and regression problems.
Even in machine learning competitions and hackathons, XGBoost is among the first algorithms picked for structured data. It has proven itself in terms of speed and performance.
XGBoost is a software library that you can download and install on your machine, then access from a variety of interfaces (a minimal Python example follows the list below). Specifically, XGBoost offers the following main features:
- A wide range of applications: Can be used to solve regression, classification, ranking, and user-defined prediction problems.
- Portability: Runs smoothly on Windows, Linux, and OS X.
- Languages: Supports all major programming languages including C++, Python, R, Java, Scala, and Julia.
- Cloud Integration: Supports AWS, Azure, and YARN clusters, and works well with Flink, Spark, and other ecosystems.
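As a minimal sketch of the Python interface, the snippet below trains a default XGBoost classifier through its scikit-learn-compatible API. It assumes xgboost and scikit-learn are installed, and the dataset choice is purely illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Illustrative binary classification dataset.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train a gradient-boosted tree classifier with default settings.
model = XGBClassifier()
model.fit(X_train, y_train)

print("accuracy:", model.score(X_test, y_test))
```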
The two reasons to use XGBoost are also the two goals of the project:
- Execution Speed: achieved through parallelization (utilizing all cores of the processor), cache optimization, and out-of-core computation (handling datasets larger than RAM capacity); see the sketch after this list.
- Model Performance: consistently strong results, especially on structured data.
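As a hedged sketch of the speed-oriented settings, the snippet below sets the nthread and tree_method parameters of the native xgboost API; both parameter names come from the official parameter list, while the synthetic data is purely illustrative. Out-of-core training is configured separately through an external-memory DMatrix and is not shown here.

```python
import numpy as np
import xgboost as xgb

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
y = (X[:, 0] + rng.normal(size=10_000) > 0).astype(int)

dtrain = xgb.DMatrix(X, label=y)
params = {
    "objective": "binary:logistic",
    "nthread": 4,           # parallelize tree construction across 4 cores
    "tree_method": "hist",  # cache-friendly histogram-based split finding
}
booster = xgb.train(params, dtrain, num_boost_round=50)
```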
1. Decision Tree: Every hiring manager has a set of criteria, such as education level, number of years of experience, and interview performance. A decision tree is analogous to a hiring manager interviewing candidates based on his or her own criteria.
2. Bagging: Now imagine that instead of a single interviewer there is an interview panel where each interviewer has a vote. Bagging, or bootstrap aggregating, involves combining inputs from all interviewers for the final decision through a democratic voting process.
3. Random Forest: It is a bagging-based algorithm with a key difference: only a random subset of features is considered. In other words, every interviewer will only test the interviewee on certain randomly selected qualifications (e.g. a technical interview for testing programming skills and a behavioral interview for evaluating non-technical skills).
4. Boosting: This is an alternative approach where each interviewer alters the evaluation criteria based on feedback from the previous interviewer. This ‘boosts’ the efficiency of the interview process by deploying a more dynamic evaluation process.
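To make the contrast concrete, here is a short sketch that cross-validates one model of each kind on the same synthetic dataset. The specific estimators and settings are illustrative assumptions, not a benchmark.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "boosting (XGBoost)": XGBClassifier(),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```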
Boosting is an ensemble technique in which new models are added to correct the errors made by previous models. Models are added sequentially until no further improvement can be made. Gradient boosting is an approach where each new model is fit to the residual errors left by the previous models, and these corrections are added up to make the final prediction.
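The residual-fitting idea can be sketched in a few lines. The toy below uses scikit-learn regression trees and squared error; it illustrates the principle only, not XGBoost's actual implementation (which also uses second-order gradients and regularization).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression problem.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from a constant model
trees = []
for _ in range(100):
    residual = y - prediction            # error left by the ensemble so far
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += learning_rate * tree.predict(X)  # add the correction
    trees.append(tree)

print("train MSE:", np.mean((y - prediction) ** 2))
```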
A learning task parameter that defines the objective the booster optimizes.
objective [default=reg:linear]
- reg:linear – used for linear regression.
- binary:logistic – used for logistic regression for binary classification; returns the predicted probability.
- multi:softmax – used for multi-class classification with the softmax objective; returns predicted class labels and requires num_class to be set.
- multi:softprob – same as multi:softmax, but returns predicted class probabilities.
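As a minimal sketch, the snippet below trains a three-class model with multi:softprob using the native API; the synthetic data and the number of boosting rounds are illustrative.

```python
import numpy as np
import xgboost as xgb

# Synthetic three-class dataset, for illustration only.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = rng.integers(0, 3, size=300)

dtrain = xgb.DMatrix(X, label=y)

# multi:softprob returns a probability for each of the num_class classes.
params = {"objective": "multi:softprob", "num_class": 3}
booster = xgb.train(params, dtrain, num_boost_round=10)

proba = booster.predict(xgb.DMatrix(X[:5]))
print(proba.shape)  # (5, 3): one probability per class per row
```

Switching the objective to multi:softmax would instead make predict return a single class label per row.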