Boosting Techniques in Machine Learning

  • date 15th February, 2020 |
  • by Prwatech |

 


 

In this tutorial on Boosting Techniques in Machine Learning, you will get an introduction to boosting algorithms. Are you looking for the best platform that provides information about the different types of boosting algorithms? Or are you looking to take an advanced Data Science Certification Course from India’s leading Data Science training institute? Then you’ve landed on the right path.

 

Machine learning is vital for many technologies that seek to add intelligence to data, and companies recognize its great value. Boosting is an ensemble meta-algorithm in machine learning used mainly to reduce bias, and also variance, in supervised learning, and a family of machine learning algorithms that convert weak learners into strong ones.

 

The tutorial below will help you understand boosting techniques in machine learning in detail, so just follow all the tutorials of India’s leading Data Science training institute in Bangalore and become a pro Data Scientist or Machine Learning Engineer.

 

Boosting Algorithm Introduction

 

As seen in the introduction to ensemble methods, boosting is one of the advanced ensemble methods that improves overall performance by decreasing bias. Boosting is a sequential process, where each succeeding model attempts to correct the errors of the preceding model.

 

Different Types of Boosting Algorithm

 

There are five main types of boosting techniques:

AdaBoost

Gradient Boosting (GBM)

Light GBM

XGBoost

CatBoost

Let’s see more about these types.

 

AdaBoost Algorithm in Machine Learning

 

One of the simplest boosting algorithms is AdaBoost. Usually, decision trees are used for modeling. Multiple sequential models are created, and each model corrects the errors of the previous one by assigning higher weights to the observations that are incorrectly predicted. The succeeding model then works to predict these values correctly. The AdaBoost classifier combines weak classifiers to form a strong classifier. The mathematical equation for AdaBoost can be represented as follows:

 

F(x) = \mathrm{sign}\left( \sum_{m=1}^{M} \theta_m f_m(x) \right)

where

f_m = the m-th weak classifier

\theta_m = its corresponding weight

It is exactly the weighted combination of M weak classifiers. The steps for performing the AdaBoost algorithm are as follows:

 

Initially, equal weights are assigned to all data points in the dataset.

A subset of the data is used to build a model.

Predictions are made with this model for the whole dataset.

The errors are measured by comparing the predictions with the actual values.

While creating the next model, the data points that were predicted incorrectly are assigned higher weights.

The weights are determined from the error values: the higher the error, the higher the weight assigned to the corresponding data point.

This process is repeated until the error function stops changing, or the maximum number of estimators is reached.
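To make the reweighting idea concrete, here is a minimal sketch of AdaBoost for labels in {-1, +1}, using scikit-learn decision stumps as the weak learners. The function and variable names (adaboost_sketch, weights, alphas) are illustrative only, not part of any library.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_sketch(X, y, n_estimators=10):
    """Minimal AdaBoost sketch; y must contain labels -1 and +1."""
    n_samples = X.shape[0]
    weights = np.full(n_samples, 1.0 / n_samples)   # step 1: equal weights
    stumps, alphas = [], []
    for _ in range(n_estimators):
        # fit a weak learner (decision stump) on the weighted data
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=weights)
        pred = stump.predict(X)
        # weighted error rate of this weak learner (weights sum to 1)
        err = np.clip(np.sum(weights * (pred != y)), 1e-10, 1 - 1e-10)
        # weight (importance) of this weak learner in the final vote
        alpha = 0.5 * np.log((1 - err) / err)
        # raise weights of misclassified points, lower the rest, renormalize
        weights *= np.exp(-alpha * y * pred)
        weights /= weights.sum()
        stumps.append(stump)
        alphas.append(alpha)
    # final prediction: sign of the weighted vote of all weak learners
    return lambda X_new: np.sign(sum(a * s.predict(X_new) for a, s in zip(alphas, stumps)))

In practice you would simply use sklearn.ensemble.AdaBoostClassifier, as in the employee-attrition example that follows.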

 

Now let’s take an example of employee attrition.

 

Initializing and importing libraries

 

import pandas as pd

import numpy as np

 

Reading File

 

df = pd.read_csv("Your File Path")

df.head()

 

Splitting dataset into train and test

 

from sklearn.model_selection import train_test_split

train, test = train_test_split(df, test_size=0.3, random_state=0)

x_train = train.drop('status', axis=1)

y_train = train['status']

x_test = test.drop('status', axis=1)

y_test = test['status']

 

Applying AdaBoost:

 

from sklearn.ensemble import AdaBoostClassifier

model = AdaBoostClassifier(random_state=1)

model.fit(x_train, y_train)

model.score(x_test,y_test)

 

Output:

 

0.9995555555555555

 

Note: In case of regression, the steps will be the same; we just have to replace AdaBoostClassifier with AdaBoostRegressor.
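For instance, a minimal regression sketch (assuming x_train, y_train, x_test and y_test hold a numeric target instead of 'status') would be:

from sklearn.ensemble import AdaBoostRegressor

reg = AdaBoostRegressor(random_state=1)

reg.fit(x_train, y_train)

reg.score(x_test, y_test)   # returns the R^2 score on the test set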

 

Gradient Boosting (GBM) in Machine Learning:

 

Gradient Boosting, or GBM, is an ensemble machine learning algorithm that works on both regression and classification problems. In GBM, a number of weak learners are combined to form a strong learner. Here also, each succeeding tree is built on the errors calculated from the preceding tree, and regression trees are used as the base learners. Let’s take an example to understand how this technique works. We have to predict the age of a person.

 

Gender  Height  Weight  BMI    Physical Activity  Age
M       160     85      33.20  1                  35
F       155     64      26.64  0                  27
M       170.7   95      28.14  0                  28
F       185.4   65      23.27  1                  28
F       158     70      36.05  1                  32
F       155     90      35.38  0                  28
M       173.7   72      23.86  1                  22
F       161.5   74      23.77  0                  33

 

The mean age is taken as the predicted value (indicated by ‘Prediction 1’) for all observations in the dataset. The difference between the actual values of age and the mean age is treated as the error, indicated by ‘Error 1’.

 

Gender  Height  Weight  BMI    Physical Activity  Age  Mean Age  Error 1
M       160     85      33.20  1                  35   29         6
F       155     64      26.64  0                  27   29        -2
M       170.7   95      28.14  0                  28   29        -1
F       185.4   65      23.27  1                  28   29        -1
F       158     70      36.05  1                  32   29         3
F       155     90      35.38  0                  28   29        -1
M       173.7   72      23.86  1                  22   29        -7
F       161.5   74      23.77  0                  33   29         4

 

Using the Error 1 values, a tree model is built. The purpose is to reduce that error towards 0.

 

Gender  Physical Activity  Age  Mean Age (Prediction 1)  Error 1  Prediction 2  Mean + Prediction 2
M       1                  35   29                        6        4             33
F       0                  27   29                       -2       -1             28
M       0                  28   29                       -1       -1             29
F       1                  28   29                       -1        0             29
F       1                  32   29                        3        1             30
F       0                  28   29                       -1        1             30
M       1                  22   29                       -7       -2             27
F       0                  33   29                        4        3             32

 

The tree trained on the errors produces new predictions, indicated by ‘Prediction 2’. If we add the previously calculated mean and Prediction 2, we get values of ‘Age’ that approach the actual age, which means the error, i.e., the difference between the actual and predicted values, decreases. Similarly, in each iteration the residual of the current prediction is used to build the model for the next stage. This process is repeated until the maximum number of iterations is reached.
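The table walk-through above can be reproduced with a short sketch of gradient boosting for regression: start from the mean age, repeatedly fit a small regression tree to the current residuals, and add a fraction of its predictions. The code below uses the numeric columns of the toy table; the names X, y, prediction and learning_rate are illustrative only.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# numeric columns from the toy table: Height, Weight, BMI, Physical Activity
X = np.array([[160.0, 85, 33.20, 1], [155.0, 64, 26.64, 0],
              [170.7, 95, 28.14, 0], [185.4, 65, 23.27, 1],
              [158.0, 70, 36.05, 1], [155.0, 90, 35.38, 0],
              [173.7, 72, 23.86, 1], [161.5, 74, 23.77, 0]])
y = np.array([35, 27, 28, 28, 32, 28, 22, 33], dtype=float)

prediction = np.full_like(y, y.mean())              # Prediction 1: the mean age
learning_rate = 0.5

for _ in range(10):                                 # a few boosting rounds
    residual = y - prediction                       # Error = actual age - current prediction
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residual)                           # fit the next tree to the residuals
    prediction += learning_rate * tree.predict(X)   # mean + Prediction 2, 3, ...

print(np.round(prediction, 1))                      # values move towards the actual ages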

 

We will take the same example as in the AdaBoost section and apply GBM to the training and test sets of the dataset.

 

from sklearn.ensemble import GradientBoostingClassifier

model= GradientBoostingClassifier(learning_rate=0.01,random_state=1)

model.fit(x_train, y_train)

model.score(x_test,y_test)

 

Output:

 

0.7595555555555555

 

Note: In case of regression, the steps will be the same; we just have to replace GradientBoostingClassifier with GradientBoostingRegressor.

 

Light GBM in Machine Learning:

 

Light GBM is most useful when the dataset is extremely large; it is faster at handling huge amounts of data than the other algorithms. It is a tree-based algorithm, but unlike the other algorithms, which grow trees level-wise, it follows a leaf-wise approach. Leaf-wise growth may cause over-fitting on smaller datasets, which can be handled with the parameter ‘max_depth’.
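As a sketch of how that over-fitting control might look (max_depth and num_leaves are real Light GBM parameters, but the values here are only illustrative), the parameter dictionary passed to training could be:

params = {
    'learning_rate': 0.001,
    'max_depth': 5,      # limit tree depth to curb leaf-wise over-fitting
    'num_leaves': 20,    # cap the number of leaves per tree
}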

 

We will train Light GBM on the training set, and the model will predict on the test set.

 

First, install the library:

pip install lightgbm

 

Applying the algorithm to the dataset:

 

import lightgbm as lgb

from sklearn.metrics import accuracy_score

train_data = lgb.Dataset(x_train, label=y_train)

# defining parameters

params = {'learning_rate': 0.001}

model = lgb.train(params, train_data, 100)

# the model outputs scores/probabilities, so convert them to 0/1 class labels

y_pred = model.predict(x_test)
for i in range(len(y_pred)):    # loop over all test data points
    if y_pred[i] >= 0.5:
        y_pred[i] = 1
    else:
        y_pred[i] = 0

# Accuracy Score on test dataset

acc_test = accuracy_score(y_test, y_pred)

print('\naccuracy_score on test dataset : ', acc_test)

 

Output:

 

0.958

 

XGBoost in Machine Learning:

 

XGBoost stands for eXtreme Gradient Boosting. It is one of the advanced implementations of the gradient boosting algorithm. XGBoost is nearly 10 times faster than other gradient boosting implementations and has high predictive power. It also helps reduce overfitting and improves the overall accuracy of the model, which is why it is also called a ‘regularized boosting’ technique. We will apply XGBoost to the same dataset.

 

First, install the library:

 

pip install xgboost

 

Applying XGBoost to the same dataset:

 

import xgboost as xgb

model=xgb.XGBClassifier(random_state=1,learning_rate=0.01)

model.fit(x_train, y_train)

model.score(x_test,y_test)

 

Output:

 

0.9995555555555555
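Since XGBoost is called a regularized boosting technique, its regularization can also be tuned explicitly. max_depth, reg_alpha (L1) and reg_lambda (L2) are real XGBClassifier parameters; the values below are only an illustrative sketch, not tuned for this dataset.

import xgboost as xgb

model = xgb.XGBClassifier(random_state=1,
                          learning_rate=0.01,
                          max_depth=4,       # shallower trees generalize better
                          reg_alpha=0.1,     # L1 regularization on leaf weights
                          reg_lambda=1.0)    # L2 regularization on leaf weights

model.fit(x_train, y_train)

model.score(x_test, y_test)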

 

CatBoost in Machine Learning:

 

It is difficult to handle a large amount of labeled data when there are many categorical values; that is, if the data has many categorical variables, the label-encoding step becomes cumbersome. In that case, CatBoost can handle categorical variables directly and does not require extensive data preprocessing like other ML models. Let’s apply the technique to the same dataset.

 

First, install the library:

 

pip install catboost

 

Applying CatBoost

 

from catboost import CatBoostClassifier

model = CatBoostClassifier()

# indices of the non-float columns, i.e. the candidate categorical features

categorical_features_indices = np.where(x_train.dtypes != np.float64)[0]

model.fit(x_train, y_train, cat_features=categorical_features_indices, eval_set=(x_test, y_test))

model.score(x_test, y_test)

 

Output:

 

0.9991111111111111

 

 

We hope you now understand boosting techniques in machine learning. Get ahead in your career as a Data Scientist by being a part of Prwatech, India’s leading Data Science training institute in Bangalore.

 

 

 

 
