Feature Selection Tutorial
In this tutorial, we will cover an introduction to feature selection and the types of feature selection methods that are helpful for any Machine Learning engineer. Are you looking for a feature selection tutorial with examples, or do you dream of becoming a certified Pro Machine Learning Engineer? Then stop just dreaming and get your Data Science certification course from India's leading Data Science training institute. If you want to know about the advantages of feature selection, follow the feature selection tutorial for beginners below from Prwatech and take advanced Machine Learning training like a pro from today itself, under professionals with 10+ years of hands-on experience.

Introduction to Feature Selection
It is convenient to build a Machine Learning model with a limited number of variables. Nowadays, however, datasets contain a large variety of fields and variables (often called high-dimensional data), and we need to choose which variables contribute the most to the expected output. There are two main reasons why we should select particular features and exclude all the others:

Garbage in, garbage out:
If you feed a lot of irrelevant input into your model, the model will not be a good one. It will not be reliable and it will not do what it is supposed to do; its output can be considered garbage.

Too many variables:
At the end of the day, you have to explain and understand these variables. If you have thousands of variables, this is not practically possible. You want to keep only those that are very important and actually contribute to predicting something.

Feature Selection Definition:
Feature Selection is a procedure to select, automatically or manually, the features (i.e. independent variables) that are most significant in producing the expected prediction output. Feature Selection is one of the core concepts in machine learning and massively affects the performance of a model. Having irrelevant features in your dataset can decrease the accuracy of your models and make your model learn from irrelevant features.
Benefits of Feature Selection:
Reduces Overfitting: Less redundant data means less chance to make decisions based on noise.
Improves Accuracy: Less misleading data means modelling accuracy improves.
Reduces Training Time: Fewer data points reduce algorithm complexity, so algorithms train faster.

Types of Feature Selection Methods:
Feature selection can be done in multiple ways, but broadly there are 3 categories:

Filter Method
Wrapper Method
Embedded Method
Filter Method:
In the filter method, features are selected on the basis of statistical measures (such as their correlation with the target variable), independently of any machine learning algorithm. Features are ranked by these scores and kept or dropped before the model is trained.
Filter Method Example:
First we will load the Boston dataset from sklearn with this command:

from sklearn.datasets import load_boston

All independent variables are stored in df, and the dependent variable MEDV is stored in y. Here we will plot the heatmap using Pearson correlation as follows:

# Using Pearson Correlation
plt.figure(figsize=(12,10))
cor=df.corr()
sns.heatmap(cor, annot=True, cmap=plt.cm.Reds)
plt.show()

It will show the result in the form of a heatmap as follows:
Result:
(Heatmap of pairwise Pearson correlations between the Boston housing features.)
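From this heatmap, a common way to finish the filter method is to keep only the variables whose absolute Pearson correlation with the target is above a chosen threshold. The short sketch below is an added illustration, not part of the original listing; it reuses df and y from the setup above, and the 0.5 cutoff is an assumed choice you can tune.

# Filter-method sketch: keep features strongly correlated with the target (0.5 threshold is an assumption)
cor_target = abs(df.corrwith(y))                 # absolute Pearson correlation of each column with MEDV
relevant_features = cor_target[cor_target > 0.5] # features passing the threshold
print(relevant_features)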

Wrapper Method

A wrapper method needs one machine learning algorithm, and the performance of that algorithm is used as the evaluation criterion: you feed the features to the selected machine learning algorithm and, based on the model's performance, you add or remove features. It is an iterative and computationally expensive process, but it is usually more accurate than the filter method.
Forward Selection
It is an iterative method in which we initially start by selecting a single feature. By observing the overall performance, we add the next feature to the model, and we keep adding features until we get the best result. In other words, by adding features we try to increase the performance of the model step by step (see the sketch below).
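Forward selection can be written by hand, but a minimal sketch using scikit-learn's SequentialFeatureSelector is shown below. This is not part of the original example; the choice of LinearRegression as the estimator and n_features_to_select=5 are assumptions, and it assumes an older scikit-learn version that still ships load_boston.

# Forward selection sketch with scikit-learn (assumes sklearn >= 0.24 and < 1.2, which still has load_boston)
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SequentialFeatureSelector

data = load_boston()
x = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Start from zero features and greedily add the feature that improves the cross-validated score the most
sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=5, direction="forward")
sfs.fit(x, y)
print(list(x.columns[sfs.get_support()]))  # names of the selected features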
Backward Elimination
Here we start training the model with all the features and remove one feature in each subsequent iteration; it is exactly the opposite of forward selection. The least significant feature is removed in every iteration to improve accuracy, and we keep removing features one by one until the improvement stops.

Backward Elimination Example
First import the Boston data from sklearn:

from sklearn.datasets import load_boston
import pandas as pd
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
x=load_boston()
df=pd.DataFrame(x.data, columns=x.feature_names)
df["MEDV"]=x.target
x=df.drop("MEDV",axis=1)
y=df["MEDV"]

So far we have decided the independent and dependent variables. Now we will apply the algorithm and the backward elimination process:

# BACKWARD ELIMINATION:
x_1=sm.add_constant(x)
model=sm.OLS(y,x_1).fit()
model.summary()

By checking the model summary we can eliminate the features having a P-value greater than 0.05. Alternatively, you can write the Python code as follows:

cols=list(x.columns)
pmax=1
while(len(cols)>0):
    p=[]
    x_1=x[cols]
    x_1=sm.add_constant(x_1)
    model=sm.OLS(y,x_1).fit()
    p=pd.Series(model.pvalues.values[1:],index=cols)
    pmax=max(p)
    feature_with_p_max=p.idxmax()
    if(pmax>0.05):
        cols.remove(feature_with_p_max)
    else:
        break
selected_features_BE=cols
print(selected_features_BE)

(Here 11 features are selected from the 13 features.)
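Once selected_features_BE is available, a natural follow-up is to refit a model on just those columns and compare its score with a model trained on all features. The short sketch below is an added illustration, not part of the original example; it reuses x, y, and LinearRegression from above.

# Compare a model on all features vs. only the backward-elimination survivors (illustrative sketch)
model_all = LinearRegression().fit(x, y)
model_sel = LinearRegression().fit(x[selected_features_BE], y)
print("R^2 with all features:      %f" % model_all.score(x, y))
print("R^2 with selected features: %f" % model_sel.score(x[selected_features_BE], y))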
Recursive Feature Elimination
It is a wrapper-method algorithm which tries to find the best-contributing feature subset. It repeatedly creates models and keeps track of the best or the worst performing feature at each iteration, then builds the next model with the remaining features until all the features are exhausted. Finally, it ranks the features based on the order of their elimination.

Recursive Feature Elimination Example
We will apply it on the same Boston dataset, so the process of initializing and storing all the variables stays the same. We also need to import RFE and define the list of candidate feature counts:

from sklearn.feature_selection import RFE
nof_list=np.arange(1,14)            # candidate numbers of features to keep (Boston has 13 features)
high_score=0
nof=0
score_list=[]
for n in range(len(nof_list)):
    X_train, X_test, y_train, y_test=train_test_split(x,y,test_size=0.3,random_state=0)
    model=LinearRegression()
    rfe=RFE(model, n_features_to_select=nof_list[n])
    x_train_rfe=rfe.fit_transform(X_train,y_train)
    x_test_rfe=rfe.transform(X_test)
    model.fit(x_train_rfe,y_train)
    score=model.score(x_test_rfe,y_test)
    score_list.append(score)
    if(score>high_score):
        high_score=score
        nof=nof_list[n]
print("Optimum number of features: %d" %nof)
print("Score with %d features: %f" %(nof,high_score))
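To see which columns RFE actually kept, one option is to refit the selector with the optimum number of features found above and read off the boolean mask it exposes. The sketch below is an added illustration reusing x, y, and nof from above; it is not part of the original listing.

# Refit RFE with the optimum feature count and list the surviving columns (illustrative sketch)
rfe=RFE(LinearRegression(), n_features_to_select=nof)
rfe.fit(x, y)
print(list(x.columns[rfe.support_]))   # support_ marks the selected features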

Embedded Method
In embedded methods, feature selection happens during model training itself: the learning algorithm penalizes or shrinks the coefficients of less useful features, so the selection is embedded in the fitting process. Regularized linear models such as LASSO and Ridge are typical examples.

LASSO:
As shown in the previous examples, load the Boston file first, then fit LassoCV:

import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn.linear_model import LassoCV, Lasso
reg=LassoCV()
reg.fit(x,y)
print("Best alpha using built-in LassoCV: %f" % reg.alpha_)
print("Best score using built-in LassoCV: %f" % reg.score(x,y))
coef=pd.Series(reg.coef_,index=x.columns)

Result:
print("Lasso picked " + str(sum(coef !=0)) + " variables and eliminated the others " +str(sum(coef==0)) + " varibles") Here we are checking how many features selected by printing the statement

RIDGE:
In the following example, feature selection is performed using Ridge regularization. As shown in the previous examples, load the Boston file first:

import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn.linear_model import RidgeCV, Ridge
reg=RidgeCV()
reg.fit(x,y)
print("Best alpha using built-in RidgeCV: %f" % reg.alpha_)
print("Best score using built-in RidgeCV: %f" % reg.score(x,y))
coef=pd.Series(reg.coef_,index=x.columns)
coef
Result:
print("Ridge picked " + str(sum(coef !=0)) + " variables and eliminated the others " +str(sum(coef==0)) + " varibles") Here we are checking how many features selected by printing the statement. Ridge picked 13 variables and eliminated the others 0 variables. (Note: The result can be changed according to dataset.) To get the graphical output we can add following lines in code: imp_coef=coef.sort_values() imp_coef.plot(kind="bar") plt.title("Feature importance using Ridge Model")