# Support Vector Machine Tutorial for Beginners

**Support Vector Machine Tutorial for Beginners**. Are you someone looking to learn what a Support Vector Machine is, how SVM works, and how to implement SVM in Python? Or are you dreaming of becoming a certified pro Machine Learning Engineer or Data Scientist? Then stop just dreaming and get your Data Science certification course with Machine Learning from India's leading Data Science training institute.

Support Vector Machine is another simple algorithm that every machine learning expert uses. Many experts prefer it because it delivers accurate results with less computational power, and it can be used for both classification and regression problems. In this blog, we will learn how SVM works in Machine Learning and how to implement SVM in Python. If you want to know what a Support Vector Machine is, follow the support vector machine tutorial for beginners below from Prwatech, and take advanced Data Science training with Machine Learning like a pro from today itself, under professionals with 10+ years of hands-on experience.

## Introduction to Support Vector Machine

Support Vector Machine (SVM) is a supervised machine learning algorithm: a discriminative classifier defined by a separating hyperplane. That is, given labeled training data, the algorithm finds the best hyperplane for categorizing new inputs. In two dimensions, the hyperplane is a line dividing the space into two parts. Support vectors are the coordinates of the individual observations that lie closest to the hyperplane.

## How does SVM work?

There are a number of possible hyperplanes that could be chosen to separate the two classes of data points. The hyperplane with the maximum margin must be chosen, where the margin is the distance to the closest data points of both classes. There are several criteria for identifying the right hyperplane.

### Criterion 1

The following image shows three hyperplanes trying to separate the two classes.

Here we have to choose the hyperplane that actually segregates the two classes. We can see that hyperplane X fulfills this criterion.

### Criterion 2

Here all the hyperplanes separate the two classes; now the question is how to identify the correct one.

We have to consider the distance between the hyperplane and the nearest data points in both classes, and choose the hyperplane for which this distance is maximal. This distance is called the **'Margin'**. In the above diagram, plane P1 has the maximum distance from the nearest points in both classes.
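For a linear SVM, the margin width can be computed directly from the fitted model as 2 / ||w||, where w is the weight vector of the separating hyperplane. A minimal sketch on toy data (the points and labels below are illustrative, not from this tutorial's data set):

```python
import numpy as np
from sklearn import svm

# Two small, linearly separable classes (toy data for illustration).
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard-margin SVM on separable data.
clf = svm.SVC(kernel="linear", C=1000).fit(X, y)

# Margin width for a linear SVM is 2 / ||w||.
w = clf.coef_[0]
margin = 2 / np.linalg.norm(w)
print("margin width:", margin)
print("support vectors:", clf.support_vectors_)
```

The support vectors printed at the end are exactly the nearest points in both classes that define the margin.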

### Criterion 3

In this criterion, if we choose hyperplane P2 over P1 because of its higher margin, it misclassifies some data points. So hyperplane P2 has classification errors, while hyperplane P1 classifies correctly; correct classification takes priority over a larger margin.

### Criterion 4:

What if the classes are distributed as shown in the above diagram? SVM has the property of ignoring outliers; it is a robust algorithm in the presence of outliers.

### Criterion 5:

Now, how do we handle this case? It is the challenge of using a single line as the hyperplane. SVM handles this problem by using an additional feature: besides the X and Y features, it can use a third feature Z with an equation like

z = x^2 + y^2

When we plot the data points in the X-Z plane we get the above diagram, which clearly shows the segregation of the two classes. SVM can handle the separation of different kinds of data point distributions with appropriate hyperplanes. In an SVM model, some parameters need to be defined and tuned for the model to work efficiently.
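The z = x^2 + y^2 trick can be demonstrated with a few lines of numpy. A small sketch with made-up data (two classes placed on concentric rings, so no straight line in (x, y) can separate them):

```python
import numpy as np

# Two classes on concentric rings: not separable by a line in (x, y).
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 100)
inner = np.c_[0.5 * np.cos(theta[:50]), 0.5 * np.sin(theta[:50])]  # class 0, radius 0.5
outer = np.c_[2.0 * np.cos(theta[50:]), 2.0 * np.sin(theta[50:])]  # class 1, radius 2.0

# Add the extra feature z = x^2 + y^2; in the X-Z plane the classes split cleanly.
z_inner = inner[:, 0] ** 2 + inner[:, 1] ** 2  # all equal to 0.25
z_outer = outer[:, 0] ** 2 + outer[:, 1] ** 2  # all equal to 4.0
print(z_inner.max(), z_outer.min())
```

Any threshold on z between 0.25 and 4.0 now separates the two classes with a horizontal line, which is exactly the hyperplane SVM would find in the transformed space.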

## Tuning parameters

### Kernel:

The kernel is an important factor that decides the nature of the hyperplane. Different types of kernels are available in model design. With a linear kernel, the prediction for a new data point is computed from the dot product between the input (x) and each support vector (Xi), as follows:

f(x) = B0 + sum(ai * dot(x, Xi))

The equation calculates the inner products of a new input vector x with all support vectors in the training data. The coefficients B0 and ai (one for each input) must be estimated from the training data by the learning algorithm. The polynomial kernel can be written as

K(x, xi) = (1 + sum(x * xi))^d

And the exponential (RBF) kernel can be written as

K(x, xi) = exp(-gamma * sum((x - xi)^2))
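The three kernel formulas above can each be evaluated by hand in a few lines of numpy. A small sketch with illustrative vectors and parameter values (x, xi, gamma, and d below are arbitrary choices, not values from the tutorial):

```python
import numpy as np

x = np.array([1.0, 2.0])    # new input vector
xi = np.array([0.5, 1.5])   # a support vector
gamma, d = 0.5, 3           # example kernel parameters

linear = np.dot(x, xi)                        # linear kernel: dot(x, xi)
poly = (1 + np.dot(x, xi)) ** d               # polynomial kernel: (1 + dot(x, xi))^d
rbf = np.exp(-gamma * np.sum((x - xi) ** 2))  # RBF kernel: exp(-gamma * ||x - xi||^2)

print(linear, poly, rbf)
```

Note how the RBF value lies in (0, 1] and shrinks as x moves away from xi, which is what makes the gamma parameter (discussed below) control the "reach" of each training point.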

### Regularization

The regularization parameter, called 'C' in the sklearn library, tells the SVM optimization to what extent misclassification should be avoided.

For larger values of C, the optimizer selects a smaller-margin hyperplane if that hyperplane classifies all the training points correctly. Conversely, for smaller values of C, the optimizer selects a larger-margin separating hyperplane, even if that hyperplane misclassifies more points.

The first image shows the case of lower regularization, where misclassifications are more likely. In the second image, higher regularization helps to classify the data points correctly compared to the first case.
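One simple way to see the effect of C is to count the support vectors at different settings: a small C allows a wide, tolerant margin that leans on many points, while a large C tightens the fit. A sketch using the same iris features as the examples later in this tutorial (the particular C values are arbitrary):

```python
from sklearn import svm, datasets

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target  # sepal length and sepal width

# Smaller C tolerates more misclassification (wider margin, more support
# vectors); larger C penalizes training errors more strictly.
counts = []
for C in (0.01, 1, 100):
    clf = svm.SVC(kernel="linear", C=C).fit(X, y)
    counts.append(clf.n_support_.sum())
    print("C =", C, "-> support vectors:", counts[-1])
```

`n_support_` holds the number of support vectors per class; its sum typically shrinks as C grows.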

### Gamma

The gamma parameter defines how far the influence of a single training point reaches, with low values meaning 'far' and high values meaning 'close'. In other words, with a low gamma, points far away from a plausible separation line are taken into account when computing the separation line; with a higher gamma value, only the points close to the plausible line are considered.
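The practical consequence is that a high gamma lets the RBF boundary hug individual training points, which usually raises training accuracy (and risks overfitting). A small sketch on the same two iris features, sweeping a few illustrative gamma values:

```python
from sklearn import svm, datasets

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target  # sepal length and sepal width

# Low gamma: each point influences a wide region (smoother boundary).
# High gamma: influence is local, so the boundary wraps around points.
scores = []
for gamma in (0.1, 1, 100):
    clf = svm.SVC(kernel="rbf", C=1, gamma=gamma).fit(X, y)
    scores.append(clf.score(X, y))
    print("gamma =", gamma, "-> training accuracy:", scores[-1])
```

Training accuracy alone is not the goal; a very high gamma often generalizes poorly, which is why gamma is usually chosen by cross-validation.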

### Margin

A margin is the separation of the line from the nearest points of the classes. For a good margin, this separation is large for both classes. A good margin permits the points to stay within their own classes without crossing into the other class. Let's see an example.

In the first example the margin is poor, as the line is very close to the first class (the circles). In the second example the margin is good, as it maintains a larger distance from the nearest points of both classes.

## Support Vector Machine Example


Let's see how we can implement this in Python and what effects the different parameters have on the result. For this we will take a data set from the scikit-learn library.

### Initializing and importing required libraries

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
```

### Let's take the iris data set and load it. We will check the features in iris and select two of them to avoid complex visualization.

```python
iris = datasets.load_iris()
print(iris.feature_names)

Xin = iris.data[:, :2]  # we will take 'sepal length' and 'sepal width'
yin = iris.target
```

We will plot the decision boundaries using the linear kernel, the polynomial kernel, and the RBF kernel. For this we will keep the C and gamma values constant.

### Let's apply the **linear kernel** first:

```python
C = 1.0  # Support Vector Machine regularization parameter
svc = svm.SVC(kernel='linear', C=C, gamma='auto').fit(Xin, yin)

x_min, x_max = Xin[:, 0].min() - 1, Xin[:, 0].max() + 1
y_min, y_max = Xin[:, 1].min() - 1, Xin[:, 1].max() + 1
h = (x_max - x_min) / 100  # step size of the mesh grid

x1, y1 = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

plt.subplot(1, 1, 1)
Z = svc.predict(np.c_[x1.ravel(), y1.ravel()])
Z = Z.reshape(x1.shape)
plt.contourf(x1, y1, Z, cmap=plt.cm.Paired, alpha=0.8)
plt.scatter(Xin[:, 0], Xin[:, 1], c=yin, cmap=plt.cm.Paired)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(x1.min(), x1.max())
plt.title('SVC - linear kernel')
plt.show()
```

**Output:**

Now let's see the results of the other kernels. Keeping the rest of the code the same, we just change the type of kernel as follows.

### Let's apply the poly kernel:

```python
svc = svm.SVC(kernel='poly', C=1, gamma='auto').fit(Xin, yin)
```

**Output:**

### Let's apply the rbf kernel:

```python
svc = svm.SVC(kernel='rbf', C=1, gamma='auto').fit(Xin, yin)
```

**Output:**

From the above graphical representations it is clearly observed that changing the kernel changes the contours in the image, classifying the data points in different ways as the model approaches the correct classification.

### Now let's observe the effect of 'gamma' on classification. For that we will keep the SVM kernel and C constant.

If we change the value of gamma to 1, 10, and 200, we get the respective graphs as follows:

### Now let's observe the effect of 'C' on classification. For that we will keep gamma and the SVM kernel constant.

If we change the value of C to 5, 500, and 5000, we get the respective graphs as follows:
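The two parameter sweeps above can be generated with a small loop that reuses the plotting code from the linear-kernel example, drawing each decision boundary in its own subplot (a sketch; the figure size and grid layout are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

iris = datasets.load_iris()
Xin, yin = iris.data[:, :2], iris.target

# Mesh grid covering the feature space, as in the earlier example.
x_min, x_max = Xin[:, 0].min() - 1, Xin[:, 0].max() + 1
y_min, y_max = Xin[:, 1].min() - 1, Xin[:, 1].max() + 1
x1, y1 = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))

def plot_boundary(ax, svc, title):
    Z = svc.predict(np.c_[x1.ravel(), y1.ravel()]).reshape(x1.shape)
    ax.contourf(x1, y1, Z, cmap=plt.cm.Paired, alpha=0.8)
    ax.scatter(Xin[:, 0], Xin[:, 1], c=yin, cmap=plt.cm.Paired)
    ax.set_title(title)
    return Z

fig, axes = plt.subplots(2, 3, figsize=(12, 6))
for ax, gamma in zip(axes[0], (1, 10, 200)):   # sweep gamma, C fixed
    svc = svm.SVC(kernel='rbf', C=1, gamma=gamma).fit(Xin, yin)
    Z = plot_boundary(ax, svc, f'gamma={gamma}')
for ax, C in zip(axes[1], (5, 500, 5000)):     # sweep C, gamma fixed
    svc = svm.SVC(kernel='rbf', C=C, gamma='auto').fit(Xin, yin)
    Z = plot_boundary(ax, svc, f'C={C}')
plt.show()
```

The top row shows the boundary tightening around individual points as gamma grows; the bottom row shows the margin narrowing as C grows.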

We hope you have understood this support vector machine tutorial for beginners. Achieve success in your career as a Data Scientist by being a part of Prwatech, India's leading Data Science training institute in Bangalore.