Support Vector Machine (SVM) is a popular and powerful supervised machine learning algorithm that can be used for both classification and regression.
However, it is most widely used for classification problems, on both linear and non-linear data.
In this article, we’ll focus on the classification setting only.
SUPPORT VECTOR MACHINE(SVM):
Given a set of labeled points, SVM tries to estimate the optimal hyperplane that best separates the classes of data.
However, there can be an infinite number of separating hyperplanes, so how can we find the ideal one?
Now from the above diagram, we can see that all three hyperplanes do an excellent job in separating the two classes.
However, intuitively, the green hyperplane best separates the two classes, because its distance to the nearest point of each class is the largest.
The other two hyperplanes also classify the two classes accurately, but they are too close to the points.
A hyperplane that is too close to the points may be too sensitive to noise, and a new point may be misclassified even if it falls slightly to the right or left of the hyperplane.
The explicit goal of an SVM is to maximize the margins between data points of one class and the other.
Hence, the Support Vector Machine is also referred to as a maximum margin classifier.
SVM chooses the optimal hyperplane so that the separation between the classes is as wide as possible.
The space between the dotted vertical lines is the margin.
The points that touch the margins are called support vectors, and they determine the position of the hyperplane.
The hyperplane parallel to the optimal hyperplane that passes through the nearest point(s) on the positive side is called the positive hyperplane; likewise, the negative hyperplane is the one that passes through the nearest point(s) on the negative side.
The optimal hyperplane is the vertical line running in the middle of the margin.
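To make this concrete, here is a minimal sketch using scikit-learn's SVC on a tiny hand-made dataset (the data values are illustrative, not from this article). After fitting, the `support_vectors_` attribute exposes exactly the margin-touching points described above.

```python
import numpy as np
from sklearn import svm

# Toy 2-D data: two linearly separable clusters (illustrative values).
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(X, y)

# The support vectors are the points that touch the margins;
# they alone determine the position of the optimal hyperplane.
print(clf.support_vectors_)
print(clf.coef_, clf.intercept_)  # w and b of the hyperplane w·x + b = 0
```

Only the support vectors matter to the final model; moving any other point (without crossing the margin) leaves the hyperplane unchanged.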
WHAT IF THE DATA IS NOT LINEARLY SEPARABLE?
Consider the following diagram
As we can clearly see, we cannot draw a straight line to separate the two classes. This is an example of non-linear data.
If we try to separate the two classes with a line, the separation will be poor; any linear classifier would perform poorly in this situation.
In such cases, SVM uses something called a kernel trick to get the job done.
A kernel function is used to map the input data onto a higher dimensional space where it is linearly separable.
The preceding diagram is an example of how non-linear 2-dimensional data is transformed into a 3-dimensional space using a kernel function, where it becomes linearly separable.
There are different types of kernel functions, such as the linear, polynomial, Gaussian, and sigmoid kernels.
The most popular of these is the Gaussian kernel, also known as the radial basis function (RBF) kernel.
When we don't know which kernel function to use, the Gaussian kernel is a good default choice.
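The following sketch illustrates the kernel trick on a classic non-linear dataset, concentric circles (the dataset and parameters here are illustrative, not from this article). A linear kernel struggles, while the RBF kernel separates the classes well.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Concentric circles: a classic non-linearly separable dataset.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# A straight line cannot separate nested circles, so the linear kernel
# performs poorly; the RBF kernel implicitly maps the data into a
# higher-dimensional space where the classes become separable.
linear_clf = SVC(kernel='linear').fit(X_train, y_train)
rbf_clf = SVC(kernel='rbf').fit(X_train, y_train)

lin_acc = accuracy_score(y_test, linear_clf.predict(X_test))
rbf_acc = accuracy_score(y_test, rbf_clf.predict(X_test))
print("linear:", lin_acc)
print("rbf:", rbf_acc)
```

On this data the RBF kernel reaches near-perfect accuracy while the linear kernel hovers around chance level, which is exactly the gap the kernel trick is meant to close.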
Let’s do some coding.
from sklearn import svm
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import numpy as np
X, y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.60)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='spring')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(accuracy_score(y_test, y_pred))
The following is the decision surface of the classifier
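A decision surface like this can be sketched by evaluating the classifier's decision function on a grid, a common plotting pattern with scikit-learn (the grid resolution and styling below are illustrative choices, not from this article):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs

# Same toy blobs as in the code above (illustrative parameters).
X, y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.60)
clf = svm.SVC(kernel='linear', C=1.0).fit(X, y)

# Evaluate the decision function on a grid covering the data.
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# The contour at level 0 is the optimal hyperplane; -1 and +1 are the margins.
plt.contour(xx, yy, Z, levels=[-1, 0, 1], linestyles=['--', '-', '--'])
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='spring')
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
            s=120, facecolors='none', edgecolors='k')  # highlight support vectors
plt.show()
```

The circled points sitting on the dashed margin lines are the support vectors discussed earlier.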
In this tutorial, we discussed the support vector machine, one of the most powerful supervised learning techniques used for classification.
We discussed how SVM finds the ideal hyperplane which has the largest margin between the two classes.
We also discussed how SVM classifies non-linear data using the kernel trick.
I hope this article has given you a basic understanding of how SVM works without going into the mathematical details.