Thursday, 21 February 2019

Implement SVM Algorithm in Python using Scikit Learn Library

Let's implement the SVM algorithm in Python using the scikit-learn library. This article covers four SVM kernels:

1. Linear Kernel 
2. Polynomial Kernel
3. Gaussian Kernel
4. Sigmoid Kernel

Any of these kernels can be used for both binary and multi-class classification; scikit-learn's SVC handles multi-class problems internally with a one-vs-one scheme. Which kernel works best depends on the data, not on the number of classes.

In this article, we will first solve a binary classification problem (using the bank note authentication dataset) with the Linear and Sigmoid kernels, and then a multi-class classification problem (using the IRIS dataset) with the Polynomial and Gaussian kernels.

You can download bank_note_authentication.csv from here and iris.csv from here. You can also download the following Jupyter notebooks, which contain the SVM code shown below:

Jupyter notebook for SVM Linear Kernel
Jupyter notebook for SVM Polynomial Kernel

Binary Classification using Linear Kernel

Step 1: Import the required Python libraries (pandas and scikit-learn)

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC 
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

Step 2: Load and examine the dataset

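# Column names are passed explicitly; this assumes the CSV has no header row
names = ['Variance', 'Skewness', 'Curtosis', 'Entropy', 'Class']
dataset = pd.read_csv('bank_note_authentication.csv', names=names)
print(dataset.shape)   # (rows, columns)
print(dataset.head())  # first five rows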

Step 3: Separate the features (X) from the target (y)

X = dataset.drop('Class', axis=1)  
y = dataset['Class']

Step 4: Split the dataset into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state=0) 
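
To make the effect of test_size and random_state concrete, here is a minimal sketch on synthetic data. The make_classification data is only a stand-in for the bank note dataset, and stratify=y is an optional extra that is not part of the original step; it keeps the class proportions similar in both subsets.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption): 100 rows, 4 features, 2 classes
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# test_size=0.20 holds out 20% of the rows; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)
```

Running the split again with the same random_state returns the same rows in each subset, which keeps results comparable between runs.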

Step 5: Create and fit the model

model = SVC(kernel='linear')  
model.fit(X_train, y_train)  
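
Once fitted, the model exposes what it learned. The sketch below is illustrative only: it uses synthetic data as a stand-in for the bank note dataset and prints the support vectors and, because the kernel is linear, the weights of the separating hyperplane.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in (assumption): 200 rows, 4 features, 2 classes
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

model = SVC(kernel='linear')
model.fit(X, y)

print(model.n_support_)             # number of support vectors per class
print(model.support_vectors_.shape) # (total support vectors, number of features)
print(model.coef_)                  # hyperplane weights; available only for the linear kernel
print(model.intercept_)             # hyperplane offset
```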

Step 6: Predict from the model

y_pred = model.predict(X_test)

y_pred is a NumPy array containing the predicted class for each row of X_test.

Let's compare the actual and predicted values.

df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})  
df  
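
With many rows, it can be easier to look only at the rows where the prediction is wrong. A small sketch with hand-made values (the labels and index below are hypothetical, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical values: y_test keeps its original row index, y_pred is a plain array
y_test = pd.Series([0, 1, 1, 0, 1], index=[12, 7, 3, 40, 25], name='Class')
y_pred = np.array([0, 1, 0, 0, 1])

df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})

# Keep only the rows where the model disagreed with the true label
mismatches = df[df['Actual'] != df['Predicted']]
print(len(mismatches), 'of', len(df), 'predictions differ')
print(mismatches)
```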

Step 7: Evaluate the model

confusionMatrix = confusion_matrix(y_test, y_pred)
accuracyScore = accuracy_score(y_test, y_pred)
classificationReport = classification_report(y_test, y_pred)
print(confusionMatrix)
print(accuracyScore * 100)
print(classificationReport)
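
The accuracy score and the confusion matrix are two views of the same counts: the diagonal of the matrix holds the correct predictions, so accuracy is the diagonal sum divided by the total. A short sketch with hand-made labels (hypothetical values, chosen only for illustration):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_hat = [0, 0, 1, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_hat)  # rows: actual class, columns: predicted class

# Correct predictions sit on the diagonal
accuracy_from_cm = np.trace(cm) / cm.sum()

print(cm)
print(accuracy_from_cm, accuracy_score(y_true, y_hat))
```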

Binary Classification using Sigmoid Kernel

All the steps above stay the same except Step 5: change the kernel type from linear to sigmoid.

model = SVC(kernel='sigmoid') 
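
Kernel values depend on the scale of the features, so it is common to standardize the inputs before fitting a non-linear kernel such as sigmoid. The following is a sketch of that idea, not part of the original steps: it uses synthetic data as a stand-in for the bank note dataset and wraps the scaler and the model in a scikit-learn pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in (assumption): 200 rows, 4 features, 2 classes
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

# The scaler is fitted on the training data only, then applied before the SVM
scaled_model = make_pipeline(StandardScaler(), SVC(kernel='sigmoid'))
scaled_model.fit(X_train, y_train)

score = scaled_model.score(X_test, y_test)  # mean accuracy on the test set
print(score)
```

Whether scaling helps on the bank note data is something to check by running both variants and comparing their scores.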

Multi-class Classification using Polynomial Kernel

All the steps above stay the same except Steps 2 and 5.

In Step 2, load the IRIS dataset (a multi-class dataset) instead of the bank note authentication dataset (a binary dataset).

names = ['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width', 'Class']
dataset = pd.read_csv('iris.csv', names=names) 

In Step 5, change the kernel type from linear to poly and specify the degree of the polynomial.

model = SVC(kernel='poly', degree=8) 
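
The degree is a tunable choice, and one way to pick it is to compare a few values with cross-validation. A minimal sketch, assuming scikit-learn's bundled copy of the iris data (load_iris) in place of iris.csv; the candidate degrees 2, 3 and 8 are arbitrary illustrative values:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Bundled iris data (assumption: used here instead of iris.csv)
X, y = load_iris(return_X_y=True)

mean_scores = {}
for degree in (2, 3, 8):
    # 5-fold cross-validation accuracy for this polynomial degree
    scores = cross_val_score(SVC(kernel='poly', degree=degree), X, y, cv=5)
    mean_scores[degree] = scores.mean()
    print(degree, round(mean_scores[degree], 3))
```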

Multi-class Classification using Gaussian Kernel

Same as the Polynomial kernel; just change the kernel type from poly to rbf in Step 5. rbf stands for Radial Basis Function.

model = SVC(kernel='rbf')
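
The RBF kernel measures similarity between two points as exp(-gamma * squared distance): 1 for identical points and closer to 0 as they move apart. The sketch below computes it by hand for three made-up points and checks the result against scikit-learn's rbf_kernel helper; the points and the gamma value are arbitrary.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Three made-up 2-D points and an arbitrary gamma
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
gamma = 0.5

# Squared Euclidean distance between every pair of points
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

K_manual = np.exp(-gamma * sq_dists)
K_sklearn = rbf_kernel(X, X, gamma=gamma)

print(K_manual)
print(np.allclose(K_manual, K_sklearn))
```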
