Let's implement the SVM algorithm in Python using the scikit-learn library. We will implement four SVM kernels in this article:
1. Linear Kernel
2. Polynomial Kernel
3. Gaussian Kernel
4. Sigmoid Kernel
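Any of these kernels can be used for both binary and multi-class classification; scikit-learn's SVC handles multi-class problems internally with a one-vs-one scheme. In this article, the Linear and Sigmoid Kernels are demonstrated on a binary classification problem, and the Polynomial and Gaussian Kernels on a multi-class classification problem.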
We will first solve a binary classification problem (the bank note authentication dataset) using the Linear and Sigmoid Kernels, and then solve a multi-class classification problem (the IRIS dataset) using the Polynomial and Gaussian Kernels.
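As a quick check of how the kernels relate to the number of classes, here is a minimal sketch that fits SVC with each of the four kernels on a three-class problem. It assumes the copy of the Iris data bundled with scikit-learn (load_iris) instead of iris.csv, purely so the snippet runs without any download; the variable names are made up for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Assumption: use scikit-learn's bundled Iris data (3 classes) instead of iris.csv
X_iris, y_iris = load_iris(return_X_y=True)

classes_per_kernel = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    candidate = SVC(kernel=kernel)
    candidate.fit(X_iris, y_iris)
    # classes_ lists the distinct labels the fitted model knows about
    classes_per_kernel[kernel] = len(candidate.classes_)
    print(kernel, "->", classes_per_kernel[kernel], "classes")
```

Every kernel trains on all three classes; how well each one separates them is a different question and depends on the data and the kernel parameters.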
You can download bank_note_authentication.csv from here and iris.csv from here. You can also download the following Jupyter notebooks, which contain the SVM code shown below:
Jupyter notebook for SVM Linear Kernel
Jupyter notebook for SVM Polynomial Kernel
Binary Classification using Linear Kernel
Step 1: Import the required Python libraries (pandas and scikit-learn)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
Step 2: Load and examine the dataset
names = ['Variance', 'Skewness', 'Curtosis', 'Entropy', 'Class']
dataset = pd.read_csv('bank_note_authentication.csv', names=names)
print(dataset.shape)   # (number of rows, number of columns)
print(dataset.head())  # first five rows
Step 3: Separate the features (X) from the target label (y)
X = dataset.drop('Class', axis=1)
y = dataset['Class']
Step 4: Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
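If the classes are not equally represented, a plain random split can give the test set a different class ratio than the training set. One optional variation, which is not part of the steps in this article, is to pass stratify to train_test_split. The sketch below assumes synthetic data from make_classification as a stand-in for the bank note data, and all the _demo names are made up for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Assumption: synthetic, imbalanced 2-class data standing in for the real CSV
X_demo, y_demo = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0,
    weights=[0.7, 0.3], random_state=0,
)

# stratify=y_demo keeps the class ratio (nearly) equal in both splits
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.20, stratify=y_demo, random_state=0
)

# For 0/1 labels, the mean is the share of class 1 in each split
print("share of class 1 in train:", y_train_demo.mean())
print("share of class 1 in test: ", y_test_demo.mean())
```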
Step 5: Create and fit the model
model = SVC(kernel='linear')
model.fit(X_train, y_train)
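SVMs are sensitive to the scale of the input features, so it is common (though not done in this article) to standardize the features before fitting. Here is a minimal sketch of that variation using a pipeline. It assumes synthetic data from make_classification in place of the bank note data, and the names scaled_model and scaled_accuracy are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic 2-class data with 4 features, standing in for the real CSV
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=0
)

# The scaler is fitted on the training data only, then applied before the SVM
scaled_model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scaled_model.fit(X_tr, y_tr)

scaled_accuracy = scaled_model.score(X_te, y_te)
print(scaled_accuracy)
```

Because the scaler lives inside the pipeline, calling predict or score on new data applies the same scaling automatically.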
Step 6: Predict from the model
y_pred = model.predict(X_test)
y_pred is a NumPy array that holds the predicted class for each row of X_test.
Let's compare the actual and predicted values side by side.
df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df
Step 7: Evaluate the model
conf_matrix = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(conf_matrix)
print(accuracy * 100)
print(report)
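To make the confusion matrix easier to read, here is a tiny worked example on five hand-written labels. The labels are made up for illustration and are not taken from the bank note data. In scikit-learn's confusion_matrix, rows are the actual classes and columns are the predicted classes.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration only
toy_actual = [0, 0, 1, 1, 1]
toy_predicted = [0, 1, 1, 1, 0]

toy_matrix = confusion_matrix(toy_actual, toy_predicted)
toy_accuracy = accuracy_score(toy_actual, toy_predicted)

# Row 0 (actual 0): one predicted as 0, one predicted as 1
# Row 1 (actual 1): one predicted as 0, two predicted as 1
print(toy_matrix)

# 3 of the 5 predictions match the actual labels, so accuracy is 0.6
print(toy_accuracy * 100)
```

The diagonal entries count correct predictions, so accuracy is the sum of the diagonal divided by the total number of samples.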
Binary Classification using Sigmoid Kernel
All of the steps above stay the same except Step 5: change the kernel type from linear to sigmoid.
model = SVC(kernel='sigmoid')
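Since only the kernel argument changes, the two binary models can be compared in a short loop. This is a sketch under assumptions: it uses synthetic data from make_classification rather than the bank note data, so the printed accuracies say nothing about the real dataset, and the names candidate and kernel_accuracy are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: synthetic 2-class data standing in for the real CSV
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=0
)

kernel_accuracy = {}
for kernel in ("linear", "sigmoid"):
    candidate = SVC(kernel=kernel)
    candidate.fit(X_tr, y_tr)
    # score() returns the mean accuracy on the given test data
    kernel_accuracy[kernel] = candidate.score(X_te, y_te)

print(kernel_accuracy)
```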
Multi-class Classification using Polynomial Kernel
All of the steps above stay the same except Steps 2 and 5.
In Step 2, we load the IRIS dataset (a multi-class dataset) instead of the bank note authentication dataset (a binary classification dataset).
names = ['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width', 'Class']
dataset = pd.read_csv('iris.csv', names=names)
In Step 5, change the kernel type from linear to poly and specify the degree of the polynomial.
model = SVC(kernel='poly', degree=8)
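The degree is a tuning parameter, and a high value such as 8 may or may not suit a given dataset. One way to pick it, shown here as a sketch rather than as the method used in this article, is a small cross-validated grid search. It assumes the Iris data bundled with scikit-learn (load_iris) instead of iris.csv, and the candidate degrees are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: bundled Iris data instead of iris.csv
X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_iris, y_iris, test_size=0.20, random_state=0
)

# Arbitrary candidate degrees, each evaluated with 5-fold cross-validation
degree_grid = {"degree": [2, 3, 4, 8]}
search = GridSearchCV(SVC(kernel="poly"), degree_grid, cv=5)
search.fit(X_tr, y_tr)  # search on the training split only

print("best degree:", search.best_params_["degree"])
print("mean cross-validated accuracy:", search.best_score_)
print("test accuracy:", search.score(X_te, y_te))
```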
Multi-class Classification using Gaussian Kernel
The steps are the same as for the Polynomial Kernel; in Step 5, just change the kernel type from poly to rbf. rbf stands for Radial Basis Function.
model = SVC(kernel='rbf')
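For reference, the RBF kernel scores the similarity of two points as K(x, z) = exp(-gamma * ||x - z||^2), so nearby points score close to 1 and distant points close to 0. The sketch below computes this by hand and compares it with scikit-learn's rbf_kernel helper. The two points and the gamma value are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two arbitrary 2-D points, shaped (1, n_features) as rbf_kernel expects
x = np.array([[1.0, 2.0]])
z = np.array([[2.0, 4.0]])
gamma = 0.5  # arbitrary value for illustration

# Squared Euclidean distance is (1-2)^2 + (2-4)^2 = 5
manual_value = np.exp(-gamma * np.sum((x - z) ** 2))
library_value = rbf_kernel(x, z, gamma=gamma)[0, 0]

print(manual_value, library_value)
```

A larger gamma makes the similarity fall off faster with distance. SVC exposes the same parameter through its gamma argument.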