## Wednesday 20 February 2019

### Implement Naive Bayes Algorithm in Python using Scikit Learn Library

In this post, I will implement the Naive Bayes algorithm in Python using the scikit-learn library. I will use the wine dataset, which ships with scikit-learn and is a well-known dataset for multi-class classification problems.

This dataset contains 178 samples, each with 13 features (alcohol, malic_acid, ash, alcalinity_of_ash, magnesium, total_phenols, flavanoids, nonflavanoid_phenols, proanthocyanins, color_intensity, hue, od280/od315_of_diluted_wines, proline) and a target giving the wine cultivar.

The dataset has three classes of wine: class_0, class_1, and class_2.
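
As a quick sanity check on the class labels, the following sketch loads the dataset and counts the samples in each class. The variable names (`wine`, `class_names`, `counts`) are illustrative and are not taken from the notebook.

```python
import numpy as np
from sklearn.datasets import load_wine

# Load the bundled wine dataset (a Bunch with data, target, and name fields)
wine = load_wine()

# Class names as stored in the dataset
class_names = [str(name) for name in wine.target_names]

# Number of samples per class; target holds the integer labels 0, 1, 2
counts = np.bincount(wine.target)

for name, count in zip(class_names, counts):
    print(f"{name}: {count} samples")
```

The classes are not perfectly balanced, which is worth keeping in mind when reading the accuracy later on.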

You can also download my Jupyter notebook containing the Naive Bayes implementation shown below.

Step 1: Import the required Python libraries

import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

Step 2: Load and examine the dataset

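# Load the wine dataset bundled with scikit-learn
dataset = datasets.load_wine()

dataset.feature_names   # names of the 13 features
dataset.target_names    # names of the three classes
dataset.data.shape      # (number of samples, number of features)
dataset.target.shape    # one label per sample
dataset.data[0:5]       # first five feature rows
dataset.target[0:5]     # first five labels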

Step 3: Split the dataset into training and test sets

X_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.2, random_state=0)

Step 4: Create and fit the model

model = GaussianNB()
model.fit(X_train, y_train)
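
To see what fitting actually produces, here is a minimal sketch that repeats the steps so far and inspects two attributes of the fitted estimator: `class_prior_` (the prior probability of each class, estimated from the training labels) and `theta_` (the mean of each feature within each class). Gaussian Naive Bayes treats each feature within a class as normally distributed, so these means, together with the per-class variances, are roughly what the model learns.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data and split settings as in the steps above
dataset = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.2, random_state=0
)

model = GaussianNB()
model.fit(X_train, y_train)

# Prior probability of each class, estimated from the training labels
print(model.class_prior_)

# Per-class feature means: one row per class, one column per feature
print(model.theta_.shape)
```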

Step 5: Predict from the model

y_pred = model.predict(X_test)

y_pred is a NumPy array containing the predicted class label for each row of X_test.

Let's compare the actual and predicted values.

df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df
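
The labels in y_pred are hard decisions. If you also want to know how confident the model is, `predict_proba` returns the estimated probability of each class for every test row. The sketch below is an optional addition, not part of the notebook; it repeats the earlier steps so it can run on its own, and the names `proba` and `labels_from_proba` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data, split, and model as in the steps above
dataset = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.2, random_state=0
)
model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)

# One row per test sample, one column per class; each row sums to 1
proba = model.predict_proba(X_test)
print(proba[:5].round(3))

# The predicted label is the class with the highest probability
labels_from_proba = model.classes_[np.argmax(proba, axis=1)]
print((labels_from_proba == y_pred).all())
```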

Step 6: Check the accuracy

# Rows of the confusion matrix are actual classes; columns are predicted classes
conf_matrix = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(conf_matrix)
print(accuracy * 100)  # accuracy as a percentage
print(report)
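
To connect the confusion matrix to the accuracy score, the sketch below recomputes the accuracy by hand: the diagonal of the matrix counts the correctly classified samples, so dividing its sum by the total number of test samples should give the same value as `accuracy_score`. It repeats the earlier steps so it can run on its own; the names `cm`, `correct`, and `total` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data, split, and model as in the steps above
dataset = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.2, random_state=0
)
model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)

# Diagonal entries are samples whose predicted class equals the actual class
correct = np.trace(cm)
total = cm.sum()

print(correct / total)                 # accuracy computed from the matrix
print(accuracy_score(y_test, y_pred))  # accuracy reported by scikit-learn
```

Off-diagonal entries show which classes get confused with each other, which the single accuracy number hides.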