In this post, I will implement the Naive Bayes algorithm in Python using the scikit-learn library. I will use the wine dataset, which ships with scikit-learn and is a well-known dataset for multi-class classification problems.
This dataset contains 13 features (alcohol, malic_acid, ash, alcalinity_of_ash, magnesium, total_phenols, flavanoids, nonflavanoid_phenols, proanthocyanins, color_intensity, hue, od280/od315_of_diluted_wines, proline) and a target giving the wine cultivar.
The dataset has three classes of wine: class_0, class_1, and class_2.
You can also download my Jupyter notebook containing the Naive Bayes code below.
Step 1: Import the required Python libraries
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
Step 2: Load and examine the dataset
dataset = datasets.load_wine()
dataset.feature_names
dataset.target_names
dataset.data.shape
dataset.target.shape
dataset.data[0:5]
dataset.target[0:5]
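The raw arrays above are easier to read once the columns carry their feature names. The following is a minimal sketch, not part of the original notebook, that wraps the data in a pandas DataFrame and counts the samples per class; the names `features_df` and `class_counts` are my own choices for illustration.

```python
import pandas as pd
from sklearn import datasets

dataset = datasets.load_wine()

# Wrap the raw NumPy array in a DataFrame so each column has its feature name
features_df = pd.DataFrame(dataset.data, columns=dataset.feature_names)

# Count how many samples carry each class label (0, 1, 2)
class_counts = pd.Series(dataset.target).value_counts().sort_index()

print(features_df.shape)           # (number of samples, number of features)
print(list(dataset.target_names))  # human-readable class names
print(class_counts)
```

Checking the class counts early is useful because an uneven class distribution affects how the accuracy in the last step should be read.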
Step 3: Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.2, random_state=0)
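To confirm what the split produced, it can help to print the shapes of the resulting arrays. The sketch below repeats the split from this step and, as an optional variant that is not in the original code, also shows the `stratify` parameter of `train_test_split`, which keeps the class proportions similar in both parts. The `Xs_*` and `ys_*` names are hypothetical.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

dataset = datasets.load_wine()

# Same split as in the post: 20% of the samples are held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)

# Optional variant (an assumption, not the post's method): stratify on the
# target so each class appears in similar proportion in train and test
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    dataset.data, dataset.target, test_size=0.2, random_state=0,
    stratify=dataset.target,
)
print(Xs_train.shape, Xs_test.shape)
```

With 178 samples, the 20% test share is rounded up, so the test set holds 36 rows and the training set 142.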
Step 4: Create and fit the model
model = GaussianNB()
model.fit(X_train, y_train)
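After fitting, a `GaussianNB` model stores what it estimated from the training data. As a rough illustration (not part of the original notebook), the sketch below prints the class labels, the estimated class priors, and the shape of the per-class feature means, using the fitted attributes `classes_`, `class_prior_`, and `theta_`.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

dataset = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.2, random_state=0
)

model = GaussianNB()
model.fit(X_train, y_train)

print(model.classes_)       # class labels seen during fit
print(model.class_prior_)   # estimated prior probability of each class
print(model.theta_.shape)   # per-class mean of each feature: (classes, features)
```

Gaussian Naive Bayes models each feature within a class as a normal distribution, so these means (together with the per-class variances) are all the model needs to score a new sample.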
Step 5: Predict from the model
y_pred = model.predict(X_test)
y_pred is a NumPy array that contains the predicted class for each sample in X_test.
Let's compare the actual and predicted values.
df=pd.DataFrame({'Actual':y_test, 'Predicted':y_pred})
df
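Scanning the table by eye works for a small test set, but the mismatches can also be counted directly. The sketch below is an illustrative addition, not the original notebook's code: it counts the wrong predictions and uses `predict_proba` to show how confident the model is for each test sample. The names `n_wrong` and `proba` are my own.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

dataset = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.2, random_state=0
)
model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Number of test samples where the prediction differs from the true label
n_wrong = int((y_pred != y_test).sum())
print(f"{n_wrong} of {len(y_test)} predictions are wrong")

# One row per test sample, one column per class; each row sums to 1
proba = model.predict_proba(X_test)
print(np.round(proba[:5], 3))
```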
Step 6: Check the accuracy
# Rows of the confusion matrix are actual classes, columns are predicted classes
conf_matrix = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print(conf_matrix)
print(accuracy * 100)  # accuracy as a percentage
print(class_report)
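The accuracy score and the confusion matrix are closely related: the diagonal of the matrix holds the correctly classified samples, so accuracy is the diagonal sum divided by the total. The following sketch, added here for illustration with the hypothetical names `cm` and `manual_accuracy`, checks that relationship.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

dataset = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.2, random_state=0
)
y_pred = GaussianNB().fit(X_train, y_train).predict(X_test)

cm = confusion_matrix(y_test, y_pred)

# Correct predictions sit on the diagonal; divide by all test samples
manual_accuracy = np.trace(cm) / cm.sum()

print(manual_accuracy)
print(accuracy_score(y_test, y_pred))  # should match the value above
```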