Let's implement the Decision Tree algorithm in Python using the scikit-learn library. Decision trees can be used to solve both classification and regression problems.
In this article, we will solve a classification problem (bank note authentication) using a decision tree. To do so, we import DecisionTreeClassifier from the sklearn.tree module.
You can download bank_note_authentication.csv from here. You can also download my Jupyter notebook containing the Decision Tree implementation code shown below.
Step 1: Import the required Python libraries like pandas and sklearn
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
Step 2: Load and examine the dataset
names = ['Variance', 'Skewness', 'Curtosis', 'Entropy', 'Class']
dataset = pd.read_csv('bank_note_authentication.csv', names=names)
dataset.shape
dataset.head()
Step 3: Separate the features (X) from the target (y)
X = dataset.drop('Class', axis=1)
y = dataset['Class']
Step 4: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
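To make the roles of test_size and random_state more concrete, here is a rough pure-Python sketch of a shuffled hold-out split. It only illustrates the idea and is not scikit-learn's implementation; the helper name simple_split and the toy rows are made up for this example.

```python
import math
import random


def simple_split(rows, test_size=0.20, seed=0):
    """Shuffle row indices with a fixed seed, then hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed -> same shuffle on every run
    n_test = math.ceil(len(rows) * test_size)  # size of the held-out part, rounded up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [rows[i] for i in train_idx]
    test = [rows[i] for i in test_idx]
    return train, test


rows = list(range(10))  # hypothetical stand-in for 10 dataset rows
train, test = simple_split(rows, test_size=0.20, seed=0)
print(len(train), len(test))  # 8 2
```

Because the seed is fixed, calling simple_split again with the same arguments returns the same partition, which is the reason for passing random_state=0 in the step above.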
Step 5: Create and fit the model
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
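By default, DecisionTreeClassifier scores candidate splits with Gini impurity. The sketch below shows that idea on a single feature in plain Python. It is a simplified illustration, not scikit-learn's optimized implementation, and the helper names gini and best_threshold as well as the toy values are assumptions made for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; keep the lowest weighted Gini."""
    distinct = sorted(set(values))
    best = (None, float("inf"))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        # Weight each side's impurity by how many samples fall on that side
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy feature values and class labels, not taken from the bank note data
variance = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
label = [1, 1, 1, 0, 0, 0]

print(gini(label))                      # 0.5 for a perfectly balanced two-class set
print(best_threshold(variance, label))  # (0.0, 0.0): a split at 0 separates the classes
```

A fitted tree repeats this kind of search across all features and then recurses into each side of the chosen split.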
Step 6: Predict from the model
y_pred = model.predict(X_test)
y_pred is a NumPy array containing the predicted class for each row of X_test.
Let's compare the actual and predicted values side by side.
df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df
Step 7: Check the accuracy
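# Rows are the actual classes, columns are the predicted classes
conf_matrix = confusion_matrix(y_test, y_pred)
# Fraction of test rows that were classified correctly
accuracy = accuracy_score(y_test, y_pred)
# Precision, recall and F1-score for each class
report = classification_report(y_test, y_pred)
print(conf_matrix)
print(accuracy * 100)  # accuracy as a percentage
print(report)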
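To see what the confusion matrix and the accuracy score measure, the following plain-Python sketch computes both by hand. The labels are made up for illustration and are not results from the bank note model; the helper names confusion_counts and accuracy_from_matrix are hypothetical. The layout follows scikit-learn's convention of actual classes in rows and predicted classes in columns.

```python
def confusion_counts(actual, predicted, labels=(0, 1)):
    """Return a matrix where row i is the actual class and column j the predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels for illustration only
actual = [0, 0, 0, 1, 1, 1, 1, 0]
predicted = [0, 0, 1, 1, 1, 0, 1, 0]

matrix = confusion_counts(actual, predicted)
print(matrix)                              # [[3, 1], [1, 3]]
print(accuracy_from_matrix(matrix) * 100)  # 75.0
```

The off-diagonal cells are the misclassified rows, so a model with perfect test accuracy would produce zeros everywhere except the diagonal.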