## Friday 22 February 2019

### Implement Decision Tree Algorithm in Python using Scikit Learn Library for Classification Problem

Let's implement the Decision Tree algorithm in Python using the scikit-learn library. Decision trees can be used to solve both classification and regression problems.
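To make the classification/regression distinction concrete, here is a minimal sketch on toy data. The values in X_toy and the targets are invented purely for illustration and have nothing to do with the bank note dataset.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Toy one-feature data, invented for illustration only.
X_toy = [[0.0], [1.0], [2.0], [3.0]]

# Classification: the target is a discrete label (0 or 1).
clf = DecisionTreeClassifier(random_state=0).fit(X_toy, [0, 0, 1, 1])
class_pred = clf.predict([[2.6]])

# Regression: the target is a continuous number.
reg = DecisionTreeRegressor(random_state=0).fit(X_toy, [0.0, 1.1, 1.9, 3.2])
value_pred = reg.predict([[2.6]])

print(class_pred, value_pred)
```

The classifier returns one of the labels it was trained on, while the regressor returns a numeric value taken from its leaves.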

In this article, we will solve a classification problem (bank note authentication) using a decision tree. To do so, we import DecisionTreeClassifier from the sklearn.tree module.

Step 1: Import the required Python libraries (pandas and sklearn)

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

Step 2: Load and examine the dataset

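names = ['Variance', 'Skewness', 'Curtosis', 'Entropy', 'Class']
# Hypothetical file name: replace it with the path to your copy of the bank note data (assumed to have no header row)
dataset = pd.read_csv('bank_note_data.csv', names=names)
dataset.shape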
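As a rough illustration of how the names list is applied to a CSV file that has no header row, the sketch below reads two made-up rows from an in-memory string instead of a real file. The rows and the csv_text and demo variables are assumptions for the example, not part of the actual dataset.

```python
import io
import pandas as pd

names = ['Variance', 'Skewness', 'Curtosis', 'Entropy', 'Class']

# Two made-up rows standing in for a headerless CSV file.
csv_text = "1.5,-2.0,3.25,0.5,0\n-0.75,4.0,-1.5,0.25,1\n"

# names= supplies the column labels because the data has no header row.
demo = pd.read_csv(io.StringIO(csv_text), names=names)

print(demo.shape)                    # (rows, columns)
print(demo['Class'].value_counts())  # how many rows fall in each class
```

Checking the shape and the class counts right after loading is a quick way to confirm that the columns were parsed as expected.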

Step 3: Separate the features (X) from the target (y)

X = dataset.drop('Class', axis=1)
y = dataset['Class']

Step 4: Split the dataset into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state=0)
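The effect of test_size = 0.20 can be checked on a small synthetic array. The sketch below uses invented data (X_demo, y_demo) and also passes stratify, which is not used in this article; it is shown only as an optional way to keep the class proportions equal in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data, invented for illustration: 100 rows, 2 features, balanced classes.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 50 + [1] * 50)

# test_size=0.20 holds out 20% of the rows; random_state makes the shuffle repeatable.
# stratify=y_demo (an optional addition) preserves the class ratio in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=0, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print(np.bincount(y_te))  # rows per class in the held-out part
```

With 100 rows, 80 go to training and 20 to testing, and stratification puts an equal number of each class in the held-out part.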

Step 5: Create and fit the model

model = DecisionTreeClassifier()
model.fit(X_train, y_train)
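Once a tree is fitted, it can be useful to look at the rules it learned. The following is a minimal sketch that assumes a scikit-learn version providing export_text (0.21 or later). It trains on synthetic data from make_classification, borrows this article's column names only as labels, and limits max_depth to 2 so the printout stays short; none of these choices come from the article's own model.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (not the bank note dataset).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
feature_names = ['Variance', 'Skewness', 'Curtosis', 'Entropy']  # labels only

# A shallow tree keeps the printed rules readable.
demo_model = DecisionTreeClassifier(max_depth=2, random_state=0)
demo_model.fit(X_demo, y_demo)

# Text rendering of the learned if/else splits.
rules = export_text(demo_model, feature_names=feature_names)
print(rules)

# Relative contribution of each feature to the splits (sums to 1).
importances = demo_model.feature_importances_
print(dict(zip(feature_names, importances.round(3))))
```

The same two calls should work on the model fitted above by passing model and list(X.columns) instead of the synthetic stand-ins.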

Step 6: Predict from the model

y_pred = model.predict(X_test)

y_pred is a NumPy array that contains the predicted class for each row of X_test.

Let's compare the actual and predicted values.

df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df
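A side-by-side frame like this is easier to read if only the rows where the two columns disagree are kept. The sketch below shows one way to do that on a small hand-made frame; the values in comparison are invented for the example.

```python
import pandas as pd

# Hand-made actual/predicted labels, for illustration only.
comparison = pd.DataFrame({
    'Actual':    [0, 1, 1, 0, 1],
    'Predicted': [0, 1, 0, 0, 1],
})

# Keep only the rows where the prediction differs from the actual label.
mismatches = comparison[comparison['Actual'] != comparison['Predicted']]

print(mismatches)
print(len(mismatches), 'of', len(comparison), 'rows differ')
```

Applied to the df built above, the same filter would list the misclassified test rows.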

Step 7: Check the accuracy

confusionMatrix = confusion_matrix(y_test, y_pred)
accuracyScore = accuracy_score(y_test, y_pred)
classificationReport = classification_report(y_test, y_pred)
print(confusionMatrix)
print(accuracyScore * 100)
print(classificationReport)
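To see how the accuracy score relates to the confusion matrix, the sketch below recomputes accuracy by hand: correct predictions sit on the diagonal of the matrix, so accuracy is the diagonal sum divided by the total. The labels in y_true and y_hat are invented for the example.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-made labels, for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_hat  = [0, 0, 1, 1, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_hat)

# Diagonal entries are correct predictions; divide by the total count.
manual_accuracy = np.trace(cm) / cm.sum()

print(cm)
print(manual_accuracy, accuracy_score(y_true, y_hat))
```

Both numbers agree, which confirms that accuracy_score is simply the share of predictions that land on the diagonal.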