Saturday, 16 February 2019

Basic steps to implement a Machine Learning Algorithm in Python

In this post, I will walk through the basic steps that are common to implementing almost any Machine Learning algorithm in Python, using the XGBoost algorithm as a simple example.

To implement the XGBoost algorithm in Python, you first import the required libraries and load the dataset. Then you separate the input features (X) from the target column (Y), split the data into training and test sets, fit the model on the training set, predict on the test set, and finally check the accuracy of those predictions.

Assumptions:

1. You have basic knowledge of Python, Jupyter Notebook and Machine Learning Libraries in Python.

2. The dataset is already in a clean format, so I don't have to do any data wrangling or apply any dimensionality reduction technique.

3. You know the basics of the XGBoost algorithm.

Steps:

1. Import the required Python libraries like pandas, numpy, sklearn etc.
2. Load the dataset
3. Define the features (X) and the target (Y)
4. Split the dataset into training and test sets
5. Create and fit a model
6. Predict from the model
7. Check accuracy

You can download pima-indians-diabetes.csv from here. You can also download my Jupyter notebook, which contains the XGBoost code shown below.

1. Import the required Python libraries like pandas, numpy, sklearn etc.

import pandas as pd
from numpy import loadtxt
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

2. Load the dataset (I will load the Pima Indians Diabetes dataset)

dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
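Before moving on, it can help to check that the file was parsed the way you expect. Here is a minimal sketch of that check. It assumes nothing about the real file: the three rows below are made-up values held in an in-memory string, and the names sample_csv and sample are only for illustration.

```python
from io import StringIO
from numpy import loadtxt

# Three made-up rows with 9 comma-separated columns (8 inputs + 1 label).
# These values are illustrative only; they stand in for the real CSV file.
sample_csv = StringIO(
    "1,10,20,30,40,1.5,0.1,21,0\n"
    "2,11,21,31,41,2.5,0.2,22,1\n"
    "3,12,22,32,42,3.5,0.3,23,0\n"
)

# loadtxt accepts a file name or a file-like object
sample = loadtxt(sample_csv, delimiter=",")

print(sample.shape)  # (number of rows, number of columns)
print(sample.dtype)  # loadtxt parses values as floats by default
```

On the real dataset you would run the same two print statements on dataset and confirm that it has 9 columns.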

3. Define the features (X) and the target (Y)

X = dataset[:, 0:8]  # first 8 columns: input features
Y = dataset[:, 8]    # last column: class label (0 or 1)
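If the slicing syntax is new to you, the sketch below shows what it does on a tiny array. The array toy and the names X_toy and Y_toy are hypothetical stand-ins for the loaded dataset, not part of the original notebook.

```python
import numpy as np

# Hypothetical 4x9 array standing in for the loaded dataset
toy = np.arange(36, dtype=float).reshape(4, 9)

X_toy = toy[:, 0:8]  # all rows, columns 0 to 7 (the stop index 8 is excluded)
Y_toy = toy[:, 8]    # all rows, column 8 only

print(X_toy.shape)  # (4, 8): a 2-D array of features
print(Y_toy.shape)  # (4,): a 1-D array of labels
```

Note that X stays two-dimensional while Y becomes one-dimensional, which is the shape that fit expects for the features and the labels.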

4. Split the dataset into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.20, random_state=0)
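To see what test_size=0.20 and random_state=0 do, here is a small sketch on synthetic data. It assumes an array of 768 rows and 8 features (the size I am assuming for this dataset); the values are meaningless and the names X_demo, Y_demo, X_tr, X_te, y_tr and y_te are only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 768 rows, 8 features, alternating 0/1 labels
X_demo = np.zeros((768, 8))
Y_demo = np.arange(768) % 2

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, Y_demo, test_size=0.20, random_state=0
)

print(X_tr.shape, X_te.shape)  # (614, 8) (154, 8)

# The same random_state gives the same split on every run
_, _, _, y_te_again = train_test_split(
    X_demo, Y_demo, test_size=0.20, random_state=0
)
print((y_te == y_te_again).all())  # True
```

Roughly 20% of the rows go to the test set, and fixing random_state keeps the split reproducible, so the accuracy you report does not change from run to run just because of a different shuffle.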

5. Create and fit a model

model = XGBClassifier()
model.fit(X_train, y_train)
print(model)

6. Predict from the model

y_pred = model.predict(X_test)

y_pred is a NumPy array that contains the predicted class for each row of X_test.

Let's compare the actual and predicted values side by side.

df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df
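Scanning a long table by eye is tedious, so one option is to add a column that flags the mismatches. This is a minimal sketch with made-up labels; actual and predicted stand in for y_test and y_pred, and the 'Correct' column is my own addition, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Made-up labels standing in for y_test and y_pred
actual = np.array([0., 1., 1., 0., 1.])
predicted = np.array([0., 1., 0., 0., 0.])

comparison = pd.DataFrame({'Actual': actual, 'Predicted': predicted})

# True where the prediction matches the actual label
comparison['Correct'] = comparison['Actual'] == comparison['Predicted']

# Keep only the rows the model got wrong
wrong = comparison[~comparison['Correct']]
print(wrong)
print(len(wrong), "of", len(comparison), "predictions are wrong")
```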

7. Check accuracy

# Compare the true labels with the predictions
confusion = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(confusion)
print(accuracy * 100)  # accuracy as a percentage
print(report)
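To make these numbers less of a black box, here is a rough sketch that computes the same kind of quantities by hand for a binary problem. The labels are made up, and the names y_true and y_hat are hypothetical. The matrix uses the layout that confusion_matrix returns for labels 0 and 1: rows are actual classes and columns are predicted classes.

```python
import numpy as np

# Made-up labels standing in for y_test and y_pred
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_hat = np.array([0, 1, 1, 0, 1, 0, 1, 0])

tn = int(np.sum((y_true == 0) & (y_hat == 0)))  # true negatives
fp = int(np.sum((y_true == 0) & (y_hat == 1)))  # false positives
fn = int(np.sum((y_true == 1) & (y_hat == 0)))  # false negatives
tp = int(np.sum((y_true == 1) & (y_hat == 1)))  # true positives

# Rows: actual class (0, 1). Columns: predicted class (0, 1).
matrix = np.array([[tn, fp],
                   [fn, tp]])

accuracy = (tp + tn) / len(y_true)  # share of all predictions that are right
precision = tp / (tp + fp)          # of predicted 1s, how many are really 1
recall = tp / (tp + fn)             # of actual 1s, how many were found

print(matrix)
print(accuracy * 100)  # 75.0
print(precision, recall)
```

Accuracy alone can look good on an imbalanced dataset, which is why it is worth reading the confusion matrix and the per-class precision and recall in the classification report as well.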
