In this post, I will illustrate the basic steps you need to take when implementing almost any Machine Learning algorithm in Python, using the XGBoost algorithm as a simple example.
To implement the XGBoost algorithm in Python, you first import the required libraries, load the dataset, define the input features (X) and the target variable (Y), split the dataset into training and test sets, fit the model and make predictions with it, and finally check its accuracy.
Assumptions:
1. You have a basic knowledge of Python, Jupyter Notebook and the Machine Learning libraries in Python.
2. The dataset is already in a proper format, so no data wrangling or dimensionality reduction is needed.
3. You know the basics of the XGBoost algorithm.
Steps:
1. Import the required Python libraries (pandas, NumPy, scikit-learn and xgboost)
2. Load the dataset
3. Define the features (X) and the target (Y)
4. Split the dataset into training and testing sets
5. Create and fit a model
6. Predict from the model
7. Check accuracy
You can download pima-indians-diabetes.csv from here. You can also download my Jupyter notebook containing the XGBoost code below.
1. Import the required Python libraries (pandas, NumPy, scikit-learn and xgboost)
import pandas as pd
from numpy import loadtxt
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
2. Load the dataset (I will use the Pima Indians Diabetes dataset)
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
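loadtxt does not use column names. It returns a plain 2-D float array with one row per line of the file. The following minimal sketch assumes, as the call above does, that the CSV has no header line. Instead of the real file, it reads two made-up rows in the same 9-column layout from an in-memory string:

```python
from io import StringIO

import numpy as np

# Two made-up rows in the same layout as the CSV: 8 feature values, then a 0/1 label.
# These numbers are placeholders, not real patient records.
csv_text = StringIO(
    "1,120,70,20,80,30.5,0.40,35,0\n"
    "4,150,80,30,0,35.0,0.90,50,1\n"
)

# loadtxt parses every value as a float and returns a 2-D array (one row per line).
sample = np.loadtxt(csv_text, delimiter=",")
print(sample.shape)  # (2, 9)
print(sample.dtype)  # float64
```

If the file did have a header line, loadtxt would fail to parse it as a number. Passing skiprows=1 is one way around that.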
3. Define the features (X) and the target (Y)
X = dataset[:,0:8]  # columns 0-7: the eight input features
Y = dataset[:,8]    # column 8: the class label (0 or 1)
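To see exactly what the two slices select, here is a short sketch on a made-up 3-row array with the same 9-column layout. The _demo names are hypothetical and were chosen so they do not overwrite the real X and Y:

```python
import numpy as np

# A made-up 3-row array with the same 9-column layout as the Pima CSV.
demo = np.arange(27, dtype=float).reshape(3, 9)

X_demo = demo[:, 0:8]  # every row, columns 0..7 (the end index 8 is excluded)
Y_demo = demo[:, 8]    # every row, column 8 only, giving a 1-D array

print(X_demo.shape)     # (3, 8)
print(Y_demo.tolist())  # [8.0, 17.0, 26.0]

# Equivalent slices that do not hard-code the number of columns:
assert np.array_equal(X_demo, demo[:, :-1])
assert np.array_equal(Y_demo, demo[:, -1])
```

Using a single integer index (8) rather than a range (8:9) is what makes Y one-dimensional. That is the shape scikit-learn and xgboost expect for the target.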
4. Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.20, random_state=0)  # hold out 20% of rows for testing; random_state=0 makes the shuffle reproducible
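Conceptually, a hold-out split shuffles the row indices and carves off a test slice. The sketch below is a simplified NumPy-only illustration of that idea, not scikit-learn's actual code; holdout_split is a hypothetical helper. It rounds the test count up, which to my understanding matches what train_test_split does for a float test_size. Because the random generator differs from scikit-learn's, the exact rows chosen will not match. The Pima file has 768 records, so the sketch uses 768 placeholder rows:

```python
import math

import numpy as np

def holdout_split(X, y, test_size=0.20, seed=0):
    """Shuffle row indices, then carve off a test slice (illustrative, not sklearn's code)."""
    n = len(X)
    n_test = math.ceil(test_size * n)              # round the test share up
    idx = np.random.default_rng(seed).permutation(n)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Placeholder data: row i carries label i, so we can track which rows went where.
X_demo = np.arange(768 * 8, dtype=float).reshape(768, 8)
y_demo = np.arange(768)

Xtr_demo, Xte_demo, ytr_demo, yte_demo = holdout_split(X_demo, y_demo)
print(len(Xtr_demo), len(Xte_demo))  # 614 154
```

The fixed seed plays the same role as random_state=0 above: rerunning the notebook gives the same split, so the accuracy you report is reproducible.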
5. Create and fit a model
model = XGBClassifier()
model.fit(X_train, y_train)
print(model)
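model.fit builds an ensemble of trees one boosting round at a time. As a rough illustration only, here is a toy, NumPy-only booster that uses depth-1 trees (stumps). It picks splits with the XGBoost-style gain and computes leaf weights with the Newton formula w = -G / (H + λ) for log loss, where G and H are the sums of first and second derivatives. The gain omits the constant factor 1/2 and the γ penalty, which do not change which split wins. The defaults eta = 0.3 and λ = 1 mirror XGBoost's defaults. The sketch leaves out most of what the real library does (deeper trees, column subsampling, missing-value handling, fast split finding), and the function names are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_stump(X, g, h, lam):
    """Pick the (feature, threshold) split with the largest gain, then compute leaf weights."""
    G, H = g.sum(), h.sum()
    best_gain, best_j, best_t = -np.inf, 0, X[0, 0]
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:      # candidate thresholds: x <= t goes left
            left = X[:, j] <= t
            GL, HL = g[left].sum(), h[left].sum()
            GR, HR = G - GL, H - HL
            gain = GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam)
            if gain > best_gain:
                best_gain, best_j, best_t = gain, j, t
    left = X[:, best_j] <= best_t
    # Newton-step leaf weight for each side: w = -G / (H + lambda)
    w_left = -g[left].sum() / (h[left].sum() + lam)
    w_right = -g[~left].sum() / (h[~left].sum() + lam)
    return best_j, best_t, w_left, w_right

def boost(X, y, n_rounds=20, eta=0.3, lam=1.0):
    """Toy gradient boosting with stumps on the logistic (log) loss."""
    margin = np.zeros(len(y))                  # raw score before the sigmoid
    stumps = []
    for _ in range(n_rounds):
        p = sigmoid(margin)
        g = p - y                              # first derivative of log loss w.r.t. margin
        h = p * (1.0 - p)                      # second derivative
        j, t, wl, wr = fit_stump(X, g, h, lam)
        wl, wr = eta * wl, eta * wr            # shrink each stump by the learning rate
        margin += np.where(X[:, j] <= t, wl, wr)
        stumps.append((j, t, wl, wr))
    return stumps

def predict_labels(stumps, X):
    """Sum the stump outputs and threshold the resulting probability at 0.5."""
    margin = np.zeros(len(X))
    for j, t, wl, wr in stumps:
        margin += np.where(X[:, j] <= t, wl, wr)
    return (sigmoid(margin) > 0.5).astype(int)

# Tiny made-up example: one feature, and the label flips from 0 to 1 after x = 3.
X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
model_toy = boost(X_toy, y_toy)
print(predict_labels(model_toy, X_toy))  # [0 0 0 1 1 1]
```

Each round fits a small tree to the current gradients, and the learning rate keeps any single tree from dominating. That is the same additive idea XGBClassifier applies, with far more engineering.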
6. Predict from the model
y_pred = model.predict(X_test)
y_pred is a NumPy array holding the predicted class label for each row of X_test.
Let's compare the actual and predicted values side by side.
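XGBClassifier also provides predict_proba, which returns one probability column per class. For a binary problem, predict essentially turns the positive-class probability into a 0/1 label with a 0.5 cut-off. Here is a NumPy sketch of that conversion, using made-up probabilities in place of model.predict_proba(X_test):

```python
import numpy as np

# Made-up stand-in for model.predict_proba(X_test):
# column 0 = P(class 0), column 1 = P(class 1).
proba_demo = np.array([
    [0.88, 0.12],
    [0.19, 0.81],
    [0.53, 0.47],
    [0.34, 0.66],
])

positive = proba_demo[:, 1]                  # probability of the positive class
labels = (positive > 0.5).astype(int)        # default-style 0.5 cut-off
print(labels)                                # [0 1 0 1]

# A lower cut-off flags more rows as positive
# (typically raising recall at the cost of precision).
labels_low = (positive > 0.3).astype(int)
print(labels_low)                            # [0 1 1 1]
```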
df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df  # in Jupyter, evaluating df on its own line displays the table
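With many test rows, scanning the whole table is tedious. A small pandas sketch can add a boolean column and keep only the misclassified rows. It uses made-up stand-ins for y_test and y_pred, and df_demo is a hypothetical name chosen so the real df is not overwritten:

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for y_test (floats, as loadtxt produces) and y_pred.
y_test_demo = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
y_pred_demo = np.array([0, 1, 0, 0, 0])

df_demo = pd.DataFrame({'Actual': y_test_demo, 'Predicted': y_pred_demo})
df_demo['Correct'] = df_demo['Actual'] == df_demo['Predicted']

print(df_demo[~df_demo['Correct']])   # only the misclassified rows
# Counts per (actual, predicted) pair, which is a confusion table:
print(pd.crosstab(df_demo['Actual'], df_demo['Predicted']))
```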
7. Check accuracy
confusionMatrix = confusion_matrix(y_test, y_pred)
accuracyScore = accuracy_score(y_test, y_pred)
classificationReport = classification_report(y_test, y_pred)
print(confusionMatrix)
print(accuracyScore * 100)
print(classificationReport)
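To make the printed numbers easier to read: for 0/1 labels, scikit-learn's confusion_matrix puts the actual classes on the rows and the predicted classes on the columns, so the layout is [[TN, FP], [FN, TP]]. The sketch below rebuilds that matrix and the headline metrics by hand with NumPy on made-up labels. The precision and recall it computes correspond to the row for class 1 in the classification report:

```python
import numpy as np

# Made-up labels, standing in for y_test and y_pred.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_hat = np.array([0, 1, 1, 0, 0, 1, 1, 1])

# Rows = actual class, columns = predicted class (same layout as confusion_matrix).
cm = np.zeros((2, 2), dtype=int)
np.add.at(cm, (y_true, y_hat), 1)
(tn, fp), (fn, tp) = cm

accuracy = (tp + tn) / cm.sum()     # share of all predictions that are correct
precision = tp / (tp + fp)          # of the rows predicted 1, how many really are 1
recall = tp / (tp + fn)             # of the rows that really are 1, how many were found

print(cm.tolist())                  # [[2, 2], [1, 3]]
print(accuracy, precision, recall)  # 0.625 0.6 0.75
```

accuracy_score likewise returns a fraction between 0 and 1, which is why the code above multiplies it by 100 to print a percentage.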