XGBoost is an optimized and regularized implementation of the Gradient Boosting Machine (GBM). In this post, we will build a model using **XGBRegressor** to predict house prices with the Boston dataset. To know more about XGBoost and GBM, please consider visiting this post.

**You can download my Jupyter notebook implementing XGBoost from here.**
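
To make the "gradient boosting" idea concrete, here is a minimal sketch of plain boosting for squared error, written in pure Python. It is not XGBoost's actual algorithm (which adds regularization and many optimizations); the helper names `fit_stump` and `boost` and the toy data are hypothetical and exist only for illustration.

```python
# Plain gradient boosting for squared error with one-split "stumps".
# fit_stump, boost and the toy data are hypothetical illustration names.

def fit_stump(xs, residuals):
    """Pick the threshold on xs whose two-sided means best fit the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the largest x so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_estimators=10, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Each new stump only contributes a small, shrunken correction.
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
predict = boost(xs, ys, n_estimators=50, learning_rate=0.1)

print([round(predict(x), 2) for x in xs])
```

The `n_estimators` and `learning_rate` arguments play the same role here as the parameters of the same name passed to XGBRegressor later in the post: more rounds and a larger learning rate both make the ensemble follow the training data more closely.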

**Step 1: Import the required Python libraries like pandas, numpy and sklearn**

import pandas as pd

import numpy as np

from sklearn.datasets import load_boston

from sklearn.model_selection import train_test_split

from xgboost import XGBRegressor

from sklearn.metrics import mean_absolute_error, mean_squared_error

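**Step 2: Load and examine the dataset (Data Exploration)**

*Please note that load_boston was removed in scikit-learn 1.2, so this step needs an older scikit-learn release.*

dataset = load_boston()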

dataset.keys()

dataset.data

dataset.target

dataset.data[0:5]

dataset.target[0:5]

dataset.data.shape

dataset.target.shape

dataset.feature_names

print(dataset.DESCR)
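
Since `load_boston` is not available in recent scikit-learn releases, the same exploration calls can be tried on another bundled dataset. The sketch below swaps in `load_diabetes` purely as a stand-in (an assumption, not the dataset used in this post); the object it returns has the same dictionary-like layout.

```python
from sklearn.datasets import load_diabetes

# Stand-in dataset (an assumption): bundled with scikit-learn, no download needed.
bunch = load_diabetes()

print(bunch.keys())          # dictionary-like: data, target, feature_names, DESCR, ...
print(bunch.data.shape)      # (number of rows, number of feature columns)
print(bunch.target.shape)    # (number of rows,)
print(bunch.feature_names)   # one name per feature column
print(bunch.data[0:5])       # first five rows of the feature matrix
```

The number of entries in `feature_names` always matches the second dimension of `data`, which is what makes the later conversion to a pandas DataFrame with named columns possible.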

*Convert the loaded dataset from scikit-learn to a pandas DataFrame:*

data = pd.DataFrame(dataset.data)

data.columns = dataset.feature_names

data.head()

data['PRICE'] = dataset.target

data.head()

data.info()

data.describe()

*Please note that "describe()" displays summary statistics of the data, such as the mean and standard deviation.*

**Step 3: Define the features (X) and the labels (y)**

X, y = data.iloc[:,:-1], data.iloc[:,-1]

*X contains the attributes (every column except PRICE).*

*y contains the labels (the PRICE column).*

**Step 4: Split the dataset into training and testing datasets**

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
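
Steps 3 and 4 can be tried on a tiny frame to see exactly what the slicing and the split do. The toy DataFrame below is hypothetical (its column names only mimic the Boston layout); the `iloc` and `train_test_split` calls are the same as in the post.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame: two feature columns, with the PRICE label as the last column.
toy = pd.DataFrame({
    "RM": [6.0 + 0.1 * i for i in range(10)],
    "AGE": [50.0 + i for i in range(10)],
    "PRICE": [20.0 + i for i in range(10)],
})

# Step 3: every column except the last is a feature; the last column is the label.
X_toy, y_toy = toy.iloc[:, :-1], toy.iloc[:, -1]

# Step 4: hold out 20% of the rows; random_state makes the shuffle repeatable.
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.20, random_state=0)

print(list(X_toy.columns), y_toy.name)
print(X_tr.shape, X_te.shape)
```

With ten rows, `test_size=0.20` leaves eight rows for training and two for testing, and the features and labels of each row stay paired because both are split with the same shuffled indices.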

**Step 5: Create and fit the model**

model = XGBRegressor(objective='reg:squarederror', colsample_bytree=0.3, learning_rate=0.1, max_depth=5, alpha=10, n_estimators=10)

model.fit(X_train, y_train)

*I will write a separate post explaining the above hyperparameters of the XGBoost algorithm.*

**Step 6: Predict from the model**

y_pred = model.predict(X_test)

*y_pred is a numpy array that contains the predicted values for the inputs in X_test.*

*Let's see the difference between the actual and predicted values.*

df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})

df

**Step 7: Check the accuracy using error metrics**

meanAbsoluteError = mean_absolute_error(y_test, y_pred)

meanSquaredError = mean_squared_error(y_test, y_pred)

rootMeanSquaredError = np.sqrt(meanSquaredError)

print('Mean Absolute Error:', meanAbsoluteError)

print('Mean Squared Error:', meanSquaredError)

print('Root Mean Squared Error:', rootMeanSquaredError)
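
To see what these three numbers measure, the same metrics can be worked out by hand. The sketch below uses four hypothetical actual and predicted values (not results from the model above) and only the standard library; lower values mean the predictions are closer to the actual values.

```python
import math

# Hypothetical values chosen so the arithmetic is easy to follow.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

# Per-row error: actual minus predicted.
errors = [a - p for a, p in zip(actual, predicted)]    # [1.0, 0.0, -2.0, -1.0]

mae = sum(abs(e) for e in errors) / len(errors)        # (1 + 0 + 2 + 1) / 4 = 1.0
mse = sum(e ** 2 for e in errors) / len(errors)        # (1 + 0 + 4 + 1) / 4 = 1.5
rmse = math.sqrt(mse)                                  # square root of the MSE

print('Mean Absolute Error:', mae)
print('Mean Squared Error:', mse)
print('Root Mean Squared Error:', rmse)
```

Squaring penalizes the single error of 2 more heavily than the two errors of 1, which is why the mean squared error is larger than the mean absolute error here. The root mean squared error brings the result back to the same unit as the target.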
