Random Forest is a bagging algorithm based on the ensemble learning technique: it trains many decision trees on random samples of the data and combines their predictions. It can be used for both classification and regression problems.
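To make the bagging idea concrete, here is a minimal hand-rolled sketch, not how sklearn implements it internally. It assumes synthetic data from make_classification (not the bank note dataset), and the names X_demo, y_demo, trees and bagged_pred are made up for illustration. Each tree is trained on a bootstrap sample, and the trees then vote. Note that RandomForestClassifier averages class probabilities rather than counting hard votes, so this is a simplification.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data, used only for illustration (assumption: not the bank note data).
X_demo, y_demo = make_classification(n_samples=300, n_features=4, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(15):
    # Bootstrap sample: draw row indices with replacement.
    idx = rng.integers(0, len(X_demo), size=len(X_demo))
    # max_features="sqrt" makes each split consider a random subset of the features.
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=int(rng.integers(0, 1_000_000)))
    trees.append(tree.fit(X_demo[idx], y_demo[idx]))

# Majority vote: labels are 0/1 and there are 15 trees, so a mean vote above 0.5 means class 1.
votes = np.array([t.predict(X_demo) for t in trees])
bagged_pred = (votes.mean(axis=0) > 0.5).astype(int)
print("Training accuracy of the hand-rolled ensemble:", (bagged_pred == y_demo).mean())
```

The steps below use sklearn's RandomForestClassifier, which handles the sampling and the combination of trees for you.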
In this article, we will solve a classification problem (bank note authentication) using Random Forest. To implement it, we need to import RandomForestClassifier from the sklearn library.
You can download bank_note_authentication.csv from here. You can also download my Jupyter notebook containing the Random Forest implementation code shown below.
Step 1: Import the required Python libraries like pandas and sklearn
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
Step 2: Load and examine the dataset
names = ['Variance', 'Skewness', 'Curtosis', 'Entropy', 'Class']
dataset = pd.read_csv('bank_note_authentication.csv', names=names)
dataset.shape
dataset.head()
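The names argument in the call above supplies the column names, which suggests the CSV file has no header row of its own (an assumption here, since the file is not shown). The following sketch illustrates that behaviour on two made-up rows read from an in-memory string; the values are hypothetical and are not taken from the real dataset.

```python
import io
import pandas as pd

# Two made-up rows in the same five-column layout (NOT real rows from the dataset).
csv_text = "1.5,2.0,-0.5,0.1,0\n-2.0,3.5,1.0,-0.7,1\n"

names = ['Variance', 'Skewness', 'Curtosis', 'Entropy', 'Class']
# With names= given and no header row in the text, every line is treated as data.
demo = pd.read_csv(io.StringIO(csv_text), names=names)

print(demo.shape)   # (number of rows, number of columns)
print(demo.head())  # first rows, with the supplied column names
```

If a file did contain a header row, passing names without header=0 would turn that header into a data row, so it is worth checking dataset.head() after loading.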
Step 3: Separate the features (X) and the target (y)
X = dataset.iloc[:, 0:4].values
y = dataset.iloc[:, 4].values
Step 4: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)
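The split above is random, so the reported accuracy will vary from run to run. As an optional variation, the sketch below fixes random_state for reproducibility and uses stratify to keep the class proportions similar in both sets. It assumes synthetic data, and the seed value 42 is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the features and labels (assumption: not the bank note data).
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)

# random_state fixes the shuffle; stratify=y_demo preserves the class balance in both sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=42, stratify=y_demo
)

print("Train/test sizes:", X_tr.shape, X_te.shape)
print("Share of class 1 in train and test:", y_tr.mean(), y_te.mean())

# Repeating the call with the same random_state gives the same split.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=42, stratify=y_demo
)
print("Same test rows on the second call:", np.array_equal(X_te, X_te2))
```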
Step 5: Scale the features
standardScaler = StandardScaler()
X_train = standardScaler.fit_transform(X_train)
X_test = standardScaler.transform(X_test)
This step is optional for Random Forest. Tree-based algorithms split on thresholds of individual features, so their results do not depend on the scale of the features, and feature scaling is not required.
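One way to check the point about scaling is to train the same forest twice, once on raw features and once on standardized features, and compare the predictions. This is a sketch on synthetic data (an assumption, not the bank note dataset), and the names raw_model and scaled_model are made up. The two models should agree on all or nearly all test points, with small differences possible only from floating-point rounding.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the bank note dataset).
X_demo, y_demo = make_classification(n_samples=400, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.20, random_state=0)

# Forest trained on the raw features.
raw_model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_tr, y_tr)

# Same forest settings, trained on standardized features.
scaler = StandardScaler().fit(X_tr)
scaled_model = RandomForestClassifier(n_estimators=20, random_state=0).fit(
    scaler.transform(X_tr), y_tr
)

raw_pred = raw_model.predict(X_te)
scaled_pred = scaled_model.predict(scaler.transform(X_te))
agreement = np.mean(raw_pred == scaled_pred)
print("Share of identical predictions:", agreement)
```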
Step 6: Create and fit the model
model = RandomForestClassifier(n_estimators=20, random_state=0)
model.fit(X_train, y_train)
"n_estimators" is the number of trees to build in the Random Forest. The default is 100 (since scikit-learn 0.22; earlier versions used 10).
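To get a feel for how the number of trees affects the result, you can train forests of different sizes and compare their test accuracy. The sketch below assumes synthetic data, and the tree counts tried are arbitrary. More trees generally give more stable predictions at the cost of longer training time, but the exact scores depend on the data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the bank note dataset).
X_demo, y_demo = make_classification(n_samples=400, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.20, random_state=0)

scores = {}
for n_trees in (1, 5, 20, 100):
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(X_tr, y_tr)
    # score() returns the mean accuracy on the given test data.
    scores[n_trees] = forest.score(X_te, y_te)
    print(f"{n_trees:>3} trees -> test accuracy {scores[n_trees]:.3f}")

# The fitted trees are available in estimators_; its length equals n_estimators.
print("Trees in the last forest:", len(forest.estimators_))
```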
Step 7: Predict from the model
y_pred = model.predict(X_test)
y_pred is a NumPy array containing the predicted class for each row of X_test.
Let's compare the actual and predicted values.
df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df
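Besides the hard labels returned by predict, RandomForestClassifier also offers predict_proba, which returns one probability per class for each row. The following sketch, on synthetic data with made-up variable names, shows that the predicted label is the class with the highest probability.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the bank note dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.20, random_state=0)
demo_model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_tr, y_tr)

labels = demo_model.predict(X_te)        # one class label per row
proba = demo_model.predict_proba(X_te)   # one column per class, each row sums to 1

# Picking the most probable class reproduces the labels from predict().
labels_from_proba = demo_model.classes_[np.argmax(proba, axis=1)]
print("Probabilities for the first three rows:")
print(proba[:3])
print("Labels match:", np.array_equal(labels, labels_from_proba))
```

The probabilities are useful when you want a threshold other than the default, for example to flag uncertain notes for manual inspection.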
Step 8: Check the accuracy
confusionMatrix = confusion_matrix(y_test, y_pred)
accuracyScore = accuracy_score(y_test, y_pred)
classificationReport = classification_report(y_test, y_pred)
print(confusionMatrix)
print(accuracyScore * 100)
print(classificationReport)
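To read the confusion matrix, it helps to recompute the metrics by hand. The sketch below uses six made-up labels (hypothetical, not results from the bank note model). For a two-class problem, sklearn arranges the matrix with true classes as rows and predicted classes as columns, so ravel() yields true negatives, false positives, false negatives and true positives in that order.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration only (not output of the model above).
y_true_demo = np.array([0, 0, 1, 1, 1, 0])
y_pred_demo = np.array([0, 1, 1, 1, 0, 0])

cm = confusion_matrix(y_true_demo, y_pred_demo)
tn, fp, fn, tp = cm.ravel()

# Accuracy: correct predictions (the diagonal) divided by all predictions.
manual_accuracy = (tn + tp) / cm.sum()
# Precision and recall for class 1, as reported per class by classification_report.
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(cm)
print("Accuracy by hand:", manual_accuracy)
print("Accuracy from sklearn:", accuracy_score(y_true_demo, y_pred_demo))
print("Precision and recall for class 1:", precision, recall)
```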