When we use train_test_split, we train and test our model on a single fixed split of the dataset, so the reported accuracy depends heavily on which samples happen to land in the test set, and a model tuned against that one test set can quietly overfit to it. Cross validation addresses this by training and testing the model on several different splits of the dataset and averaging the results, which gives a more reliable estimate of how the model generalizes.
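Before using cross_val_score, it can help to see what k-fold cross validation does under the hood. Below is a minimal sketch using sklearn's KFold splitter on the same Iris data used later in this article; the 5 folds, shuffle, and random_state here are illustrative choices, not part of the original code:

```python
from sklearn import datasets
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

dataset = datasets.load_iris()
X, y = dataset.data, dataset.target

# Split the data into 5 folds; each fold takes one turn as the test set
kf = KFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []
for train_idx, test_idx in kf.split(X):
    model = KNeighborsClassifier(n_neighbors=11)
    model.fit(X[train_idx], y[train_idx])                       # train on k-1 folds
    fold_scores.append(model.score(X[test_idx], y[test_idx]))   # test on the held-out fold

print(fold_scores)                           # one accuracy per fold
print(sum(fold_scores) / len(fold_scores))   # averaged estimate
```

cross_val_score wraps exactly this loop into a single call, which is what the steps below use.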
In my earlier article, I created a KNN model using train_test_split. Now, I will create the same KNN model using cross validation (cross_val_score). Instead of loading the Iris dataset from a URL, I will use the copy that is already bundled with the sklearn library.
You can also download my Jupyter notebook containing the code below for the KNN cross validation implementation.
Step 1: Import the required Python libraries
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
Step 2: Load and examine the dataset
dataset = datasets.load_iris()
dataset.feature_names
dataset.target_names
dataset.data.shape
dataset.target.shape
dataset.data[0:5]
dataset.target[0:5]
Step 3: Create a KNN model
model = KNeighborsClassifier(n_neighbors=11)
Step 4: Apply 10-fold cross validation and check the accuracy
scores = cross_val_score(model, dataset.data, dataset.target, cv=10, scoring="accuracy")
print(scores)
meanScore = scores.mean()
print(meanScore * 100)
Since this is 10-fold cross validation, ten scores are displayed, one per fold:
[1. 0.93333333 1. 1. 1. 0.86666667
0.93333333 0.93333333 1. 1. ]
To consolidate the scores, we take their mean, which gives the final cross-validated accuracy:
96.66666666666668
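The n_neighbors=11 used above can itself be chosen with cross validation. As a sketch (the range of 1 to 30 is an arbitrary choice for illustration, not from the original article), we can score each candidate k with the same cross_val_score call and keep the best one:

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

dataset = datasets.load_iris()

# Cross-validate each candidate number of neighbors
k_scores = {}
for k in range(1, 31):
    model = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(model, dataset.data, dataset.target,
                             cv=10, scoring="accuracy")
    k_scores[k] = scores.mean()

# Pick the k with the highest mean cross-validated accuracy
best_k = max(k_scores, key=k_scores.get)
print(best_k, k_scores[best_k])
```

Because every candidate is evaluated on all ten folds rather than one fixed test set, this selection is far less sensitive to a lucky or unlucky split than tuning against a single train_test_split.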