Tuesday, 19 February 2019

Implement KNN Algorithm using Cross Validation (cross_val_score) in Python

When we use train_test_split, we train and test our model on just one particular split of the dataset. The accuracy we measure then depends heavily on that one split, and tuning the model against it can lead to overfitting. To get a more reliable estimate, we should train and test the model on several different subsets of the dataset. Here cross validation comes into the picture: it repeats the train/test process across multiple splits and averages the results, which helps avoid overfitting the model to a single split.
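To make the idea concrete, here is a small sketch (my own illustration, not part of the original article) of how 10-fold cross validation partitions 150 samples, the size of the IRIS dataset: each round holds out one fold of 15 samples for testing and trains on the remaining 135.

```python
from sklearn.model_selection import KFold
import numpy as np

# 150 dummy samples, split into 10 folds of 15 samples each
X = np.arange(150)
kfold = KFold(n_splits=10)
for train_idx, test_idx in kfold.split(X):
    # each round: 135 samples for training, 15 held out for testing
    print(len(train_idx), len(test_idx))
```

cross_val_score performs this splitting, fitting, and scoring internally, so we never have to write the loop ourselves.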

In my earlier article, I created a KNN model using train_test_split. Now, I will create the same KNN model using cross validation (cross_val_score). Instead of loading the IRIS dataset from a URL, I will use the copy that ships with the sklearn library.

You can also download my Jupyter notebook containing the code below for the KNN cross validation implementation.

Step 1: Import the required Python libraries

from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

Step 2: Load and examine the dataset

dataset = datasets.load_iris()
dataset.feature_names   # names of the four features
dataset.target_names    # names of the three IRIS species
dataset.data.shape      # dimensions of the feature matrix
dataset.target.shape    # dimensions of the label vector
dataset.data[0:5]       # first five rows of features
dataset.target[0:5]     # first five labels
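For reference, this is what I would expect those inspection calls to show for sklearn's bundled copy of the IRIS dataset: 150 samples with 4 features each, and three species labels.

```python
from sklearn import datasets

dataset = datasets.load_iris()
print(dataset.data.shape)          # (150, 4): 150 samples, 4 features
print(dataset.target.shape)        # (150,): one class label per sample
print(list(dataset.target_names))  # ['setosa', 'versicolor', 'virginica']
```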

Step 3: Create a KNN model

model = KNeighborsClassifier(n_neighbors=11) 

Step 4: Apply 10 Fold Cross Validation and check accuracy

scores = cross_val_score(model, dataset.data, dataset.target, cv=10, scoring="accuracy")
print(scores)
meanScore = scores.mean()
print(meanScore * 100)

As this is a 10 fold cross validation, ten scores are displayed, one per fold:
[1.         0.93333333 1.         1.         1.         0.86666667
 0.93333333 0.93333333 1.         1.        ]

To consolidate them, take the mean of all ten scores, which gives the final accuracy:
96.66666666666668
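The same cross_val_score call can also guide the choice of n_neighbors, instead of fixing it at 11 up front. Here is a sketch (the candidate range 1 to 25 is my own assumption, not from the article) that scores each candidate k with the same 10-fold cross validation and keeps the best one:

```python
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

dataset = datasets.load_iris()

k_range = range(1, 26)  # candidate values for n_neighbors (an assumption)
k_scores = []
for k in k_range:
    model = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(model, dataset.data, dataset.target,
                             cv=10, scoring="accuracy")
    k_scores.append(scores.mean())  # mean accuracy across the 10 folds

# pick the k with the highest mean cross-validated accuracy
best_k = k_range[k_scores.index(max(k_scores))]
print(best_k, max(k_scores))
```

This is the usual way cross validation is used for hyperparameter tuning: every candidate is evaluated on the same folds, so the comparison between values of k is fair.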
