Pages

Tuesday, 26 February 2019

How to choose optimal value of K in KNN Algorithm?

There is no straightforward method to calculate the value of K in KNN. You have to play around with different values to choose the optimal value of K. Choosing a right value of K is a process called Hyperparameter Tuning.

The value of optimum K totally depends on the dataset that you are using. The best value of K for KNN is highly data-dependent. In different scenarios, the optimum K may vary. It is more or less hit and trail method.

You need to maintain a balance while choosing the value of K in KNN. K should not be too small or too large. 

A small value of K means that noise will have a higher influence on the result. 

Larger the value of K, higher is the accuracy. If K is too large, you are under-fitting your model. In this case, the error will go up again. So, at the same time you also need to prevent your model from under-fitting. Your model should retain generalization capabilities otherwise there are fair chances that your model may perform well in the training data but drastically fail in the real data. Larger K will also increase the computational expense of the algorithm.

There is no one proper method of estimation of K value in KNN. No method is the rule of thumb but you should try considering following suggestions:

1. Square Root Method: Take square root of the number of samples in the training dataset.

2. Cross Validation Method: We should also use cross validation to find out the optimal value of K in KNN. Start with K=1, run cross validation (5 to 10 fold),  measure the accuracy and keep repeating till the results become consistent. 

K=1, 2, 3... As K increases, the error usually goes down, then stabilizes, and then raises again. Pick the optimum K at the beginning of the stable zone. This is also called Elbow Method.

3. Domain Knowledge also plays a vital role while choosing the optimum value of K.

4. K should be an odd number.

I would suggest to try a mix of all the above points to reach any conclusion. 

No comments:

Post a Comment

About the Author

I have more than 10 years of experience in IT industry. Linkedin Profile

I am currently messing up with neural networks in deep learning. I am learning Python, TensorFlow and Keras.

Author: I am an author of a book on deep learning.

Quiz: I run an online quiz on machine learning and deep learning.