There is no straightforward formula for calculating the value of K in KNN. You have to experiment with different values to find the optimal one; choosing the right value of K is part of a process called hyperparameter tuning.
The optimum K depends entirely on the dataset you are using: the best value of K for KNN is highly data-dependent, and in different scenarios the optimum K may vary. It is more or less a trial-and-error method.
You need to maintain a balance while choosing the value of K in KNN. K should not be too small or too large.
A small value of K means that noise has a greater influence on the result: the model effectively memorizes the training data (over-fitting), so there is a fair chance it performs well on the training data but fails badly on real, unseen data.
A larger value of K smooths out that noise and usually improves accuracy, but only up to a point. If K is too large, you are under-fitting your model: it becomes too simple to capture the real structure of the data, and the error goes up again. So you also need to prevent under-fitting, and keep in mind that a larger K increases the computational expense of the algorithm.
There is no single proper method for estimating the value of K in KNN. No method is a hard rule of thumb, but you should consider the following suggestions:
1. Square Root Method: Take the square root of the number of samples in the training dataset (a minimal sketch appears right after this list).
2. Cross Validation Method: Use cross validation to find the optimal value of K in KNN. Start with K=1, run cross validation (5 to 10 fold), measure the accuracy, and keep repeating for K=2, 3, ... until the results become consistent. As K increases, the error usually goes down, then stabilizes, and then rises again. Pick the optimum K at the beginning of the stable zone. This is also called the Elbow Method (see the sketch at the end of this section).
3. Domain Knowledge: Domain knowledge also plays a vital role in choosing the optimum value of K.
4. Odd K: K should be an odd number, at least for binary classification, so that a majority vote cannot end in a tie.
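As a quick illustration of the Square Root Method combined with point 4, here is a minimal Python sketch. The sample count of 400 is a made-up number used purely for illustration:

```python
import math

# Hypothetical training-set size, used purely for illustration.
n_samples = 400

# Square Root Method: start with K near sqrt(n), then force it to be
# odd (point 4) so that a binary majority vote cannot end in a tie.
k = round(math.sqrt(n_samples))  # sqrt(400) = 20
if k % 2 == 0:
    k += 1                       # 20 -> 21

print(k)  # 21
```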
I would suggest trying a mix of all the above points before reaching a conclusion.
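And here is a minimal sketch of the Cross Validation / Elbow Method using scikit-learn. The Iris dataset, the 5-fold setting, and the range of K values are assumptions chosen purely for illustration; in practice you would plot the error curve against K and look for the elbow by eye:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # any labelled dataset works here

# Sweep odd values of K and record the mean 5-fold CV error for each.
ks = list(range(1, 32, 2))
errors = []
for k in ks:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    errors.append(1 - scores.mean())

# Elbow Method: the usual approach is to plot `ks` vs `errors` and pick
# the K where the curve first flattens out. As a crude stand-in, we
# simply report the K with the lowest mean cross-validation error.
best = int(np.argmin(errors))
print(ks[best], errors[best])
```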