An analysis of the coupling between training set and neighborhood sizes for the kNN classifier

Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Pub Date: 2006-08-06. DOI: 10.1145/1148170.1148317

J. S. Olsson

Citations: 9

Abstract

We consider the relationship between training set size and the parameter k for the k-Nearest Neighbors (kNN) classifier. When few examples are available, we observe that accuracy is sensitive to k and that the best k tends to increase with training size. We explore the resulting risk that a k tuned on partitions will be suboptimal after aggregation and re-training. This risk is found to be most severe when little data is available. For larger training sizes, accuracy becomes increasingly stable with respect to k and the risk decreases.
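The tuning risk described in the abstract can be illustrated with a small experiment. The following is a minimal sketch and not the paper's experimental setup: the synthetic two-class Gaussian data, the helper names (`make_data`, `knn_predict`, `best_k`, `tuning_gap`), the candidate values of k, and the reuse of one held-out set for both tuning and evaluation are all assumptions made for illustration.

```python
import random
from collections import Counter

def make_data(n, rng):
    """Synthetic stand-in data: two overlapping 2-D Gaussian classes."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        center = 1.0 if label else -1.0
        point = (rng.gauss(center, 1.5), rng.gauss(center, 1.5))
        data.append((point, label))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(
        train,
        key=lambda ex: (ex[0][0] - x[0]) ** 2 + (ex[0][1] - x[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, held_out, k):
    """Fraction of held-out points classified correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return correct / len(held_out)

def best_k(train, held_out, candidates):
    """Return the smallest candidate k with the highest accuracy, plus all scores."""
    scores = {k: accuracy(train, held_out, k) for k in candidates if k <= len(train)}
    top = max(scores.values())
    return min(k for k, s in scores.items() if s == top), scores

def tuning_gap(full_train, held_out, part_size, candidates):
    """Tune k on a partition, then measure the accuracy lost by reusing that k
    after re-training on the aggregated (full) training set."""
    part = full_train[:part_size]
    k_part, _ = best_k(part, held_out, candidates)
    k_full, scores_full = best_k(full_train, held_out, candidates)
    return k_part, k_full, scores_full[k_full] - scores_full[k_part]

if __name__ == "__main__":
    rng = random.Random(0)
    candidates = [1, 3, 5, 7, 9, 15, 25]  # odd values avoid two-class ties
    held_out = make_data(200, rng)
    for full_size in (40, 160, 640):
        full_train = make_data(full_size, rng)
        k_part, k_full, gap = tuning_gap(
            full_train, held_out, full_size // 4, candidates
        )
        print(f"n={full_size}: k tuned on partition={k_part}, "
              f"k tuned on full set={k_full}, accuracy gap={gap:.3f}")
```

The printed gap is the accuracy given up by carrying over the partition-tuned k instead of re-tuning on the full set. Results on this toy data depend on the random seed and need not reproduce the trends reported in the paper.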


