{"title":"An analysis of the coupling between training set and neighborhood sizes for the kNN classifier","authors":"J. S. Olsson","doi":"10.1145/1148170.1148317","DOIUrl":null,"url":null,"abstract":"We consider the relationship between training set size and the parameter k for the k-Nearest Neighbors (kNN) classifier. When few examples are available, we observe that accuracy is sensitive to k and that best k tends to increase with training size. We explore the subsequent risk that k tuned on partitions will be suboptimal after aggregation and re-training. This risk is found to be most severe when little data is available. For larger training sizes, accuracy becomes increasingly stable with respect to k and the risk decreases.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1148170.1148317","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9
Abstract
We consider the relationship between training set size and the parameter k for the k-Nearest Neighbors (kNN) classifier. When few examples are available, we observe that accuracy is sensitive to k and that best k tends to increase with training size. We explore the subsequent risk that k tuned on partitions will be suboptimal after aggregation and re-training. This risk is found to be most severe when little data is available. For larger training sizes, accuracy becomes increasingly stable with respect to k and the risk decreases.