{"title":"A pre-averaged pseudo nearest neighbor classifier","authors":"Dapeng Li","doi":"10.7717/peerj-cs.2247","DOIUrl":null,"url":null,"abstract":"The k-nearest neighbor algorithm is a powerful classification method. However, its classification performance will be affected in small-size samples with existing outliers. To address this issue, a pre-averaged pseudo nearest neighbor classifier (PAPNN) is proposed to improve classification performance. In the PAPNN rule, the pre-averaged categorical vectors are calculated by taking the average of any two points of the training sets in each class. Then, k-pseudo nearest neighbors are chosen from the preprocessed vectors of every class to determine the category of a query point. The pre-averaged vectors can reduce the negative impact of outliers to some degree. Extensive experiments are conducted on nineteen numerical real data sets and three high dimensional real data sets by comparing PAPNN to other twelve classification methods. The experimental results demonstrate that the proposed PAPNN rule is effective for classification tasks in the case of small-size samples with existing outliers.","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"49 1","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"PeerJ Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.7717/peerj-cs.2247","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
The k-nearest neighbor algorithm is a powerful classification method. However, its classification performance will be affected in small-size samples with existing outliers. To address this issue, a pre-averaged pseudo nearest neighbor classifier (PAPNN) is proposed to improve classification performance. In the PAPNN rule, the pre-averaged categorical vectors are calculated by taking the average of any two points of the training sets in each class. Then, k-pseudo nearest neighbors are chosen from the preprocessed vectors of every class to determine the category of a query point. The pre-averaged vectors can reduce the negative impact of outliers to some degree. Extensive experiments are conducted on nineteen numerical real data sets and three high dimensional real data sets by comparing PAPNN to other twelve classification methods. The experimental results demonstrate that the proposed PAPNN rule is effective for classification tasks in the case of small-size samples with existing outliers.
期刊介绍:
PeerJ Computer Science is the new open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.