{"title":"预平均伪近邻分类器","authors":"Dapeng Li","doi":"10.7717/peerj-cs.2247","DOIUrl":null,"url":null,"abstract":"The k-nearest neighbor algorithm is a powerful classification method. However, its classification performance will be affected in small-size samples with existing outliers. To address this issue, a pre-averaged pseudo nearest neighbor classifier (PAPNN) is proposed to improve classification performance. In the PAPNN rule, the pre-averaged categorical vectors are calculated by taking the average of any two points of the training sets in each class. Then, k-pseudo nearest neighbors are chosen from the preprocessed vectors of every class to determine the category of a query point. The pre-averaged vectors can reduce the negative impact of outliers to some degree. Extensive experiments are conducted on nineteen numerical real data sets and three high dimensional real data sets by comparing PAPNN to other twelve classification methods. The experimental results demonstrate that the proposed PAPNN rule is effective for classification tasks in the case of small-size samples with existing outliers.","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"49 1","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A pre-averaged pseudo nearest neighbor classifier\",\"authors\":\"Dapeng Li\",\"doi\":\"10.7717/peerj-cs.2247\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The k-nearest neighbor algorithm is a powerful classification method. However, its classification performance will be affected in small-size samples with existing outliers. To address this issue, a pre-averaged pseudo nearest neighbor classifier (PAPNN) is proposed to improve classification performance. In the PAPNN rule, the pre-averaged categorical vectors are calculated by taking the average of any two points of the training sets in each class. Then, k-pseudo nearest neighbors are chosen from the preprocessed vectors of every class to determine the category of a query point. The pre-averaged vectors can reduce the negative impact of outliers to some degree. Extensive experiments are conducted on nineteen numerical real data sets and three high dimensional real data sets by comparing PAPNN to other twelve classification methods. The experimental results demonstrate that the proposed PAPNN rule is effective for classification tasks in the case of small-size samples with existing outliers.\",\"PeriodicalId\":54224,\"journal\":{\"name\":\"PeerJ Computer Science\",\"volume\":\"49 1\",\"pages\":\"\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2024-08-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PeerJ Computer Science\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.7717/peerj-cs.2247\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PeerJ Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.7717/peerj-cs.2247","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
摘要
K 近邻算法是一种强大的分类方法。然而,在存在异常值的小样本中,其分类性能会受到影响。为了解决这个问题,我们提出了一种预平均伪近邻分类器(PAPNN)来提高分类性能。在 PAPNN 规则中,预平均分类向量是通过取每个类别中训练集中任意两个点的平均值来计算的。然后,从每一类的预处理向量中选择 k 个伪近邻来确定查询点的类别。预平均向量可以在一定程度上减少异常值的负面影响。通过将 PAPNN 与其他 12 种分类方法进行比较,在 19 个数值真实数据集和 3 个高维真实数据集上进行了大量实验。实验结果表明,所提出的 PAPNN 规则对于存在离群值的小尺寸样本的分类任务是有效的。
The k-nearest neighbor algorithm is a powerful classification method. However, its classification performance will be affected in small-size samples with existing outliers. To address this issue, a pre-averaged pseudo nearest neighbor classifier (PAPNN) is proposed to improve classification performance. In the PAPNN rule, the pre-averaged categorical vectors are calculated by taking the average of any two points of the training sets in each class. Then, k-pseudo nearest neighbors are chosen from the preprocessed vectors of every class to determine the category of a query point. The pre-averaged vectors can reduce the negative impact of outliers to some degree. Extensive experiments are conducted on nineteen numerical real data sets and three high dimensional real data sets by comparing PAPNN to other twelve classification methods. The experimental results demonstrate that the proposed PAPNN rule is effective for classification tasks in the case of small-size samples with existing outliers.
期刊介绍:
PeerJ Computer Science is the new open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.