{"title":"An Improved k-Nearest Centroid Neighbor Classification Method for Incomplete Data","authors":"Yezhen Wang","doi":"10.1145/3558819.3565209","journal":{"name":"Proceedings of the 7th International Conference on Cyber Security and Information Engineering"},"publicationDate":"2022-09-23","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1145/3558819.3565209"}
An Improved k-Nearest Centroid Neighbor Classification Method for Incomplete Data
Missing values are common in scientific datasets, so practical methods for imputing and classifying missing data are necessary for machine learning and data analysis. The k-Nearest Neighbor (KNN) algorithm is simple and effective for both tasks. This paper focuses on classifying data with missing values and proposes a new classification method based on the local mean k-nearest centroid neighbor. When making a classification decision, the proposed method considers both the closeness and the symmetrical arrangement of the k neighbors and uses the local mean vector of the k centroid neighbors of each class. We run classification error experiments on six UCI datasets to evaluate how the proposed method performs when data are missing. Experimental results show that the proposed method significantly improves on state-of-the-art KNN-based algorithms.
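To make the decision rule described in the abstract concrete, the following is a minimal sketch of a local mean-based k-nearest centroid neighbor classifier. It is an illustration under stated assumptions, not the paper's implementation: the function names (`column_means`, `fill_missing`, `nearest_centroid_neighbors`, `classify`) are hypothetical, missing values are assumed to be encoded as `None`, distances are assumed to be Euclidean, and missing entries are filled with per-feature training means purely as a stand-in, since the abstract does not say how the proposed method treats them.

```python
import math


def column_means(rows):
    """Per-feature mean over observed (non-None) values."""
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


def fill_missing(row, means):
    """Assumed stand-in for missing-data handling: simple mean imputation."""
    return [m if v is None else v for v, m in zip(row, means)]


def nearest_centroid_neighbors(points, query, k):
    """Greedy search: each pick keeps the centroid of all picks closest to query.

    This rewards neighbors that are both near the query and spread
    around it, rather than clustered on one side.
    """
    remaining = list(points)
    chosen = []
    total = [0.0] * len(query)  # running sum of the chosen points
    for _ in range(min(k, len(remaining))):
        n = len(chosen) + 1

        def centroid_dist(p):
            # Distance from the query to the centroid of chosen points plus p.
            return math.dist([(t + x) / n for t, x in zip(total, p)], query)

        best = min(remaining, key=centroid_dist)
        remaining.remove(best)
        chosen.append(best)
        total = [t + x for t, x in zip(total, best)]
    return chosen


def classify(train_rows, train_labels, query, k=3):
    """Assign query to the class whose local mean vector is nearest."""
    means = column_means(train_rows)
    rows = [fill_missing(r, means) for r in train_rows]
    q = fill_missing(query, means)
    best_label, best_dist = None, math.inf
    for label in sorted(set(train_labels)):
        members = [r for r, y in zip(rows, train_labels) if y == label]
        neighbors = nearest_centroid_neighbors(members, q, k)
        # Local mean vector of this class's k centroid neighbors.
        local_mean = [sum(col) / len(neighbors) for col in zip(*neighbors)]
        d = math.dist(local_mean, q)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

Calling `classify(train_rows, train_labels, query, k=3)` with lists of feature values returns the predicted label. The neighbor search runs separately within each class, so every class contributes exactly one local mean vector to the final comparison.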