CW-kNN: an efficient kNN-based model for imbalanced dataset classification
Yi Xiang, Zhong Cao, Shaowen Yao, Jing He
International Conference on Critical Infrastructure Protection, 2018-11-02
DOI: 10.1145/3290420.3290431
Citation count: 1
Abstract
The k-nearest neighbor (kNN) method is a popular classification method in data mining because of its simple implementation and strong classification performance. However, kNN does not scale well to large datasets, because a naive query must compute its distance to every training sample. In this paper, CLUKER, a novel kNN regression method based on hierarchical clustering, is proposed. CLUKER uses hierarchical clustering to divide the original dataset into several clusters, so that each kNN query searches only a subset of the data. Moreover, to improve kNN's ability to handle imbalanced datasets, this paper proposes LD-Weighting, a novel weighting method based on the local data distribution. Finally, by integrating the two algorithms, this paper proposes CW-kNN, an efficient kNN-based model for imbalanced dataset classification. The experimental results show that the proposed methods perform well across different datasets.
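The abstract does not specify how CLUKER or LD-Weighting work, so what follows is only a minimal sketch of the general idea under stated assumptions, not the authors' method. A naive agglomerative (hierarchical) clustering with centroid linkage partitions the training set. Each query is then routed to the nearest cluster centroid and searched only within that cluster, and each neighbor's vote is divided by its class's count inside that cluster as a simple stand-in for "local data distribution" weighting. The names `ClusteredWeightedKNN` and `agglomerative_clusters`, the centroid-linkage choice, and the inverse-local-frequency weight are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def centroid(points, idx):
    """Mean of the points whose indices are in idx."""
    dim = len(points[0])
    return tuple(sum(points[i][d] for i in idx) / len(idx) for d in range(dim))


def agglomerative_clusters(points, n_clusters):
    """Naive bottom-up hierarchical clustering with centroid linkage (assumed).

    Starts with one cluster per point and repeatedly merges the two clusters
    whose centroids are closest until n_clusters remain. O(n^3): sketch only.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        cents = [centroid(points, c) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(cents[a], cents[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


class ClusteredWeightedKNN:
    """Hypothetical cluster-then-search kNN with an imbalance-aware vote."""

    def __init__(self, k=3, n_clusters=2, weighted=True):
        self.k = k
        self.n_clusters = n_clusters
        self.weighted = weighted

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        self.clusters = agglomerative_clusters(self.X, self.n_clusters)
        self.centroids = [centroid(self.X, c) for c in self.clusters]
        # Class counts inside each cluster (assumed proxy for local distribution).
        self.local_freq = [Counter(self.y[i] for i in c) for c in self.clusters]
        return self

    def predict_one(self, x):
        # 1. Route the query to the nearest cluster; search only its members.
        j = min(range(len(self.clusters)),
                key=lambda c: math.dist(x, self.centroids[c]))
        members = self.clusters[j]
        neighbors = sorted(members, key=lambda i: math.dist(x, self.X[i]))[: self.k]
        # 2. Vote: each neighbor counts 1, or 1 / (its class count in this cluster).
        votes = defaultdict(float)
        for i in neighbors:
            label = self.y[i]
            votes[label] += (1.0 / self.local_freq[j][label]) if self.weighted else 1.0
        return max(votes, key=votes.get)

    def predict(self, X):
        return [self.predict_one(x) for x in X]


X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.2),  # majority class 0 near origin
     (2, 2), (2.2, 2.1),                          # minority class 1
     (10, 10), (10, 11), (11, 10)]                # class 0, far away
y = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
plain = ClusteredWeightedKNN(k=5, weighted=False).fit(X, y)
weighted = ClusteredWeightedKNN(k=5, weighted=True).fit(X, y)
print(plain.predict([(1.5, 1.5)]), weighted.predict([(1.5, 1.5)]))  # → [0] [1]
```

Routing each query to a single cluster is what shrinks the search, but a true nearest neighbor lying just across a cluster boundary can be missed. In the example, the five neighbors of (1.5, 1.5) are three class-0 points and two class-1 points: the plain vote picks class 0, while the weighted vote (2 × 1/2 against 3 × 1/5) picks the minority class 1.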