Application of Natural Neighbor-based Algorithm on Oversampling SMOTE Algorithms
C. Srinilta, Sivakorn Kanharattanachai
2021 7th International Conference on Engineering, Applied Sciences and Technology (ICEAST), published 2021-04-01
DOI: 10.1109/ICEAST52143.2021.9426310 (https://doi.org/10.1109/ICEAST52143.2021.9426310)
Citations: 5
Abstract
Classification performance depends heavily on data distribution. Real-world data are often imbalanced, with one class occurring far more often than the others. SMOTE-based algorithms are commonly used to handle this class imbalance problem. One key parameter that algorithms in the SMOTE family require is k, the number of nearest neighbors considered for each data point. The value of k that best fits the dataset gives the best performance. This paper proposes an approach that suggests a value of k using the Natural Neighbor algorithm. Datasets are balanced by four SMOTE-based algorithms: standard SMOTE, Safe-Level-SMOTE, Modified-SMOTE and Weighted-SMOTE. The F-measure and Recall metrics are used to evaluate the classification performance of a Support Vector Machine classifier on six datasets with different imbalance ratios. The results show that the average classification performance achieved with the proposed values of k is closer to the optimum than the performance obtained with the default value of k.
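The idea of deriving k from the data rather than fixing it in advance can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one common formulation of the natural-neighbor search (the search radius r grows until the number of points that no other point has chosen as a neighbor stops changing) and plain SMOTE interpolation between a minority point and one of its k nearest minority neighbors. The function names `natural_neighbor_k` and `smote`, the stopping rule, and the choice to run the search on the minority class only are all assumptions made for illustration.

```python
import numpy as np


def natural_neighbor_k(X):
    """Suggest k as the round r at which the natural-neighbor search stabilizes.

    Assumed stopping rule (hypothetical, not taken from the paper): stop when every
    point has at least one reverse neighbor, or when the number of points without
    one is unchanged from the previous round.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise squared distances; inf on the diagonal so a point never picks itself.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    order = np.argsort(d, axis=1)           # order[i, j] is the (j+1)-th nearest neighbor of i
    reverse_count = np.zeros(n, dtype=int)  # how often each point was picked by others
    prev_orphans = -1
    for r in range(1, n):
        # In round r, every point picks its r-th nearest neighbor.
        np.add.at(reverse_count, order[:, r - 1], 1)
        orphans = int((reverse_count == 0).sum())
        if orphans == 0 or orphans == prev_orphans:
            return r
        prev_orphans = orphans
    return n - 1


def smote(X_min, n_new, k, rng=None):
    """Standard SMOTE interpolation: new point = base + gap * (neighbor - base)."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]                  # k nearest minority neighbors per point
    base = rng.integers(0, len(X_min), size=n_new)     # random base points
    neigh = nn[base, rng.integers(0, k, size=n_new)]   # one random neighbor for each base
    gap = rng.random((n_new, 1))                       # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


# Usage on made-up minority-class data (illustrative only).
minority = np.random.default_rng(0).normal(size=(30, 2))
k = natural_neighbor_k(minority)
synthetic = smote(minority, n_new=20, k=k, rng=1)
```

Because each synthetic point is a convex combination of two existing minority points, it always lies on the segment between them. A larger k lets that segment reach more distant neighbors, which is why the choice of k affects the result.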