Yoga Pristyanto, A. F. Nugraha, Irfan Pratama, Akhmad Dahlan, Lucky Adhikrisna Wirasakti
{"title":"使用过采样和集成学习技术处理数据集中不平衡类的双重方法","authors":"Yoga Pristyanto, A. F. Nugraha, Irfan Pratama, Akhmad Dahlan, Lucky Adhikrisna Wirasakti","doi":"10.1109/IMCOM51814.2021.9377420","DOIUrl":null,"url":null,"abstract":"In the field of machine learning, the existence of class imbalances in the dataset will make the resulting model have less than optimal performance. Theoretically, the single classifier has a weakness for class imbalance conditions in the datasets because of the majority of single classifiers tend to work by recognizing patterns in the majority class the datasets are not balanced. So, the performance cannot be maximized. In this study, two approaches were introduced to deal with class imbalance conditions in the dataset. The first approach uses ADASYN as resampling while the second approach uses the Stacking algorithm as meta-learning. After conducting a test using 5 datasets with different imbalanced ratios, it shows that the proposed method produced the highest g-mean and AUC score compared to the other classification algorithms. The proposed method in this study is the stacking algorithm between the SVM and Random Forest algorithms and the addition of ADASYN in the resampling process. Hence, the proposed method can be a solution for handling class imbalance in datasets. However, this study has limitations such as the dataset used is a dataset with a binary class category. For this reason, for the future work, testing will be suggested using the imbalanced class dataset with the multiclass datasets.","PeriodicalId":275121,"journal":{"name":"2021 15th International Conference on Ubiquitous Information Management and Communication (IMCOM)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Dual Approach to Handling Imbalanced Class in Datasets Using Oversampling and Ensemble Learning Techniques\",\"authors\":\"Yoga Pristyanto, A. F. Nugraha, Irfan Pratama, Akhmad Dahlan, Lucky Adhikrisna Wirasakti\",\"doi\":\"10.1109/IMCOM51814.2021.9377420\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the field of machine learning, the existence of class imbalances in the dataset will make the resulting model have less than optimal performance. Theoretically, the single classifier has a weakness for class imbalance conditions in the datasets because of the majority of single classifiers tend to work by recognizing patterns in the majority class the datasets are not balanced. So, the performance cannot be maximized. In this study, two approaches were introduced to deal with class imbalance conditions in the dataset. The first approach uses ADASYN as resampling while the second approach uses the Stacking algorithm as meta-learning. After conducting a test using 5 datasets with different imbalanced ratios, it shows that the proposed method produced the highest g-mean and AUC score compared to the other classification algorithms. The proposed method in this study is the stacking algorithm between the SVM and Random Forest algorithms and the addition of ADASYN in the resampling process. Hence, the proposed method can be a solution for handling class imbalance in datasets. However, this study has limitations such as the dataset used is a dataset with a binary class category. 
For this reason, for the future work, testing will be suggested using the imbalanced class dataset with the multiclass datasets.\",\"PeriodicalId\":275121,\"journal\":{\"name\":\"2021 15th International Conference on Ubiquitous Information Management and Communication (IMCOM)\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-01-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 15th International Conference on Ubiquitous Information Management and Communication (IMCOM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IMCOM51814.2021.9377420\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 15th International Conference on Ubiquitous Information Management and Communication (IMCOM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IMCOM51814.2021.9377420","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Dual Approach to Handling Imbalanced Class in Datasets Using Oversampling and Ensemble Learning Techniques
In the field of machine learning, class imbalance in a dataset causes the resulting model to perform below its potential. A single classifier is inherently weak under class imbalance because most single classifiers work by recognizing patterns in the majority class, so when the dataset is not balanced their performance cannot be maximized. In this study, two approaches were combined to deal with class imbalance in the dataset: the first uses ADASYN for resampling, while the second uses the stacking algorithm as meta-learning. Tests on five datasets with different imbalance ratios show that the proposed method produced the highest g-mean and AUC scores compared with the other classification algorithms. The proposed method is a stacking ensemble of the SVM and Random Forest algorithms combined with ADASYN in the resampling step. Hence, the proposed method can serve as a solution for handling class imbalance in datasets. However, this study is limited to datasets with binary class labels; future work should therefore test the method on imbalanced multiclass datasets.
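To make the described pipeline concrete, the following is a minimal sketch using scikit-learn and imbalanced-learn: ADASYN resampling of the training split followed by a stacking ensemble of SVM and Random Forest, evaluated with g-mean and AUC. The synthetic dataset, all hyperparameters, and the choice of logistic regression as the meta-learner are assumptions for illustration only; the abstract does not specify these details.

```python
# Sketch of the dual approach: ADASYN oversampling + stacking of SVM and Random Forest.
# Assumptions (not given in the abstract): hyperparameters, the meta-learner,
# and the synthetic dataset used here.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from imblearn.over_sampling import ADASYN
from imblearn.metrics import geometric_mean_score

# Synthetic binary dataset with roughly a 9:1 class imbalance (illustrative only).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Step 1: ADASYN oversampling, applied to the training split only so the
# test set keeps its original imbalance.
X_res, y_res = ADASYN(random_state=42).fit_resample(X_train, y_train)

# Step 2: stacking ensemble. Here SVM and Random Forest are base learners and
# logistic regression combines their predictions (an assumption; the abstract
# only states that SVM and Random Forest are stacked).
stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True, random_state=42)),
                ("rf", RandomForestClassifier(random_state=42))],
    final_estimator=LogisticRegression(),
)
stack.fit(X_res, y_res)

# Evaluate with the metrics reported in the paper: g-mean and AUC.
y_pred = stack.predict(X_test)
y_proba = stack.predict_proba(X_test)[:, 1]
print("g-mean:", geometric_mean_score(y_test, y_pred))
print("AUC   :", roc_auc_score(y_test, y_proba))
```

Note that the oversampling step must be fit on the training data only; resampling before the train/test split would leak synthetic minority samples into the evaluation set and inflate both metrics.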