Yoga Pristyanto, A. F. Nugraha, Rifda Faticha Alfa Aziza, Ibnu Hadi Purwanto, Mulia Sulistiyono, Akhmad Dahlan
{"title":"集成模型作为数据集不平衡类分类解决方案的比较","authors":"Yoga Pristyanto, A. F. Nugraha, Rifda Faticha Alfa Aziza, Ibnu Hadi Purwanto, Mulia Sulistiyono, Akhmad Dahlan","doi":"10.1109/IMCOM56909.2023.10035615","DOIUrl":null,"url":null,"abstract":"A phenomenon known as “class imbalance” occurs when an excessive number of classes are evaluated in relation to other classes. This circumstance is quite common in the challenges that classification modeling is used to in the actual world. Because of the influence of class imbalance on the dataset, the classification model's performance is not at its highest possible level. In addition, the presence of these factors might make the possibility of incorrect categorization greater. Utilizing an ensemble model is one approach that may be used to resolve this issue. The originality of the dataset is preserved, which is one of the many benefits of this method. In this work, three different types of ensemble models-XGBoost, Stacking, and Bagging-were examined and contrasted. All three were put through their paces using five distinct unbalanced multiclass datasets, each with a different value for the imbalanced ratio. The results of the three experiments that used five different assessment indicators reveal that the XGBoost model performs much better than the Bagging and Stacking models when it comes to overall performance. The XGBoost model performs exceptionally well in all of the indicators that were evaluated, including Balanced Accuracy, True Positive Rate, True Negative Rate, Geometric Mean, and Multiclass Area Under Curve. 
These findings provide more evidence that XGBoost is a viable option for addressing multiclass unbalanced issues in datasets.","PeriodicalId":230213,"journal":{"name":"2023 17th International Conference on Ubiquitous Information Management and Communication (IMCOM)","volume":"204 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Comparison of Ensemble Models as Solutions for Imbalanced Class Classification of Datasets\",\"authors\":\"Yoga Pristyanto, A. F. Nugraha, Rifda Faticha Alfa Aziza, Ibnu Hadi Purwanto, Mulia Sulistiyono, Akhmad Dahlan\",\"doi\":\"10.1109/IMCOM56909.2023.10035615\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A phenomenon known as “class imbalance” occurs when an excessive number of classes are evaluated in relation to other classes. This circumstance is quite common in the challenges that classification modeling is used to in the actual world. Because of the influence of class imbalance on the dataset, the classification model's performance is not at its highest possible level. In addition, the presence of these factors might make the possibility of incorrect categorization greater. Utilizing an ensemble model is one approach that may be used to resolve this issue. The originality of the dataset is preserved, which is one of the many benefits of this method. In this work, three different types of ensemble models-XGBoost, Stacking, and Bagging-were examined and contrasted. All three were put through their paces using five distinct unbalanced multiclass datasets, each with a different value for the imbalanced ratio. The results of the three experiments that used five different assessment indicators reveal that the XGBoost model performs much better than the Bagging and Stacking models when it comes to overall performance. 
The XGBoost model performs exceptionally well in all of the indicators that were evaluated, including Balanced Accuracy, True Positive Rate, True Negative Rate, Geometric Mean, and Multiclass Area Under Curve. These findings provide more evidence that XGBoost is a viable option for addressing multiclass unbalanced issues in datasets.\",\"PeriodicalId\":230213,\"journal\":{\"name\":\"2023 17th International Conference on Ubiquitous Information Management and Communication (IMCOM)\",\"volume\":\"204 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 17th International Conference on Ubiquitous Information Management and Communication (IMCOM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IMCOM56909.2023.10035615\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 17th International Conference on Ubiquitous Information Management and Communication (IMCOM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IMCOM56909.2023.10035615","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Comparison of Ensemble Models as Solutions for Imbalanced Class Classification of Datasets
A phenomenon known as “class imbalance” occurs when some classes in a dataset contain far more samples than others. This situation is common in the real-world problems to which classification modeling is applied. Class imbalance in the dataset prevents a classification model from reaching its best possible performance and increases the likelihood of misclassification. Using an ensemble model is one approach to this problem; among its benefits is that the original dataset is left unchanged. In this work, three types of ensemble models — XGBoost, Stacking, and Bagging — were examined and compared. All three were evaluated on five distinct imbalanced multiclass datasets, each with a different imbalance ratio. The results of the experiments, which used five assessment indicators, show that XGBoost clearly outperforms the Bagging and Stacking models in overall performance. XGBoost performs well on all of the indicators evaluated: Balanced Accuracy, True Positive Rate, True Negative Rate, Geometric Mean, and Multiclass Area Under the Curve. These findings provide further evidence that XGBoost is a viable option for addressing multiclass imbalance in datasets.
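The comparison described above can be sketched with scikit-learn. This is a minimal illustration, not the authors' experimental setup: the five datasets, their imbalance ratios, and the model hyperparameters are not specified in the abstract, so a synthetic imbalanced three-class dataset is generated here, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost (which is a separate library). Balanced Accuracy and the Geometric Mean of per-class recalls are two of the five indicators the paper reports.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced multiclass dataset (80% / 15% / 5% class weights).
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=6,
                           weights=[0.80, 0.15, 0.05], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

models = {
    "Bagging": BaggingClassifier(random_state=42),  # default base: decision tree
    "Stacking": StackingClassifier(
        estimators=[("dt", DecisionTreeClassifier(random_state=42)),
                    ("lr", LogisticRegression(max_iter=1000))],
        final_estimator=LogisticRegression(max_iter=1000)),
    # Stand-in for XGBoost; both are gradient-boosted tree ensembles.
    "Boosting": GradientBoostingClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    # Balanced Accuracy: mean of per-class recalls, robust to imbalance.
    bal_acc = balanced_accuracy_score(y_te, pred)
    # Geometric Mean: nth root of the product of per-class recalls.
    per_class_recall = recall_score(y_te, pred, average=None)
    g_mean = float(np.prod(per_class_recall) ** (1 / len(per_class_recall)))
    results[name] = {"balanced_accuracy": bal_acc, "g_mean": g_mean}
    print(f"{name}: balanced accuracy={bal_acc:.3f}, G-mean={g_mean:.3f}")
```

Because the G-mean is the product of per-class recalls, a model that ignores the rare class entirely scores zero on it, which is why such metrics are preferred over plain accuracy for imbalanced data.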