Comparison of the SMOTE and ADASYN Methods for Handling Imbalanced Multiclass Data

Jurnal Informatika Polinema. Pub Date: 2023-05-17. DOI: 10.33795/jip.v9i3.1330

Yulian Pamuji, Sephia Dwi Arma Putri


Citations: 1

Abstract



KOMPARASI METODE SMOTE DAN ADASYN UNTUK PENANGANAN DATA TIDAK SEIMBANG MULTICLASS

Data mining combines database systems, statistics, machine learning, and visualization to analyze large datasets and extract useful characteristics of the data. This study addresses the problem of imbalanced datasets by comparing SMOTE and ADASYN, two oversampling methods that balance the number of samples between the majority (negative) and minority (positive) classes. In the experiments, combining SMOTE with a classification method produced higher MCC and G-mean values, and therefore better predictive performance, than using the classification method alone or combining it with ADASYN. For the multiclass dataset, SMOTE + KNN achieved the highest scores, with an MCC of 0.64 and a G-mean of 0.74. This indicates that handling the imbalanced class distribution in the data preprocessing stage influences the MCC and G-mean values obtained with the SMOTE + KNN method.
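To make the terms in the abstract concrete, the following is a minimal Python sketch of the interpolation idea behind SMOTE and of the two reported metrics, MCC and G-mean, for multiclass labels. It is not the authors' implementation: the function names (`smote_like`, `g_mean`, `multiclass_mcc`), the neighbour count `k`, and the toy data are all assumptions chosen for illustration, and the paper's actual experimental setup is not described in the abstract.

```python
# Illustrative only: names, parameters, and toy data are assumptions,
# not taken from the paper.
import math
import random
from collections import Counter


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each placed on the segment between a
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recall values."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / support[c] for c in support]
    return math.prod(recalls) ** (1 / len(recalls))


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient (R_K formulation)."""
    s = len(y_true)                                  # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correct predictions
    t_k = Counter(y_true)                            # true count per class
    p_k = Counter(y_pred)                            # predicted count per class
    cov = c * s - sum(t_k[k] * p_k[k] for k in set(t_k) | set(p_k))
    denom = math.sqrt(
        (s * s - sum(v * v for v in p_k.values()))
        * (s * s - sum(v * v for v in t_k.values()))
    )
    return cov / denom if denom else 0.0


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(smote_like(minority, n_new=4))

    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 0, 1, 2, 2, 2]
    print(multiclass_mcc(y_true, y_pred), g_mean(y_true, y_pred))
```

Both metrics account for every class rather than overall accuracy alone, which is presumably why the study reports them for imbalanced data: a classifier that ignores a minority class gets a G-mean of 0 no matter how well it does on the majority class.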


Source journal

Jurnal Informatika Polinema

Self-citation rate

0.00%

Number of articles published