Comparison of the SMOTE and ADASYN Methods for Handling Imbalanced Multiclass Data

Yulian Pamuji, Sephia Dwi Arma Putri
Journal: Jurnal Informatika Polinema
Published: 2023-05-17
DOI: 10.33795/jip.v9i3.1330 (https://doi.org/10.33795/jip.v9i3.1330)
Citations: 1

Abstract

Original title: KOMPARASI METODE SMOTE DAN ADASYN UNTUK PENANGANAN DATA TIDAK SEIMBANG MULTICLASS

Data mining combines several fields, including database systems, statistics, machine learning, and visualization, to analyze large datasets and extract useful characteristics of the data. This study addresses class imbalance, where samples are unevenly distributed across classes, by comparing two oversampling methods, SMOTE and ADASYN, that balance the number of samples between majority (negative) and minority (positive) classes. In the experiments, combining SMOTE with a classification method yielded higher MCC and G-mean values, and therefore better predictive performance, than using the classification method alone or combining it with ADASYN. On the multiclass dataset, SMOTE + KNN achieved the highest scores, with an MCC of 0.64 and a G-mean of 0.74. This indicates that handling imbalanced class distribution at the data preprocessing stage affects the MCC and G-mean values obtained with SMOTE + KNN.
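The abstract reports MCC and G-mean without stating how they were computed for more than two classes. The sketch below shows one common reading, under stated assumptions: MCC in its multiclass generalization (computed from the confusion counts) and G-mean as the geometric mean of per-class recall. The function names `multiclass_mcc` and `g_mean` and the toy labels are illustrative only and are not taken from the paper.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient (assumed definition)."""
    s = len(y_true)                                       # total samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correct predictions
    t_k = Counter(y_true)                                 # true count per class
    p_k = Counter(y_pred)                                 # predicted count per class
    labels = set(t_k) | set(p_k)
    numerator = c * s - sum(t_k[k] * p_k[k] for k in labels)
    denominator = math.sqrt(
        (s * s - sum(p_k[k] ** 2 for k in labels))
        * (s * s - sum(t_k[k] ** 2 for k in labels))
    )
    # Degenerate case (e.g. a single class predicted everywhere): report 0.
    return numerator / denominator if denominator else 0.0


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition)."""
    t_k = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[k] / t_k[k] for k in t_k]
    return math.prod(recalls) ** (1.0 / len(recalls))


# Toy imbalanced 3-class example (hypothetical data, not from the study).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 2]

print("MCC:   ", round(multiclass_mcc(y_true, y_pred), 3))
print("G-mean:", round(g_mean(y_true, y_pred), 3))
```

Both metrics penalize a classifier that ignores minority classes: G-mean drops to zero if any class has zero recall, which plain accuracy would not reveal on imbalanced data.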