Analisis Komparatif Algoritme Machine Learning dan Penanganan Imbalanced Data pada Klasifikasi Kualitas Air Layak Minum

Generosa Lukhayu Pritalia
Journal: KONSTELASI: Konvergensi Teknologi dan Sistem Informasi
Published: 2022-04-21
DOI: 10.24002/konstelasi.v2i1.5630 (https://doi.org/10.24002/konstelasi.v2i1.5630)
Citations: 1

Abstract

Water is essential for survival. Water quality must now be monitored, assessed, and classified in order to understand the impact of industrialization. Water quality classification has been carried out with traditional methods such as WQI and Storet, and with machine learning methods. When the training data are imbalanced, a machine learning classifier tends to predict the majority class and becomes biased. In addition, using all available features in the classification process can degrade classification performance and increase computation time. To address these problems, this study resamples the data so that the classes are balanced, determines the most suitable and contributing features, and compares the performance of machine learning algorithms in classifying potable water. Handling the imbalanced data and applying feature selection improved the algorithms' performance; notably, accuracy increased by 24.8% compared with a previous study. The best overall performance was obtained from Random Forest using the seven best features, with 87% precision, 84% recall, a 16% miss rate, an 85% F-measure, and 85% test accuracy. However, the smallest miss rate, another important aspect, was 15% and was obtained from the Decision Tree algorithm.
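The metrics reported in the abstract are all derived from a binary confusion matrix, and the miss rate is the complement of recall (which is why 84% recall corresponds to a 16% miss rate). The sketch below shows one way to compute them. The function name `classification_metrics` and the counts passed to it are made up for illustration and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp, fp, fn, tn: true positives, false positives, false negatives, true negatives.
    Assumes tp + fp > 0 and tp + fn > 0 so that no division by zero occurs.
    """
    precision = tp / (tp + fp)            # share of predicted positives that are correct
    recall = tp / (tp + fn)               # share of actual positives that are found
    miss_rate = fn / (tp + fn)            # false negative rate, equal to 1 - recall
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean (F1)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "miss_rate": miss_rate,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts chosen only for illustration (not the paper's data).
metrics = classification_metrics(tp=84, fp=13, fn=16, tn=87)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a potability setting, a low miss rate means few samples of the positive class are overlooked, which is why the abstract reports it separately from accuracy.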