Author: Generosa Lukhayu Pritalia
Journal: KONSTELASI: Konvergensi Teknologi dan Sistem Informasi
Published: 2022-04-21
DOI: 10.24002/konstelasi.v2i1.5630 (https://doi.org/10.24002/konstelasi.v2i1.5630)
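Comparative Analysis of Machine Learning Algorithms and Imbalanced Data Handling in Potable Water Quality Classification (Analisis Komparatif Algoritme Machine Learning dan Penanganan Imbalanced Data pada Klasifikasi Kualitas Air Layak Minum)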
Abstract. Water is essential for survival. Water quality must now be monitored, assessed, and classified to understand the impact of industrialization. Water quality classification has been carried out with traditional methods such as the Water Quality Index (WQI) and Storet, and with machine learning methods. In machine learning, imbalanced data can bias a classifier toward predicting the majority class. In addition, using all features in the classification process can degrade classification performance and increase computation time. To address these problems, this study resamples the data to balance the classes, selects the most suitable and contributing features, and compares the performance of machine learning algorithms in classifying potable water. Handling the imbalanced data and applying feature selection improved algorithm performance; in particular, accuracy increased by 24.8% over a previous study. The best overall performance was obtained by Random Forest using the seven best features, with 87% precision, 84% recall, a 16% miss rate, an 85% F-measure, and 85% test accuracy. However, the lowest miss rate, 15%, was obtained by the Decision Tree algorithm.
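To make the resampling step and the reported metrics concrete, the following is a minimal sketch in plain Python. It is not the study's code: the abstract does not say which resampling technique was used, so random oversampling of the minority class is assumed here purely for illustration, and the function names `random_oversample` and `binary_metrics` are hypothetical. The metric definitions are the standard ones, where miss rate (false negative rate) equals 1 minus recall, which matches the 84% recall and 16% miss rate reported for Random Forest.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Balance classes by duplicating randomly chosen minority-class rows.

    Assumption: random oversampling stands in for whatever resampling
    technique the study actually used (the abstract does not name it).
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, miss rate, F-measure, and accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Miss rate is the share of actual positives that were missed.
    miss_rate = fn / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "miss_rate": miss_rate,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Toy data only: three non-potable samples (0) and one potable sample (1).
    rows, labels = random_oversample([[1.0], [2.0], [3.0], [4.0]], [0, 0, 0, 1])
    print(Counter(labels))
    print(binary_metrics([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 1, 0]))
```

In practice, resampling should be applied only to the training split, so that the test accuracy is measured on data with the original class distribution.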