Nitya Nand Jha, R. Singh, Sushila Sharma, Abhishek Kumar
Title: Computational Machine Learning Analytics for Prediction of Water Quality
Journal: Communications on Applied Nonlinear Analysis (JCR Q4, Mathematics)
Published: 2024-07-05
DOI: https://doi.org/10.52783/cana.v31.942
Citations: 0
Abstract
Water quality is paramount for ecosystems, industry, people, and flora and fauna, yet contamination and pollution have degraded it in recent decades. This article addresses the prediction of water quality classification (WQC) and the Water Quality Index (WQI), an important measure of water validity. It uses machine learning to forecast both, with grid search employed to tune the parameters of four classification models and four regression models. The classification models used to forecast WQC are Random Forest (RF), Extreme Gradient Boosting (XGBoost), Gradient Boosting (GB), and Adaptive Boosting (AdaBoost). The regression models used to predict WQI are K-nearest neighbours (KNN), decision tree (DT), support vector regression (SVR), and multi-layer perceptron (MLP). Data normalization and mean imputation were applied as preprocessing steps to prepare the data for further processing. The dataset used in this research comprises seven features and ninety-one instances. Five measures were computed to evaluate the classifiers: accuracy, recall, precision, Matthews correlation coefficient (MCC), and F1 score. Four measures were computed to evaluate the regressors: mean absolute error (MAE), median absolute error (MedAE), mean squared error (MSE), and the coefficient of determination (R²). In testing, the GB model gave the most accurate WQC predictions (99.50%), making it the best-performing classifier, and the MLP regressor reached an R² of 99.8% when predicting WQI, making it the best-performing regressor.
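The regression side of the pipeline described in the abstract (mean imputation, normalization, grid search over model parameters, and scoring with MAE, MedAE, MSE, and R²) can be illustrated with a minimal sketch. This is not the authors' code. It assumes min-max scaling as the normalization, a plain KNN regressor, leave-one-out validation as the grid-search criterion, and toy data, none of which are specified in the abstract. All function names are hypothetical, and only the Python standard library is used.

```python
import math
from itertools import product
from statistics import mean, median


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    fill = mean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def min_max_normalize(column):
    """Rescale a column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training rows nearest to the query row."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    return mean(train_y[i] for i in order[:k])


def regression_metrics(y_true, y_pred):
    """Return MAE, MedAE, MSE, and R^2 for paired true/predicted values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    ss_res = sum(e * e for e in errors)
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return {
        "MAE": mean(abs_errors),
        "MedAE": median(abs_errors),
        "MSE": ss_res / len(errors),
        "R2": 1.0 - ss_res / ss_tot,
    }


def loo_predictions(x, y, k):
    """Leave-one-out: predict each row from all the other rows."""
    preds = []
    for i in range(len(x)):
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        preds.append(knn_predict(rest_x, rest_y, x[i], k))
    return preds


def grid_search(x, y, param_grid):
    """Try every parameter combination; keep the one with the lowest LOO MSE."""
    names = list(param_grid)
    best_params, best_mse = None, math.inf
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        mse = regression_metrics(y, loo_predictions(x, y, **params))["MSE"]
        if mse < best_mse:
            best_params, best_mse = params, mse
    return best_params, best_mse


# Toy data for illustration only (hypothetical, not the paper's dataset).
x = [(float(v),) for v in range(6)]
y = [float(v) for v in range(6)]
best_params, best_mse = grid_search(x, y, {"k": [1, 2, 3]})
print(best_params, best_mse)
```

Exhaustive grid search scales with the product of the grid sizes, so it is practical mainly for small grids like the one here. The paper's classification metrics (accuracy, recall, precision, MCC, F1) are outside this sketch.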