Water potability classification based on hybrid stacked model and feature selection
Ahmed M Elshewey, Rasha Y Youssef, Hazem M El-Bakry, Ahmed M Osman
Environmental Science and Pollution Research, 2025-03-06. DOI: 10.1007/s11356-025-36120-0
Citations: 0
Abstract
Accurate water quality classification is essential for ensuring clean water. This paper uses a water potability (WP) dataset of 3276 water bodies described by pH, hardness, solids, chloramines, sulfate, conductivity, and other metrics. The dataset, publicly available on Kaggle, was prepared by median imputation of missing values, normalization for feature scaling, and SMOTE to correct class imbalance. Feature selection (FS) with binary particle swarm optimization (BPSO) and the binary whale optimization algorithm (BWAO) was used to identify the features most important for classification; BPSO selected a subset of seven essential features with the lowest average error, 0.3745. Random forest (RF), gradient boosting (GB), support vector machine (SVM), extra trees (ET), decision tree (DT), and XGBoost classifiers were evaluated for WP prediction. The ET classifier ranked first, with 70.63% accuracy and a 71.17% F1-score. To improve predictive performance, RF, ET, and XGBoost base learners were stacked with a logistic regression meta-learner; the stacking model achieved 69.53% accuracy, a 70.23% F1-score, and a 77.62% AUC. We found that stacking combines high-performing models into a robust and balanced classification framework. This paper shows that ensemble learning can improve WP classification and that stacking may be a feasible approach to measuring and managing water quality.
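The abstract does not spell out how BPSO searches the feature space, so here is a minimal sketch of the standard binary PSO scheme (sigmoid transfer function, velocity clamping, personal and global bests) applied to feature-subset selection. It is an illustration under stated assumptions, not the authors' implementation: the hyperparameters (`w`, `c1`, `c2`, `v_max`, swarm size, iteration count) are hypothetical defaults, and the `fitness` callable stands in for whatever classifier error the paper minimized (the abstract reports only the resulting average error of 0.3745). `TARGET` and `toy_error` are hypothetical placeholders, not the real dataset.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def sigmoid(v: float) -> float:
    """S-shaped transfer function: maps a velocity to the probability a bit is 1."""
    return 1.0 / (1.0 + math.exp(-v))


def bpso_feature_selection(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    n_particles: int = 20,   # hypothetical default
    n_iters: int = 50,       # hypothetical default
    w: float = 0.7,          # inertia weight (hypothetical)
    c1: float = 1.5,         # cognitive coefficient (hypothetical)
    c2: float = 1.5,         # social coefficient (hypothetical)
    v_max: float = 4.0,      # velocity clamp (hypothetical)
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Minimise `fitness` over binary feature masks (1 = keep feature).

    Returns the best mask found and its fitness value.
    """
    rng = random.Random(seed)

    def random_mask() -> List[int]:
        mask = [rng.randint(0, 1) for _ in range(n_features)]
        if not any(mask):  # never evaluate an empty feature subset
            mask[rng.randrange(n_features)] = 1
        return mask

    positions = [random_mask() for _ in range(n_particles)]
    velocities = [[rng.uniform(-v_max, v_max) for _ in range(n_features)]
                  for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_fit = [fitness(p) for p in positions]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            x, v = positions[i], velocities[i]  # updated in place
            for j in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v[j] = (w * v[j]
                        + c1 * r1 * (pbest[i][j] - x[j])
                        + c2 * r2 * (gbest[j] - x[j]))
                v[j] = max(-v_max, min(v_max, v[j]))  # velocity clamping
                # Bit j is set with probability sigmoid(v[j]).
                x[j] = 1 if rng.random() < sigmoid(v[j]) else 0
            if not any(x):  # repair an empty subset
                x[rng.randrange(n_features)] = 1
            f = fitness(x)
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[:], f
                if f < gbest_fit:  # global best updated asynchronously
                    gbest, gbest_fit = x[:], f
    return gbest, gbest_fit


# Hypothetical fitness for illustration only: pretend features 0, 2 and 5
# are the informative ones and score a mask by its fraction of wrong bits.
TARGET = [1, 0, 1, 0, 0, 1, 0, 0, 0]


def toy_error(mask: Sequence[int]) -> float:
    return sum(m != t for m, t in zip(mask, TARGET)) / len(TARGET)


mask, err = bpso_feature_selection(len(TARGET), toy_error)
print("selected mask:", mask, "error:", round(err, 4))
```

In a real pipeline the fitness would train a classifier on the masked columns and return its cross-validated error, often plus a small penalty on subset size; that choice is an assumption here, since the abstract does not state the paper's fitness function.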
About the journal:
Environmental Science and Pollution Research (ESPR) serves the international community in all areas of Environmental Science and related subjects with emphasis on chemical compounds. This includes:
- Terrestrial Biology and Ecology
- Aquatic Biology and Ecology
- Atmospheric Chemistry
- Environmental Microbiology/Biobased Energy Sources
- Phytoremediation and Ecosystem Restoration
- Environmental Analyses and Monitoring
- Assessment of Risks and Interactions of Pollutants in the Environment
- Conservation Biology and Sustainable Agriculture
- Impact of Chemicals/Pollutants on Human and Animal Health
It reports from a broad interdisciplinary perspective.