Ajay Kumar Taloor , Shiwalika Sambyal , Ravi Sharma , Surya Dev , Sourabh Shastri , Rakesh Kumar
Title: Advanced hydrogeochemical facies classification: A comparative analysis of Machine Learning models with SMOTE in the Tawi basin
Journal: Physics and Chemistry of the Earth, Volume 137, Article 103785
Published: 2024-10-21
DOI: 10.1016/j.pce.2024.103785
URL: https://www.sciencedirect.com/science/article/pii/S1474706524002432
Journal metrics: Impact factor 3.0; JCR Q2 (Geosciences, Multidisciplinary)
Abstract
Water is an important natural resource, and clean water is vital for maintaining the health and hygiene of all living organisms. Estimating and classifying water quality facies is a critical step in analysing water quality and supporting proper water management. The present study underlines the applicability of Machine Learning (ML) models to assess water quality by classifying hydrogeochemical facies within the Tawi basin of the Jammu region. This study employs a range of ML algorithms, including Decision Tree (DT), XGBoost, Random Forest (RF), K-Nearest Neighbors (KNN), and Artificial Neural Network (ANN), to evaluate their effectiveness in accurately classifying hydrogeochemical facies derived from Piper's diagram. The dataset, consisting of chemical parameters extracted from water samples collected from the Tawi basin, was initially imbalanced, with a large majority of samples belonging to a single facies. To address this, we applied the Synthetic Minority Over-sampling Technique (SMOTE), ensuring balanced class distributions for more reliable model training and evaluation. The classification results demonstrate high accuracy across the models, with DT achieving 93%, RF 99%, XGBoost 96%, KNN 81%, and ANN 96%. In addition to overall accuracy, we employed other evaluation metrics such as precision, recall, F1-score, and the precision-recall curve to provide a more comprehensive assessment of model performance. The results underscore the potential of ML in automating water quality assessment based on hydrogeochemical parameters. The findings of the study provide a robust framework for using ML models in determining water quality, particularly in regions where data are scarce and conventional analysis is limited.
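The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: SMOTE-style oversampling of minority classes, and per-class precision, recall and F1-score. It uses the Python standard library only; the function names (`smote_oversample`, `per_class_scores`), the toy feature values and the facies labels "A" and "B" are hypothetical and are not taken from the study. Established libraries such as imbalanced-learn provide a full SMOTE implementation and would normally be preferred.

```python
import math
import random
from collections import Counter


def smote_oversample(X, y, k=3, seed=0):
    """Grow each minority class to the majority size by interpolating
    between a sample and one of its k nearest same-class neighbours."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)
    for label, n in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        if len(members) < 2:
            continue  # interpolation needs at least two samples of the class
        for _ in range(target - n):
            base = rng.choice(members)
            others = [m for m in members if m is not base]
            neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
            mate = rng.choice(neighbours)
            gap = rng.random()  # position on the segment from base to mate
            X_out.append([b + gap * (m - b) for b, m in zip(base, mate)])
            y_out.append(label)
    return X_out, y_out


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical toy data: two chemical parameters, two facies labels.
X = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2], [5.0, 6.0], [5.2, 6.1]]
y = ["A", "A", "A", "A", "B", "B"]
X_bal, y_bal = smote_oversample(X, y, k=1)
print(Counter(y_bal))

# Hypothetical predictions, used only to exercise the metric function.
print(per_class_scores(["A", "A", "B", "B"], ["A", "B", "B", "B"]))
```

In a sketch like this, the synthetic points always lie on a line segment between two real minority samples, which is why the balanced set adds no values outside the observed range of that class. A common precaution, not described in the abstract, is to oversample only the training split so that synthetic points do not leak into the evaluation data.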
About the journal:
Physics and Chemistry of the Earth is an international interdisciplinary journal for the rapid publication of collections of refereed communications in separate thematic issues, either stemming from scientific meetings or specially compiled for the occasion. There is no restriction on the length of articles published in the journal. Physics and Chemistry of the Earth incorporates the separate Parts A, B and C which existed until the end of 2001.
Please note: the Editors are unable to consider submissions that are not invited or linked to a thematic issue. Please do not submit unsolicited papers.
The journal covers the following subject areas:
-Solid Earth and Geodesy:
(geology, geochemistry, tectonophysics, seismology, volcanology, palaeomagnetism and rock magnetism, electromagnetism and potential fields, marine and environmental geosciences as well as geodesy).
-Hydrology, Oceans and Atmosphere:
(hydrology and water resources research, engineering and management, oceanography and oceanic chemistry, shelf, sea, lake and river sciences, meteorology and atmospheric sciences incl. chemistry as well as climatology and glaciology).
-Solar-Terrestrial and Planetary Science:
(solar, heliospheric and solar-planetary sciences, geology, geophysics and atmospheric sciences of planets, satellites and small bodies as well as cosmochemistry and exobiology).