Fei Ding , Shilong Hao , Wenjie Zhang , Mingcen Jiang , Liangyao Chen , Haobin Yuan , Nan Wang , Wenpan Li , Xin Xie
Ecological Indicators, Volume 172, Article 113299. Published 2025-03-01 (Epub 2025-03-04). DOI: 10.1016/j.ecolind.2025.113299
https://www.sciencedirect.com/science/article/pii/S1470160X25002304
Using multiple machine learning algorithms to optimize the water quality index model and their applicability
Both the choice of assessment model and spatiotemporal heterogeneity contribute to uncertainty in water quality assessment. To improve the accuracy of the water quality index (WQI) model, multiple machine learning algorithms (CatBoost, SVM, LR, XGBoost, LightGBM) and the entropy weight method (EWM) were introduced to determine objective weights. Six combined weights were determined using game theory to combine each objective weight with the subjective weight (AHP). Three aggregation functions were established: a new function based on the sigmoid function and two existing functions. From the six combined weights and three aggregation functions, eighteen WQI models were developed. To reduce the influence of spatiotemporal heterogeneity, separate assessment models were proposed for different water quality characteristics. The improved models were validated with monthly water quality monitoring data from 16 sampling sites in Chaohu Lake during 2016–2020. Ten water quality indicators were selected, including TN and TP. The results showed high accuracy and reliability of the improved WQI assessment models. Models improved with CatBoost and EWM had lower uncertainty (0.559–0.903) than those improved with SVM and LR (0.576–1.034). The sensitivity of the models improved by the six combined weights was ranked as W_AE > W_AC > W_AS > W_AX > W_ALGB > W_AL. For the three aggregation functions, uncertainty was ranked as SGM > SWM > WQM and sensitivity as WQM > SWM > SGM. Compared with WQM and SWM, SGM reflected the spatiotemporal heterogeneity of water quality more accurately. The WQM_AE, SGM_AC and SWM_AC models were recommended for assessing water bodies with good quality, poor quality and heterogeneous quality, respectively. Chaohu Lake was mainly Class II and Class III water. Water quality was better in the east than in the west, and better in summer and autumn than in spring and winter. This study can provide theoretical support for related water quality assessment work.
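The abstract does not give the formulas behind its weighting steps, so the following is only a minimal sketch of the textbook versions of two of them: the entropy weight method and a game-theory combination of weight vectors. The function names, the assumption that indicator scores are already non-negative and direction-normalized, and the least-squares solver are all illustrative choices, not the authors' implementation.

```python
import numpy as np


def entropy_weights(X):
    """Textbook entropy weight method (EWM).

    X: (n_samples, n_indicators) array of non-negative scores whose
    column sums are positive (assumed already direction-normalized).
    Indicators with more dispersion across samples get larger weights.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    P = X / X.sum(axis=0)  # each sample's share within an indicator
    # Convention: 0 * log(0) is treated as 0.
    logP = np.log(P, where=P > 0, out=np.zeros_like(P))
    entropy = -(P * logP).sum(axis=0) / np.log(n)
    divergence = 1.0 - entropy
    return divergence / divergence.sum()


def game_theory_combine(weight_sets):
    """Combine k weight vectors with the usual game-theory scheme.

    Finds coefficients alpha that minimize the deviation between the
    linear combination sum_k alpha_k * w_k and each w_j, which reduces
    to the linear system (W W^T) alpha = diag(W W^T). The coefficients
    are then normalized to sum to 1.
    """
    W = np.asarray(weight_sets, dtype=float)  # shape (k, n_indicators)
    G = W @ W.T  # Gram matrix of pairwise dot products
    alpha, *_ = np.linalg.lstsq(G, np.diag(G), rcond=None)
    alpha = np.abs(alpha) / np.abs(alpha).sum()
    return alpha @ W


# Hypothetical example: one subjective (AHP-style) and one objective vector.
subjective = [0.6, 0.4]
objective = [0.2, 0.8]
combined = game_theory_combine([subjective, objective])
```

In this sketch, each of the six combined weights would come from pairing the AHP weight vector with one objective weight vector (EWM or one of the machine-learning-derived vectors) and calling `game_theory_combine` on the pair.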
About the journal:
The ultimate aim of Ecological Indicators is to integrate the monitoring and assessment of ecological and environmental indicators with management practices. The journal provides a forum for the discussion of the applied scientific development and review of traditional indicator approaches as well as for theoretical, modelling and quantitative applications such as index development. Research into the following areas will be published.
• All aspects of ecological and environmental indicators and indices.
• New indicators, and new approaches and methods for indicator development, testing and use.
• Development and modelling of indices, e.g. application of indicator suites across multiple scales and resources.
• Analysis and research of resource, system- and scale-specific indicators.
• Methods for integration of social and other valuation metrics for the production of scientifically rigorous and politically-relevant assessments using indicator-based monitoring and assessment programs.
• How research indicators can be transformed into direct application for management purposes.
• Broader assessment objectives and methods, e.g. biodiversity, biological integrity, and sustainability, through the use of indicators.
• Resource-specific indicators such as landscape, agroecosystems, forests, wetlands, etc.