Yongchun Liang, Fangyu Ding, Lei Liu, Fang Yin, Mengmeng Hao, Tingting Kang, Chuanpeng Zhao, Ziteng Wang, Dong Jiang
Journal of Hydrology, Volume 648, Article 132394. Published 2024-11-23.
DOI: 10.1016/j.jhydrol.2024.132394
URL: https://www.sciencedirect.com/science/article/pii/S0022169424017906
Impact factor: 5.9. JCR: Q1 (Engineering, Civil). Open access: no.
Citations: 0
Abstract
Monitoring water quality parameters in urban rivers using multi-source data and machine learning approach
The systematic surveillance of nutrients and organic pollution in urban rivers is crucial for enhancing ecological integrity and promoting societal and economic sustainability. Currently, the primary methods of water quality monitoring involve on-site sampling and laboratory analysis, which are constrained by various factors such as terrain and climate. Remote sensing water quality monitoring, which enables large-scale, periodic, and comprehensive coverage, serves as an important supplement to these traditional methods. However, most current research on water quality monitoring predominantly relies on remote sensing technology, often overlooking the application of other multi-source data. In this study, we examined rivers in the Weihe River Basin by integrating field samples, Sentinel-2 multispectral imagery, meteorological elements, and land use types to construct machine learning (ML) models for predicting four water quality parameters (WQPs): ammonia nitrogen (NH₃-N), total phosphorus (TP), chemical oxygen demand (COD), and dissolved oxygen (DO). The results showed that land use types significantly influenced the accuracy of predictions for NH₃-N, TP, COD, and DO. Among the models evaluated, the Extra Tree Regression (ETR), eXtreme Gradient Boosting (XGBoost), and Gradient Boosting Regression (GBR) demonstrated the highest accuracy and transferability for monitoring WQPs in rivers. For instance, the models achieved the following coefficients of determination (R²) in 5-fold cross-validation: for NH₃-N, R² was 0.65 in both the testing and validation datasets; for TP, R² was 0.71 and 0.68; for COD, R² was 0.50 and 0.47; and for DO, R² was 0.68 and 0.64, respectively. Therefore, our findings underscore the feasibility of using multi-source data and ML methods to quantify water pollutants in urban rivers, providing essential technical support for monitoring the spatiotemporal dynamics of river water quality across extensive geographical areas.
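The abstract reports coefficients of determination from 5-fold cross-validation. As a rough illustration of what that evaluation involves, the sketch below computes per-fold R² using only the Python standard library. It is not the authors' pipeline: the data are synthetic, the single-feature least-squares line is a stand-in for regressors such as ETR, XGBoost, or GBR, and every function name (`r2_score`, `k_fold_indices`, `fit_line`, `cross_val_r2`) is hypothetical and chosen for this example.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (placeholder for a real regressor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, y_bar - a * x_bar


def cross_val_r2(xs, ys, k=5, seed=0):
    """Train on k-1 folds, score R^2 on the held-out fold, repeat k times."""
    scores = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a * xs[i] + b for i in fold]
        scores.append(r2_score([ys[i] for i in fold], preds))
    return scores


# Synthetic data (an assumption for illustration, not measurements from the study):
# one predictor, e.g. a band reflectance, and a noisy linear response.
rng = random.Random(42)
xs = [rng.uniform(0.0, 1.0) for _ in range(100)]
ys = [2.0 * x + 0.5 + rng.gauss(0.0, 0.2) for x in xs]

scores = cross_val_r2(xs, ys, k=5)
print("per-fold R^2:", [round(s, 2) for s in scores])
print("mean R^2:", round(mean(scores), 2))
```

Averaging the per-fold scores gives a single summary value of the kind quoted in the abstract. In practice a library routine such as scikit-learn's cross-validation utilities would likely replace the hand-written splitting and scoring shown here.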
About the journal:
The Journal of Hydrology publishes original research papers and comprehensive reviews in all the subfields of the hydrological sciences, including water-based management and policy issues that impact on economics and society. These comprise, but are not limited to, the physical, chemical, biogeochemical, stochastic and systems aspects of surface and groundwater hydrology, hydrometeorology and hydrogeology. Relevant topics incorporating the insights and methodologies of disciplines such as climatology, water resource systems, hydraulics, agrohydrology, geomorphology, soil science, instrumentation and remote sensing, civil and environmental engineering are included. Social science perspectives on hydrological problems such as resource and ecological economics, environmental sociology, psychology and behavioural science, management and policy analysis are also invited. Multi- and interdisciplinary analyses of hydrological problems are within scope. The science published in the Journal of Hydrology is relevant to catchment scales rather than exclusively to a local scale or site.