Application of machine learning approaches in predicting estuarine dissolved oxygen (DO) under a limited data environment

IF 2.4 | Zone 4, Environmental Science and Ecology | JCR Q2, WATER RESOURCES | Water Quality Research Journal | Pub Date: 2022-08-11 | DOI: 10.2166/wqrj.2022.002

M. A. Z. Siddik

Citations: 2

Abstract

The application of machine learning (ML) approaches to predicting estuarine dissolved oxygen (DO) from a set of environmental covariates that includes nutrients remains unexplored, owing to the unavailability of nutrient data. Using data from 12 water quality stations in coastal southwest Florida, this study examined four ML models, namely support vector machine (SVM), random forest (RF), decision tree, and Wang–Mendel, for predicting DO in a limited nutrient data environment. Monthly water temperature, pH, salinity, total nitrogen (TN), and total phosphorus (TP) data were used for model development. A multiple linear regression model was trained as a benchmark against which to compare the performance of the ML models. The site-specific RF and SVM models showed superior model efficiency (Nash–Sutcliffe Efficiency > 0.80) when all predictor variables were used for model development, whereas models trained without nutrients showed reduced prediction accuracy. Models built by pooling data from all sites under TN-limited, TP-limited, and TN- and TP-co-limited regimes favored RF. Overall, the study drew two conclusions that could complement existing approaches to estimating total daily loads for environmental management: (1) nutrients are a necessary predictor of estuarine DO dynamics, and (2) RF performs better than the other ML methods under a limited data environment.
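The abstract reports model skill as Nash–Sutcliffe Efficiency (NSE) without stating the formula. The sketch below uses the standard definition, NSE = 1 − Σ(obs − sim)² / Σ(obs − mean(obs))², which may differ in detail from how the study computed it. The function name `nash_sutcliffe_efficiency` and the sample DO values are hypothetical and chosen only for illustration; they are not data from the paper.

```python
from typing import Sequence


def nash_sutcliffe_efficiency(observed: Sequence[float],
                              simulated: Sequence[float]) -> float:
    """Return the Nash-Sutcliffe Efficiency of simulated vs. observed values.

    1.0 means a perfect match; 0.0 means the model is no better than
    always predicting the observed mean; negative values are worse than that.
    """
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one observation is required")

    mean_obs = sum(observed) / len(observed)
    # Sum of squared model errors.
    residual_ss = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their own mean.
    total_ss = sum((o - mean_obs) ** 2 for o in observed)
    if total_ss == 0:
        raise ValueError("NSE is undefined when all observations are equal")

    return 1.0 - residual_ss / total_ss


if __name__ == "__main__":
    # Hypothetical monthly DO values in mg/L (illustrative only).
    do_observed = [6.0, 7.0, 8.0, 9.0]
    do_predicted = [6.5, 7.0, 7.5, 9.0]
    print(round(nash_sutcliffe_efficiency(do_observed, do_predicted), 3))  # 0.9
```

Under this definition, the abstract's threshold of NSE > 0.80 means the model's squared error is less than 20% of the variance of the observations around their mean.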

Source journal

Water Quality Research Journal (WATER RESOURCES)

CiteScore

4.50

Self-citation rate

8.70%

Articles published