Application of machine learning approaches in predicting estuarine dissolved oxygen (DO) under a limited data environment
M. A. Z. Siddik. Water Quality Research Journal, published 2022-08-11. https://doi.org/10.2166/wqrj.2022.002
The application of machine learning (ML) approaches to predict estuarine dissolved oxygen (DO) from a set of environmental covariates that includes nutrients remains unexplored because nutrient data are often unavailable. Using data from 12 water quality stations in southwest coastal Florida, this study examined the applicability of four ML models (support vector machine (SVM), random forest (RF), decision tree, and Wang–Mendel) for predicting DO under a limited nutrient data environment. Monthly water temperature, pH, salinity, total nitrogen (TN), and total phosphorus (TP) data were used for model development. A multiple linear regression model was trained as a benchmark against which the ML models' performance was compared. The site-specific RF and SVM models showed superior model efficiency (Nash–Sutcliffe Efficiency > 0.80) when all predictor variables were used for model development, whereas models trained without nutrients showed reduced prediction accuracy. When data from all sites were pooled and modeled under TN-limited, TP-limited, and TN- and TP-co-limited regimes, RF gave the preferable performance. Overall, the study yielded two conclusions that could complement existing approaches to estimating total daily loads for environmental management: (1) nutrients are a necessary predictor of estuarine DO dynamics, and (2) RF performs best among the ML methods tested under a limited data environment.
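
The abstract reports model efficiency as Nash–Sutcliffe Efficiency (NSE), so a small sketch of how that score is commonly computed may help. It uses the standard definition, one minus the ratio of the squared prediction error to the squared deviation of the observations from their mean. The function name and the DO values below are hypothetical and invented for illustration; they are not the study's code or data.

```python
from statistics import fmean
from typing import Sequence


def nash_sutcliffe_efficiency(observed: Sequence[float],
                              simulated: Sequence[float]) -> float:
    """Return NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    if len(observed) < 2:
        raise ValueError("need at least two observations")
    mean_obs = fmean(observed)
    # Squared error of the model predictions.
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Squared error of a baseline that always predicts the observed mean.
    baseline = sum((o - mean_obs) ** 2 for o in observed)
    if baseline == 0:
        raise ValueError("NSE is undefined when all observations are identical")
    return 1.0 - sse / baseline


# Hypothetical monthly DO values in mg/L (illustrative only, not study data).
observed_do = [6.0, 5.0, 4.0, 5.0, 7.0, 6.0]
simulated_do = [5.5, 5.0, 4.5, 5.0, 6.5, 6.5]

nse = nash_sutcliffe_efficiency(observed_do, simulated_do)
print(f"NSE = {nse:.3f}")  # prints "NSE = 0.818"
```

An NSE of 1 indicates a perfect match, while 0 means the model does no better than predicting the observed mean. The 0.80 threshold quoted in the abstract therefore corresponds to a model that removes at least 80% of the squared error of that mean baseline.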