Application of machine learning and deep neural networks for spatial prediction of groundwater nitrate concentration to improve land use management practices
D. Karimanzira, J. Weis, Andreas Wunsch, Linda Ritzau, T. Liesch, M. Ohmer
Frontiers in Water, published 2023-07-13. DOI: https://doi.org/10.3389/frwa.2023.1193142
Abstract
Predicting how groundwater nitrate concentration responds to geo-environmental and human-influenced factors is essential for restoring groundwater quality and improving land use management practices. In this paper, we regionalize groundwater nitrate concentration using several machine learning methods (random forest (RF), unimodal 2D and 3D convolutional neural networks (CNNs), and multi-stream early- and late-fusion 2D-CNNs) so that the nitrate situation in unobserved areas can be predicted. CNNs take into account not only the nitrate values of the grid cells of the observation wells but also the values around them, which has the added benefit of allowing them to learn directly about the influence of the surroundings. The predictive performance of the models was tested on a dataset from a pilot region in Germany. The results show that, in general, all the machine learning models, after a Bayesian optimization hyperparameter search and training, achieve good spatial predictive performance compared to previous studies based on Kriging and numerical models. Based on the mean absolute error (MAE), the random forest model and the 2D-CNN late-fusion model performed best, with an MAE (STD) of 9.55 (0.367) mg/l, R² = 0.43, and 10.32 (0.27) mg/l, R² = 0.27, respectively. The 3D-CNN, with an MAE (STD) of 11.66 (0.21) mg/l and the largest resource consumption, was the worst-performing model. Feature importances learned from the models were used in conjunction with partial dependence analysis of the most important features to gain greater insight into the major factors explaining the spatial variability of nitrate. Previous studies have shown large uncertainties in nitrate prediction. Therefore, the models were extended to quantify uncertainty using prediction intervals (PIs) derived from bootstrapping. Knowledge of this uncertainty helps water managers reduce risk and plan more reliably.
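The evaluation metrics and the bootstrapped prediction intervals mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state which bootstrap scheme was used, so the code assumes one common variant (resampling wells with replacement and adding a resampled residual). A one-feature least-squares line stands in for the RF and CNN models, and the data are synthetic. All names (`make_synthetic_wells`, `fit_line`, `bootstrap_pi`) and the single "feature" are hypothetical.

```python
import random
import statistics


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target (here mg/l)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def make_synthetic_wells(n, seed=0):
    """Synthetic data only: one made-up feature in [0, 1] and a noisy nitrate value."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [max(0.0, 10.0 + 40.0 * x + rng.gauss(0.0, 5.0)) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; an assumed stand-in for the paper's models."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def bootstrap_pi(xs, ys, x_new, n_boot=500, alpha=0.05, seed=0):
    """Percentile prediction interval at x_new from bootstrap refits.

    Each replicate resamples wells with replacement, refits the model, and
    adds one resampled residual so the interval reflects observation noise
    as well as model uncertainty.
    """
    rng = random.Random(seed)
    n = len(xs)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        model = fit_line(bx, by)
        residuals = [y - model(x) for x, y in zip(bx, by)]
        draws.append(model(x_new) + rng.choice(residuals))
    draws.sort()
    lower = draws[int((alpha / 2) * n_boot)]
    upper = draws[max(int((1 - alpha / 2) * n_boot) - 1, 0)]
    return lower, upper


xs, ys = make_synthetic_wells(80, seed=42)
model = fit_line(xs, ys)
preds = [model(x) for x in xs]
low, high = bootstrap_pi(xs, ys, x_new=0.5)
print(f"MAE = {mae(ys, preds):.2f} mg/l, R2 = {r2(ys, preds):.2f}")
print(f"95% PI at x = 0.5: [{low:.1f}, {high:.1f}] mg/l")
```

In a real setting the refit inside the loop would be replaced by whichever model is being evaluated, and the metrics would be computed on held-out wells rather than on the training data as done here for brevity.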