Pub Date: 2024-08-12; DOI: 10.1007/s00477-024-02778-0
Chouaib El Hachimi, Salwa Belaqziz, Saïd Khabba, Bouchra Ait Hssaine, Mohamed Hakim Kharrou, Abdelghani Chehbouni
As precision agriculture (PA) advances, accurate, high-resolution weather forecasts become critical for optimizing agricultural management practices. Despite improvements in Numerical Weather Prediction (NWP) models, they lack the granularity and efficiency needed for PA. Data-driven models offer a promising alternative by bringing predictive capabilities closer to IoT edge data sources, but their efficacy requires evaluation. This paper evaluates six models from three data-driven eras (statistical, machine learning, and deep learning) using agrometeorological data from an Automatic Weather Station (AWS) in Sidi Rahal, East Marrakech, central Morocco, covering 2013–2020 at half-hour intervals and including air temperature, solar radiation, and relative humidity. First, the data were quality-controlled, with gaps imputed using ERA5-Land. The dataset was then split into training (2013–2019) and evaluation (2020) sets, with validation horizons of 1 day, 3 days, and 1 week. Statistical models generally perform well for air temperature, occasionally surpassing the other models. However, the Temporal Convolutional Neural Network (TCNN) consistently performs best on the more challenging variables, balancing low RMSE and high R² across horizons, with some exceptions. For relative humidity, the linear regression model achieves slightly lower RMSE (3.96% and 6.05%) than TCNN (4.00% and 6.79%) at 1 day and 3 days, respectively, and CatBoost outperforms TCNN at 1 week. In terms of training time, the Transformer is the slowest, followed by AutoARIMA and CatBoost. An uncertainty analysis of the stochastic models on solar radiation showed stable TCNN performance, with standard deviations of 0.80 for RMSE and 0.01 for R². Considering the trade-off between performance, training time, and the ability to capture complex relationships, TCNN emerges as the optimal choice.
ANOVA, Tukey’s HSD, and Mann–Whitney U tests also confirmed TCNN’s performance. Finally, a comparison with the Global Forecast System (GFS) shows TCNN’s clear superiority on all metrics, most evident in the RMSE of 3-day air temperature forecasts (TCNN: 1.96 °C; GFS: 3.59 °C).
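The RMSE and R² scores quoted above can be computed for any forecast series with a few lines of code. This is a minimal sketch, not the authors' evaluation code: it assumes the coefficient-of-determination form of R² (1 − SS_res/SS_tot), which can differ from a squared-Pearson definition, and the helper `horizon_steps` is a hypothetical convenience that converts a horizon into half-hourly steps.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and forecast values."""
    if len(obs) != len(pred) or not obs:
        raise ValueError("obs and pred must be non-empty and of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (can be negative)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def horizon_steps(days, step_minutes=30):
    """Hypothetical helper: number of forecast steps in a horizon of `days`."""
    return int(days * 24 * 60 // step_minutes)


if __name__ == "__main__":
    for label, days in [("1 day", 1), ("3 days", 3), ("1 week", 7)]:
        print(label, horizon_steps(days))
```

At half-hourly resolution, the 1-day, 3-day, and 1-week horizons correspond to 48, 144, and 336 steps.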
Title: Advancements in weather forecasting for precision agriculture: From statistical modeling to transformer-based architectures
Journal: Stochastic Environmental Research and Risk Assessment
URL: https://doi.org/10.1007/s00477-024-02778-0
Pub Date: 2024-08-12; DOI: 10.1007/s00477-024-02787-z
Muhammad Mohsin, Salman Abbas, Muhammad Mubeen Khan
Natural hazards are extreme events that significantly disrupt life on Earth. To mitigate their detrimental impacts, it is essential to examine and model them probabilistically. Probability distributions can capture exponential behavior and characterize the pattern of randomness in such real-life phenomena. We use the generalized Pareto-exponential distribution (GPED) and find it to be an appropriate model for extreme events involving exponentially decaying variables. The GPED combines features of both the exponential and Pareto distributions and relates to several other well-known distributions through certain transformations. We derive its probabilistic characteristics and provide an empirical study of its behavior for different parameter values. We estimate the unknown model parameters by maximum likelihood and conduct a simulation study over different sample sizes and parameter combinations to examine the stability of the estimates. Finally, we demonstrate the applicability of the model on a seismological data set and show that it performs better than several existing distributions.
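The GPED density is not given in the abstract, so the sketch below does not implement it. Instead, it illustrates the maximum likelihood and simulation-study workflow on the plain exponential distribution, one of the GPED's building blocks, whose rate MLE has the closed form λ̂ = 1/x̄. The true rate, sample sizes, replication count, and seed are arbitrary illustrative choices.

```python
import random
import statistics


def exp_rate_mle(sample):
    """Closed-form MLE of the exponential rate: n / sum(x) = 1 / mean(x)."""
    return 1.0 / statistics.fmean(sample)


def simulation_study(true_rate, sizes, reps=500, seed=1):
    """Monte Carlo bias and MSE of the rate MLE for each sample size."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        estimates = [
            exp_rate_mle([rng.expovariate(true_rate) for _ in range(n)])
            for _ in range(reps)
        ]
        bias = statistics.fmean(estimates) - true_rate
        mse = statistics.fmean((e - true_rate) ** 2 for e in estimates)
        results[n] = (bias, mse)
    return results


if __name__ == "__main__":
    for n, (bias, mse) in simulation_study(2.0, [20, 50, 100, 500]).items():
        print(f"n={n:4d}  bias={bias:+.4f}  MSE={mse:.5f}")
```

This MLE is biased upward in small samples (its expectation is nλ/(n − 1)), so both bias and MSE should shrink as n grows, which is the kind of stability a simulation study checks.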
Title: Probabilistic modeling of extreme events involving decaying variables with an application in seismology
Journal: Stochastic Environmental Research and Risk Assessment
URL: https://doi.org/10.1007/s00477-024-02787-z
Measuring hydraulic conductivity (HC) in the field or laboratory is time-consuming, laborious, and expensive; pedotransfer functions can instead predict soil HC from easy-to-measure soil properties such as bulk density (BD), soil texture, fractal dimension (D), organic carbon (OC), and glomalin content. In this study, 121 soil samples were used to predict HC with multiple linear regression and four machine learning models: Artificial Neural Network (ANN), Support Vector Machine (SVM), Classification and Regression Trees (CART), and Random Forest (RF). Two input sets were used: dataset 1 (texture data, BD, OC, and glomalin content) and dataset 2 (D, BD, OC, and glomalin content). The models were evaluated using mean absolute error, mean absolute percentage error, Nash–Sutcliffe model efficiency, root mean square error (RMSE), and the correlation coefficient. An ANN with three hidden layers performed well for both input sets; adding D to its input set reduced RMSE by 17% on the training dataset and by 5.55% on the testing dataset. For both datasets, RF outperformed CART in predicting HC. Overall, SVM with dataset 2 outperformed all other models, indicating that including D allows HC to be predicted more accurately. However, further study with different dataset combinations is required to evaluate the prediction efficiency of machine learning models across regions.
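The abstract does not say how D was computed. One widely used option, assumed here and not necessarily the authors' method, is the mass-based particle-size fractal model often attributed to Tyler and Wheatcraft, M(r < R)/M_T = (R/R_L)^(3 − D), where R_L is the largest particle size; D then follows from the slope of a log–log least-squares fit. The class limits below are hypothetical.

```python
import math


def fractal_dimension(sizes, cum_mass_fraction, max_size):
    """Estimate D from M(r < R)/M_T = (R / R_L)**(3 - D) by log-log least squares.

    sizes: upper particle-size limit R of each class (same units as max_size)
    cum_mass_fraction: fraction of total mass finer than each R (0 < f <= 1)
    max_size: R_L, the largest particle size
    """
    xs = [math.log(r / max_size) for r in sizes]
    ys = [math.log(f) for f in cum_mass_fraction]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return 3.0 - slope  # the log-log slope equals 3 - D


if __name__ == "__main__":
    # Hypothetical class limits in mm (clay, silt, fine sand, coarse sand)
    limits = [0.002, 0.05, 0.25, 2.0]
    fractions = [(r / 2.0) ** 0.3 for r in limits]  # synthetic soil with D = 2.7
    print(round(fractal_dimension(limits, fractions, 2.0), 6))
```

Higher D means a larger share of fine particles, which is one plausible reason D carries information about HC beyond coarse texture classes.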
Title: Inclusion of fractal dimension in machine learning models improves the prediction accuracy of hydraulic conductivity
Authors: Abhradip Sarkar, Pragati Pramanik Maity, Mrinmoy Ray, Aditi Kundu
Pub Date: 2024-08-11; DOI: 10.1007/s00477-024-02793-1
Journal: Stochastic Environmental Research and Risk Assessment
Pub Date: 2024-08-07; DOI: 10.1007/s00477-024-02789-x
Lei-Lei Liu, Yue-Bing Xu, Wen-Qing Zhu, Khan Zallah, Lei Huang, Can Wang
Slope reliability is of great importance in geotechnical engineering and is affected by various factors, such as slope cutting and rainfall. How the copula dependence structure affects the reliability of cutting slopes under rainfall conditions remains an open question. This study investigates the influence of copula dependence structure on the reliability analysis of a real slope, considering slope cutting and rainfall characteristics (i.e., rainfall intensity, duration, and pattern). The Gaussian, Plackett, Frank, and No.16 copulas are first employed to model the joint probability distribution of the measured soil strength parameters, and the optimal copula is then identified using the Akaike and Bayesian information criteria. The probability of failure (Pf) and the distribution of the critical slip surface (CSS) for different slope cutting and rainfall conditions are then obtained within a Monte Carlo simulation framework. The results show that the copula dependence between shear strength parameters significantly influences Pf for the cutting slope under rainfall conditions. The commonly used Gaussian copula may underestimate Pf, whereas the No.16 copula tends to overestimate it, across different cutting angles and rainfall intensities, durations, and patterns. The differences in Pf between copula functions decrease as cutting angle, cutting distance, and rainfall intensity increase, but vary little with rainfall duration and pattern. Although the copula function significantly influences Pf, it has a negligible influence on CSS. This study provides a practical tool for selecting copula functions and offers insights for slope design and management under slope cutting and rainfall conditions.
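A minimal Monte Carlo sketch of a Pf estimate for one copula family follows. It rests on simplifying assumptions that are not from the study: a dry infinite-slope limit state FS = c/(γ H sin β cos β) + tan φ / tan β in place of the authors' slope and rainfall-infiltration model, lognormal marginals for cohesion c and friction angle φ, and only the Gaussian copula (sampled by correlating two standard normals). All parameter values and the function name `pf_gaussian_copula` are hypothetical.

```python
import math
import random


def lognormal_params(mean, cov):
    """Convert a mean and coefficient of variation to the underlying normal (mu, sigma)."""
    sigma = math.sqrt(math.log(1.0 + cov ** 2))
    return math.log(mean) - 0.5 * sigma ** 2, sigma


def pf_gaussian_copula(mean_c, cov_c, mean_phi_deg, cov_phi, rho,
                       slope_deg, depth, unit_weight, n=100_000, seed=0):
    """Fraction of samples whose infinite-slope factor of safety is below 1.

    Assumed units: c in kPa, depth in m, unit_weight in kN/m^3.
    rho is the Gaussian-copula correlation in standard-normal space; it is
    not the Pearson correlation of the lognormal c and phi themselves.
    """
    rng = random.Random(seed)
    mu_c, s_c = lognormal_params(mean_c, cov_c)
    mu_p, s_p = lognormal_params(mean_phi_deg, cov_phi)
    beta = math.radians(slope_deg)
    driving = unit_weight * depth * math.sin(beta) * math.cos(beta)  # kPa
    failures = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        c = math.exp(mu_c + s_c * z1)
        phi = math.radians(math.exp(mu_p + s_p * z2))
        fs = c / driving + math.tan(phi) / math.tan(beta)
        failures += fs < 1.0
    return failures / n


if __name__ == "__main__":
    # c-phi correlation is often negative; compare a few copula parameters
    for rho in (-0.5, 0.0, 0.5):
        print(rho, pf_gaussian_copula(20.0, 0.3, 25.0, 0.15, rho, 35.0, 5.0, 20.0))
```

Swapping in another copula (for example Frank or Plackett) only changes how the dependent uniforms are generated; the limit-state evaluation and the failure count stay the same, which is what makes copula comparisons like the one in the study straightforward to set up.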
Title: Reliability analysis of cutting slopes under rainfall conditions considering copula dependence between shear strengths
Journal: Stochastic Environmental Research and Risk Assessment
URL: https://doi.org/10.1007/s00477-024-02789-x
Pub Date: 2024-08-07; DOI: 10.1007/s00477-024-02762-8
Hidekazu Yoshioka, Yumi Yoshioka
Title: Correction to: Risk assessment of river water quality using long-memory processes subject to divergence or Wasserstein uncertainty
Journal: Stochastic Environmental Research and Risk Assessment
URL: https://doi.org/10.1007/s00477-024-02762-8
Pub Date: 2024-08-06; DOI: 10.1007/s00477-024-02791-3
Ruhui Cao, Yaxi Xiao, Yangbin Dong, Fuwang Zhang, Kai Shi, Zhanyong Wang
Regional air pollution represents a multifaceted and dynamic system, rendering linear statistical approaches insufficient for capturing its inherent variability, particularly the intricate fluctuations of multiple pollution indicators. Therefore, this study investigates the synergistic evolution mechanisms of PM2.5 and O3 in four cities within China’s Yangtze River Delta industrial base from 2013 to 2022, employing complex systems theory. First, multifractal detrended cross-correlation analysis is used to confirm the presence of multifractality and long-term persistence between PM2.5 and O3 in each city. Quantitative indicators are then established to evaluate the synergistic control effects of PM2.5 and O3. Next, factors influencing coordinated control are analyzed using ensemble empirical mode decomposition. Finally, self-organized criticality (SOC) theory is introduced to elucidate dynamic pollution patterns. The results indicate the following: (1) Multifractality and long-term persistence exist between PM2.5 and O3 in the four cities, with persistence strengthening alongside the implementation of atmospheric pollution prevention and control policies. The application of complex systems theory facilitates the explanation and quantification of the synergistic control effectiveness of PM2.5 and O3. (2) Since 2013, with the exception of Nanjing, the coordinated control effects of PM2.5 and O3 in Shanghai, Hangzhou, and Suzhou have been unsatisfactory and have shown little improvement. (3) Compared to short-term pollution emissions from human activities, annual atmospheric control measures, periodic meteorological variations, and long-range transport of regional pollutants exert a greater influence on the synergistic regulation effects of PM2.5 and O3. (4) SOC may serve as the primary mechanism influencing the effectiveness of the synergistic regulation of PM2.5 and O3.
Sudden events, such as epidemic control measures, can disrupt the existing balance between PM2.5 and O3, thereby diminishing the coordinated control effects.
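The multifractal detrended cross-correlation analysis mentioned above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the study's implementation: it uses first-order (linear) detrending, forward segmentation only (any remainder at the end is dropped), the absolute value of each segment's detrended covariance, and it excludes q = 0, which needs a logarithmic average. The exponent h_xy(q) is the log–log slope of F_q(s) against scale s; an h_xy(q) that changes with q indicates multifractality, and h_xy(2) > 0.5 indicates persistent long-range cross-correlation.

```python
import math
import random


def _profile(series):
    """Cumulative sum of deviations from the mean."""
    mean = sum(series) / len(series)
    out, acc = [], 0.0
    for v in series:
        acc += v - mean
        out.append(acc)
    return out


def _linear_residuals(seg):
    """Residuals after a least-squares straight-line fit over the segment."""
    n = len(seg)
    t_bar = (n - 1) / 2.0
    y_bar = sum(seg) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    slope = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(seg)) / sxx
    return [y - y_bar - slope * (t - t_bar) for t, y in enumerate(seg)]


def fluctuation(x, y, s, q):
    """q-th order detrended cross-fluctuation function F_q(s), q != 0."""
    if q == 0:
        raise ValueError("q = 0 requires a logarithmic average; not handled here")
    px, py = _profile(x), _profile(y)
    n_seg = len(px) // s
    f2 = []
    for v in range(n_seg):
        rx = _linear_residuals(px[v * s:(v + 1) * s])
        ry = _linear_residuals(py[v * s:(v + 1) * s])
        f2.append(abs(sum(a * b for a, b in zip(rx, ry)) / s))
    return (sum(f ** (q / 2.0) for f in f2) / n_seg) ** (1.0 / q)


def cross_hurst(x, y, scales, q=2):
    """Least-squares slope of log F_q(s) versus log s, i.e. h_xy(q)."""
    pts = [(math.log(s), math.log(fluctuation(x, y, s, q))) for s in scales]
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    return (sum((a - mx) * (b - my) for a, b in pts)
            / sum((a - mx) ** 2 for a, _ in pts))


if __name__ == "__main__":
    rng = random.Random(42)
    noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
    scales = [16, 32, 64, 128, 256]
    for q in (-4, -2, 2, 4):
        print(q, round(cross_hurst(noise, noise, scales, q), 3))
```

For uncorrelated white noise cross-correlated with itself, h_xy(2) should be close to 0.5; real PM2.5 and O3 series would be passed as `x` and `y`.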
Title: Using complex systems theory to comprehend the coordinated control effects of PM2.5 and O3 in Yangtze River Delta industrial base in China
Journal: Stochastic Environmental Research and Risk Assessment
URL: https://doi.org/10.1007/s00477-024-02791-3
Pub Date: 2024-08-05; DOI: 10.1007/s00477-024-02788-y
Mohammad Reza M. Behbahani, Maryam Mazarei, Amvrossios C. Bagtzoglou
Accurate machine learning streamflow prediction often requires coupling data-driven models with preprocessing techniques. This study aims to improve the performance of deep learning (DL) models, including long short-term memory (LSTM), recurrent neural network (RNN), and gated recurrent unit (GRU) models, by incorporating the maximal overlap discrete wavelet entropy transform (MODWET) for streamflow forecasting. The advantage of MODWET over the maximal overlap discrete wavelet transform (MODWT) is that MODWET uses entropy to determine the optimal decomposition level and a suitable wavelet function, a problem left unaddressed in earlier wavelet-based decomposition models. A suitable decomposition level avoids supplying unnecessary information or omitting essential information. We show that no single decomposition level and wavelet filter is suitable for every dataset. The research focuses on monthly streamflow data from three case studies in the CAMELS dataset in the USA. Model accuracy is evaluated using the Nash–Sutcliffe efficiency (NSE), root-mean-squared error, percent bias, and correlation coefficient (r), and a Taylor diagram is used to identify the optimal model. The results demonstrate the effectiveness of coupling MODWET with DL models in flood forecasting. Furthermore, genetic programming (GP) and the partial correlation index (PCI) are employed for predictor selection. The hybrid models MODWET-GP-GRU (NSE of 0.83), MODWET-GP-RNN (NSE of 0.95), and MODWET-PCI-GRU (NSE of 0.95) outperform simple DL models in terms of NSE and the Taylor diagram evaluation. This study emphasizes the potential of hybrid models that combine DL algorithms with the recently proposed MODWET technique for streamflow prediction.
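The abstract does not specify the MODWET entropy criterion, so the following is only a hedged illustration of its ingredients: a Haar MODWT computed with the circular pyramid algorithm, and the Shannon entropy of the share of signal energy carried by each detail level and the final smooth. The Haar filter, the energy-entropy measure, and the function names are assumptions for illustration; a level-selection scheme would compare such entropies across candidate levels and wavelet filters.

```python
import math


def haar_modwt(x, levels):
    """Circular Haar MODWT: returns ([W_1, ..., W_J], V_J).

    Uses the MODWT Haar filters h = (1/2, -1/2) and g = (1/2, 1/2) in the
    pyramid algorithm; level j combines values 2**(j-1) samples apart.
    """
    n = len(x)
    v = list(x)
    details = []
    for j in range(1, levels + 1):
        lag = 2 ** (j - 1)
        w = [0.5 * (v[t] - v[(t - lag) % n]) for t in range(n)]
        v = [0.5 * (v[t] + v[(t - lag) % n]) for t in range(n)]
        details.append(w)
    return details, v


def energy_entropy(x, levels):
    """Shannon entropy (nats) of the energy shares of W_1..W_J and V_J."""
    details, smooth = haar_modwt(x, levels)
    energies = [sum(c * c for c in w) for w in details]
    energies.append(sum(c * c for c in smooth))
    total = sum(energies)
    return -sum(e / total * math.log(e / total) for e in energies if e > 0)


if __name__ == "__main__":
    # Hypothetical monthly series: annual cycle plus a weak linear trend
    series = [10 + 5 * math.sin(2 * math.pi * t / 12) + 0.05 * t for t in range(240)]
    for J in range(1, 6):
        print(J, round(energy_entropy(series, J), 4))
```

Because the MODWT is energy-preserving, the level energies always sum to the energy of the input series, so the shares form a proper probability distribution for the entropy.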
Title: Improving deep learning-based streamflow forecasting under trend varying conditions through evaluation of new wavelet preprocessing technique
Journal: Stochastic Environmental Research and Risk Assessment
URL: https://doi.org/10.1007/s00477-024-02788-y
Pub Date: 2024-08-01; DOI: 10.1007/s00477-024-02769-1
M. N. M. van Lieshout
We study a Markov decision problem in which the state space is the set of finite marked point patterns in the plane, the actions represent thinnings, the reward is proportional to the mark sum which is discounted over time, and the transitions are governed by a birth-death-growth process. We show that thinning points with large marks maximises the discounted total expected reward when births follow a Poisson process and marks grow logistically. Explicit values for the thinning threshold and the discounted total expected reward over finite and infinite horizons are also provided. When the points are required to respect a hard core distance, upper and lower bounds on the discounted total expected reward are derived.
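A toy discrete-time simulation can make the threshold-thinning idea concrete. Everything in it is an assumption for illustration rather than the paper's model: point locations (and hence any hard-core constraint) are ignored, births are Poisson per step, each point dies independently with a fixed probability, marks follow discrete logistic growth m ← m + r·m·(1 − m/K), and the reward is taken to be the discounted sum of the marks of thinned points. The paper derives optimal thresholds analytically; here the threshold is simply an input, and `simulate` is a hypothetical name.

```python
import math
import random


def poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(threshold, steps=200, birth_rate=2.0, death_prob=0.05,
             growth=0.3, capacity=1.0, initial_mark=0.05, discount=0.95, seed=0):
    """Discounted reward of the policy 'thin every point whose mark >= threshold'."""
    rng = random.Random(seed)
    marks = []
    total = 0.0
    for t in range(steps):
        # Thinning action: remove points at or above the threshold, collect their marks
        harvested = [m for m in marks if m >= threshold]
        marks = [m for m in marks if m < threshold]
        total += discount ** t * sum(harvested)
        # Transition: independent deaths, logistic mark growth, then Poisson births
        marks = [m for m in marks if rng.random() >= death_prob]
        marks = [m + growth * m * (1.0 - m / capacity) for m in marks]
        marks += [initial_mark] * poisson(rng, birth_rate)
    return total


if __name__ == "__main__":
    for tau in (0.3, 0.5, 0.7, 0.9):
        print(tau, round(simulate(tau), 3))
```

With growth ≤ 1 and marks starting below the carrying capacity, marks never reach the capacity, so any threshold above it yields zero reward; scanning thresholds below it shows the trade-off between harvesting early and letting marks grow.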
Title: Optimal decision rules for marked point process models
Journal: Stochastic Environmental Research and Risk Assessment
URL: https://doi.org/10.1007/s00477-024-02769-1
In this paper we present a framework to aid in the selection of optimal environmental indicators for detecting and mapping extreme events and for analyzing trends in heatwaves, meteorological and hydrological droughts, floods, and their compound occurrence. The framework uses temperature, precipitation, river discharge, and derived climate indices to characterize the spatial distribution of hazard intensity, frequency, duration, co-occurrence, and dependence. The climate indices applied are the Standardized Precipitation Index (SPI), the Standardized Precipitation Evapotranspiration Index (SPEI), the Standardized Streamflow Index (SSI), heatwave indices based on fixed (HWI$_\mathrm{S}$) and anomalous (HWI$_\mathrm{E}$) temperatures, and the Daily Flood Index (DFI). We selected suitable environmental indicators and corresponding thresholds for each hazard based on their estimated extreme-event detection performance, using receiver operating characteristic (ROC) curves, the area under the curve (AUC), and accuracy, defined as the proportion of correct detections. We assessed compound hazard dependence using a Likelihood Multiplication Factor (LMF). We tested the framework for Sweden, using daily data for the period 1922–2021. The ROC results showed that HWI$_\mathrm{S}$, SPEI12, and DFI are suitable indices for representing heatwaves, droughts, and floods, respectively (AUC > 0.83). Applying these indices revealed increasing heatwave and flood occurrence in large areas of Sweden, but no significant trend for droughts. Hotspots with LMF > 1, mostly concentrated in Northern Sweden from June to August, indicated that compound drought-heatwave and drought-flood events are positively correlated in those areas, which can exacerbate their impacts.
The novel framework presented here adds to existing hydroclimatic hazard research by (1) using local data and historical records of extremes to validate indicator-based hazard hotspots, (2) evaluating compound hazards at regional scale, (3) being transferable and streamlined, (4) attaining satisfactory performance for indicator-based hazard detection as demonstrated by the ROC method, and (5) being generalizable to various hazard types.
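The two scoring steps in this framework, ranking indicators with ROC/AUC and measuring compound dependence with the LMF, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code. The LMF is computed here in a common formulation, the observed joint probability divided by the product of the marginal probabilities, so LMF > 1 means the two hazards co-occur more often than under independence. The detection rule "index exceeds threshold", the per-period boolean inputs, and all function names are our assumptions; the paper's time aggregation is not reproduced.

```python
from typing import Sequence


def likelihood_multiplication_factor(a: Sequence[bool], b: Sequence[bool]) -> float:
    """P(A and B) / (P(A) * P(B)) from paired per-period hazard flags."""
    if len(a) != len(b) or not a:
        raise ValueError("series must be non-empty and of equal length")
    n = len(a)
    p_a = sum(bool(x) for x in a) / n
    p_b = sum(bool(y) for y in b) / n
    p_ab = sum(bool(x) and bool(y) for x, y in zip(a, b)) / n
    if p_a == 0 or p_b == 0:
        return float("nan")  # undefined if either hazard never occurs
    return p_ab / (p_a * p_b)


def auc(scores: Sequence[float], events: Sequence[bool]) -> float:
    """ROC AUC as the probability that an event period outscores a non-event period (ties count 1/2)."""
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    if not pos or not neg:
        raise ValueError("need both event and non-event periods")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores: Sequence[float], events: Sequence[bool], threshold: float) -> float:
    """Share of periods where 'index exceeds threshold' matches the recorded event flag."""
    hits = sum((s > threshold) == bool(e) for s, e in zip(scores, events))
    return hits / len(scores)
```

The pairwise AUC is quadratic in the number of periods, which is acceptable for a sketch; a rank-based implementation would scale better to a century of daily data.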
Title: Identifying regional hotspots of heatwaves, droughts, floods, and their co-occurrences
Authors: Marlon Vieira Passos, Jung-Ching Kan, Georgia Destouni, Karina Barquet, Zahra Kalantari
Journal: Stochastic Environmental Research and Risk Assessment. Pub Date : 2024-07-30. DOI: 10.1007/s00477-024-02783-3
Pub Date : 2024-07-30. DOI: 10.1007/s00477-024-02786-0
Kamleshan Pillay, Mulala Danny Simatele
This study assesses the impact of climate change on flood frequency at seven sites in the Western Cape province of South Africa. The calibrated Water Resources Simulation Model (WRSM)/Pitman hydrological model was run with precipitation inputs under two representative concentration pathway (RCP) scenarios (RCP 4.5 and 8.5) from eight global circulation models (GCMs) for two periods (2030–2060 and 2070–2100). GCM outputs were statistically downscaled using the delta change (DC), linear scaling (LS), and quantile delta mapping (QDM) approaches. Average daily discharge was simulated with the Pitman/WRSM model from each downscaled daily precipitation dataset, and the Fuller and Sangal methods were used to estimate instantaneous peak flows from the daily values. Flood frequency curves (FFCs) were derived from the annual maximum series (AMS) for the GCM ensemble mean and for individual GCMs, for return periods of 2 to 100 years. For the GCM ensemble mean, FFCs based on the LS and QDM downscaling methods agreed in their direction of change. Further analysis used the QDM outputs, given the method's suitability for projecting peak flows. Under QDM, both the Fuller and Sangal FFCs showed a decreasing trend at the Jonkershoek and Little Berg River sites; however, estimated quantiles for low-probability events were higher under the Fuller method. FFCs from individual GCMs also varied relative to the FFC for the GCM ensemble mean. Further research on climate change flood frequency analysis (FFA) in South Africa should incorporate other advanced downscaling and instantaneous peak flow (IPF) estimation methods.
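Quantile delta mapping, the downscaling approach retained for the main analysis above, can be sketched as follows. This follows the general multiplicative (ratio) form of QDM often used for precipitation. It is our illustration, not the study's implementation: the empirical quantile estimator, the requirement of strictly positive values (dry days are not handled), and all function names are assumptions. For each future model value, the sketch finds its non-exceedance probability in the future model sample, computes the model's relative change at that probability, and applies it to the observed historical quantile.

```python
import bisect
import math
from typing import Sequence


def _quantile(sorted_vals: Sequence[float], tau: float) -> float:
    """Empirical quantile with linear interpolation between order statistics."""
    pos = tau * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def quantile_delta_mapping(obs_hist: Sequence[float], mod_hist: Sequence[float],
                           mod_future: Sequence[float]) -> list[float]:
    """Multiplicative QDM: bias-correct mod_future while preserving the model's relative change."""
    data = (obs_hist, mod_hist, mod_future)
    if any(len(d) < 2 for d in data) or any(v <= 0 for d in data for v in d):
        raise ValueError("need >= 2 strictly positive values per series")
    oh, mh, mf = sorted(obs_hist), sorted(mod_hist), sorted(mod_future)
    corrected = []
    for x in mod_future:
        # 1) non-exceedance probability of x within the future model sample
        tau = bisect.bisect_left(mf, x) / (len(mf) - 1)
        # 2) relative change of the model between historical and future periods at tau
        delta = x / _quantile(mh, tau)
        # 3) apply that change to the observed historical quantile at the same tau
        corrected.append(_quantile(oh, tau) * delta)
    return corrected
```

Two properties are easy to check: if the future model equals the historical model, the output reproduces the observed quantiles, and if the observations equal the historical model, the output reproduces the future model values, so the projected relative change is preserved.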
Title: Evaluating changes in flood frequency due to climate change in the Western Cape, South Africa
Authors: Kamleshan Pillay, Mulala Danny Simatele
Journal: Stochastic Environmental Research and Risk Assessment. Pub Date : 2024-07-30. DOI: 10.1007/s00477-024-02786-0
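The flood frequency step in the Pillay and Simatele study (annual maximum series, then quantiles for 2- to 100-year return periods) can be sketched as below. The abstract does not name the fitted distribution, so this sketch assumes a Gumbel (EV1) distribution fitted by the method of moments purely for illustration; GEV or log-Pearson III fits are also common. Grouping by calendar year is another assumption, since a hydrological year would shift the grouping. Function names are hypothetical.

```python
import math
import statistics
from datetime import date
from typing import Sequence

EULER_GAMMA = 0.5772156649015329


def annual_maximum_series(days: Sequence[date], peaks: Sequence[float]) -> list[float]:
    """Largest (instantaneous peak) flow in each calendar year, ordered by year."""
    maxima: dict[int, float] = {}
    for d, q in zip(days, peaks):
        maxima[d.year] = max(q, maxima.get(d.year, q))
    return [maxima[y] for y in sorted(maxima)]


def gumbel_return_levels(ams: Sequence[float], periods: Sequence[float]) -> dict[float, float]:
    """Flood quantiles x_T from a Gumbel (EV1) fit by the method of moments."""
    if len(ams) < 2:
        raise ValueError("need at least two annual maxima")
    scale = math.sqrt(6.0) * statistics.stdev(ams) / math.pi
    loc = statistics.mean(ams) - EULER_GAMMA * scale
    # x_T = loc - scale * ln(-ln(1 - 1/T)) for a T-year return period (T > 1)
    return {t: loc - scale * math.log(-math.log(1.0 - 1.0 / t)) for t in periods}
```

Plotting the returned levels against return period gives one flood frequency curve; repeating this per GCM and for the ensemble mean yields the family of curves the study compares.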