Chouaib El Hachimi, Salwa Belaqziz, Saïd Khabba, Bouchra Ait Hssaine, Mohamed Hakim Kharrou, Abdelghani Chehbouni
Advancements in weather forecasting for precision agriculture: From statistical modeling to transformer-based architectures
Stochastic Environmental Research and Risk Assessment (Impact Factor 3.9, JCR Q1, Engineering, Civil)
Published: 2024-08-12
DOI: https://doi.org/10.1007/s00477-024-02778-0
Abstract
As precision agriculture (PA) advances, accurate, high-resolution weather forecasts become critical for optimizing agricultural management practices. Despite improvements, Numerical Weather Prediction (NWP) models still lack the granularity and efficiency needed for PA. Data-driven models offer a promising alternative by bringing predictive capabilities closer to IoT edge data sources, but their efficacy requires evaluation. This paper evaluates six models from three data-driven eras (statistical, machine learning, and deep learning) using agrometeorological data from an Automatic Weather Station (AWS) in Sidi Rahal, East Marrakech, central Morocco. The data cover 2013–2020 at half-hour intervals and include air temperature, solar radiation, and relative humidity. First, the data were quality-controlled through imputation using ERA5-Land. Then, the dataset was split into training (2013–2019) and evaluation (2020) sets, with validation horizons of 1 day, 3 days, and 1 week. Statistical models generally perform well in air temperature forecasting, occasionally surpassing other models. However, the Temporal Convolutional Neural Network (TCNN) consistently demonstrates superior performance for challenging variables, balancing low RMSE and high R² across the horizons, with some exceptions. Specifically, for relative humidity, the linear regression model achieves slightly lower RMSE (3.96% and 6.05%) than TCNN (4.00% and 6.79%) for 1 day and 3 days, respectively. Additionally, CatBoost outperforms TCNN for 1-week forecasts. In terms of training time, the Transformer takes the longest, followed by AutoARIMA and CatBoost. Uncertainty analysis of the stochastic models on solar radiation showed that TCNN performs stably, with standard deviations of 0.80 for RMSE and 0.01 for R². Considering the trade-off between performance, training time, and the ability to capture complex relationships, TCNN emerges as the optimal choice. ANOVA, Tukey’s HSD, and Mann-Whitney U statistical tests also confirmed TCNN’s performance. Finally, a comparison with the Global Forecast System (GFS) reveals TCNN’s clear superiority in all metrics, particularly evident in the RMSE of 3-day air temperature forecasts (TCNN: 1.96 °C, GFS: 3.59 °C).
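The two metrics reported in the abstract, RMSE and R², can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the `HORIZONS` mapping, and the choice to score the first n half-hourly steps of a forecast for each horizon are all assumptions made for illustration, and the demo series is synthetic.

```python
import math

# Half-hourly observations give 48 steps per day (assumption based on the
# sampling interval stated in the abstract).
STEPS_PER_DAY = 48
HORIZONS = {"1 day": 1 * STEPS_PER_DAY, "3 days": 3 * STEPS_PER_DAY, "1 week": 7 * STEPS_PER_DAY}


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined for a constant observed series")
    return 1.0 - ss_res / ss_tot


def score_horizons(y_true, y_pred):
    """Score the first n steps of a forecast for each horizon (hypothetical protocol)."""
    return {
        name: (rmse(y_true[:n], y_pred[:n]), r2(y_true[:n], y_pred[:n]))
        for name, n in HORIZONS.items()
    }


# Synthetic demo: a daily temperature-like cycle and a forecast with a constant 1-degree bias.
truth = [20.0 + 8.0 * math.sin(2.0 * math.pi * i / STEPS_PER_DAY) for i in range(7 * STEPS_PER_DAY)]
forecast = [v + 1.0 for v in truth]
for name, (err, fit) in score_horizons(truth, forecast).items():
    print(f"{name}: RMSE={err:.2f}, R2={fit:.3f}")
```

Reporting both metrics together reflects the balance the abstract describes: RMSE is expressed in the variable's own units, while R² measures how much of the observed variance the forecast explains.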
About the journal:
Stochastic Environmental Research and Risk Assessment (SERRA) will publish research papers, reviews and technical notes on stochastic and probabilistic approaches to environmental sciences and engineering, including interactions of earth and atmospheric environments with people and ecosystems. The basic idea is to bring together research papers on stochastic modelling in various fields of environmental sciences and to provide an interdisciplinary forum for the exchange of ideas, for communicating on issues that cut across disciplinary barriers, and for the dissemination of stochastic techniques used in different fields to the community of interested researchers. Original contributions will be considered dealing with modelling (theoretical and computational), measurements and instrumentation in one or more of the following topical areas:
- Spatiotemporal analysis and mapping of natural processes.
- Enviroinformatics.
- Environmental risk assessment, reliability analysis and decision making.
- Surface and subsurface hydrology and hydraulics.
- Multiphase porous media domains and contaminant transport modelling.
- Hazardous waste site characterization.
- Stochastic turbulence and random hydrodynamic fields.
- Chaotic and fractal systems.
- Random waves and seafloor morphology.
- Stochastic atmospheric and climate processes.
- Air pollution and quality assessment research.
- Modern geostatistics.
- Mechanisms of pollutant formation, emission, exposure and absorption.
- Physical, chemical and biological analysis of human exposure from single and multiple media and routes; control and protection.
- Bioinformatics.
- Probabilistic methods in ecology and population biology.
- Epidemiological investigations.
- Models using stochastic differential equations or stochastic partial differential equations.