Keltoum Khechba, Mariana Belgiu, Ahmed Laamrani, Alfred Stein, Abdelhakim Amazirh, Abdelghani Chehbouni
DOI: 10.1016/j.jag.2025.104367
Journal: International Journal of Applied Earth Observation and Geoinformation: ITC Journal, volume 136, Article 104367
Publication date: 2025-02-01
Publication type: Journal Article
Impact factor: 7.6 (JCR Q1, Remote Sensing)
URL: https://www.sciencedirect.com/science/article/pii/S1569843225000147
Citations: 0
The impact of spatiotemporal variability of environmental conditions on wheat yield forecasting using remote sensing data and machine learning
Abstract
Climate change poses significant challenges to food security, especially in semi-arid agricultural areas. Effective monitoring of crop yield is important for establishing food emergency responses and developing long-term sustainable strategies. In Morocco, where cereals are the predominant crops, yield forecasting is important for addressing the yield gap because it enables farmers to take preventive action before the harvesting period. This study assesses the impact of spatial and temporal heterogeneity of environmental conditions on wheat yield forecasting using machine learning models.
It compares the 2019–2020 and 2020–2021 agricultural seasons using three sets of variables: (1) spectral indices; (2) weather data; and (3) a combination of both. Weather data, including cumulative monthly precipitation from ERA5 data and average monthly temperature from PERSIANN data, were extracted for the wheat growing season (November to June). Spectral indices, including the Normalized Difference Vegetation Index, the Moisture Stress Index, and the Terrestrial Chlorophyll Index, were calculated from Sentinel-2 imagery for the same period and processed using Google Earth Engine. The study area was divided into homogeneous zones based on an existing landform classification, and XGBoost and Random Forest (RF) models were used for yield forecasting in each zone separately.
The two models performed equally well across both the zones and the whole study area (SA) when using weather data as the input variable. For instance, across SA, they achieved average R² values of 0.60 and 0.81 for all months during the 2019–2020 and 2020–2021 agricultural seasons, respectively. However, when using spectral indices alone or in combination with weather data, RF consistently outperformed XGBoost. For example, in SA during the 2019–2020 season, RF achieved an average R² of 0.48 across the growing season, compared to 0.43 for XGBoost. Similarly, in the 2020–2021 season, RF achieved an R² of 0.35 and an RMSE of 1083 kg ha⁻¹, while XGBoost performed slightly worse, with an R² of 0.29 and an RMSE of 1137 kg ha⁻¹. Comparing prediction accuracy between the seasons for each set of variables, the RF model performed better with spectral indices during the relatively dry 2019–2020 season than during the wet 2020–2021 season. Incorporating weather data improved the model's performance for the 2020–2021 season. April showed the highest prediction performance overall, with R² values of 0.6 for SA using weather data alone in the 2019–2020 season, and 0.8 for SA using a combination of weather data and spectral indices in the 2020–2021 season. The 2019–2020 season showed strong fluctuations in accuracy throughout the growing season, whereas the 2020–2021 season showed a consistent improvement in accuracy over time. These variations in accuracy are attributed to differing environmental conditions, which should be taken into account to make yield predictions better and more reliable.
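The abstract names its spectral indices and accuracy metrics without giving formulas. As a minimal sketch, assuming the commonly used definitions (NDVI as (NIR − Red)/(NIR + Red), MSI as SWIR/NIR, and R² as the coefficient of determination 1 − SS_res/SS_tot), they could be computed as follows. The function names, reflectance values, and yield figures are hypothetical illustrations, not data or code from the study, and the study may define R² differently.

```python
import math


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def msi(swir, nir):
    """Moisture Stress Index as the ratio of SWIR to NIR reflectance."""
    return swir / nir


def rmse(observed, predicted):
    """Root mean square error, in the same unit as the inputs (e.g. kg/ha)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical surface reflectance values for a single pixel
# (for Sentinel-2, NIR, red, and SWIR commonly map to bands B8, B4, and B11).
example_ndvi = ndvi(nir=0.45, red=0.15)
example_msi = msi(swir=0.20, nir=0.40)

# Hypothetical observed and predicted wheat yields in kg/ha.
observed_yield = [1500.0, 2500.0, 3500.0, 4500.0]
predicted_yield = [1700.0, 2300.0, 3600.0, 4400.0]

example_rmse = rmse(observed_yield, predicted_yield)
example_r2 = r_squared(observed_yield, predicted_yield)

print(f"NDVI={example_ndvi:.2f} MSI={example_msi:.2f}")
print(f"RMSE={example_rmse:.1f} kg/ha R2={example_r2:.2f}")
```

Because RMSE keeps the unit of the yield, it is directly comparable with the kg ha⁻¹ figures quoted above, while R² is unitless and can be compared across seasons and zones.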
About the journal:
The International Journal of Applied Earth Observation and Geoinformation publishes original papers that utilize earth observation data for natural resource and environmental inventory and management. These data primarily originate from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. Addressing natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns like biodiversity, land degradation, and hazards, the journal explores conceptual and data-driven approaches. It covers geoinformation themes like capturing, databasing, visualization, interpretation, data quality, and spatial uncertainty.