Feasibility of machine learning-based rice yield prediction in India at the district level using climate reanalysis and remote sensing data

IF 6.1 | Zone 1, Agricultural and Forestry Sciences | Q1 AGRICULTURE, MULTIDISCIPLINARY | Agricultural Systems | Pub Date: 2024-08-29 | DOI: 10.1016/j.agsy.2024.104099

Djavan De Clercq, Adam Mahdi

Agricultural Systems, Volume 220, Article 104099. Journal Article. https://www.sciencedirect.com/science/article/pii/S0308521X2400249X

Citations: 0

Abstract

CONTEXT

Yield forecasting, the science of predicting agricultural productivity before the crop harvest occurs, helps a wide range of stakeholders make better decisions around agricultural planning.

OBJECTIVE

This study investigates whether machine learning-based yield prediction models can predict Kharif season rice yields at the district level in India with useful accuracy several months before the rice harvest takes place.

METHODOLOGY

The methodology involved training 19 machine learning models, including CatBoost, LightGBM, Orthogonal Matching Pursuit, and Extremely Randomized Trees, on 20 years of climate, satellite, and rice yield data across 247 of India's rice-producing districts. In addition to model-building, a dynamic dashboard was built to understand how the reliability of rice yield predictions varies across districts.
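The abstract does not say how the out-of-sample data were held out, so the following is only a minimal sketch of one common arrangement: train on earlier seasons, test on later ones, and rank candidate models by error on the held-out years. The records, the year-based split, and the two trivial baseline "models" are all assumptions made for illustration. They stand in for the 19 models named above and are not the authors' pipeline.

```python
from statistics import mean

# Hypothetical records: (district, year, august_temperature_c, yield_t_per_ha).
# Every value here is invented for illustration only.
records = [
    ("A", 2018, 28.0, 2.0), ("A", 2019, 29.0, 2.2), ("A", 2020, 30.0, 2.4),
    ("B", 2018, 27.0, 3.0), ("B", 2019, 28.0, 3.2), ("B", 2020, 29.0, 3.4),
]


def split_by_year(rows, first_test_year):
    """Assumed hold-out scheme: earlier years train, later years test."""
    train = [r for r in rows if r[1] < first_test_year]
    test = [r for r in rows if r[1] >= first_test_year]
    return train, test


def fit_global_mean(train):
    """Baseline 1: always predict the mean yield over all training rows."""
    avg = mean(r[3] for r in train)
    return lambda district, temperature: avg


def fit_district_mean(train):
    """Baseline 2: predict each district's own mean training yield."""
    by_district = {}
    for district, _, _, yield_value in train:
        by_district.setdefault(district, []).append(yield_value)
    means = {d: mean(values) for d, values in by_district.items()}
    fallback = mean(r[3] for r in train)  # used for districts unseen in training
    return lambda district, temperature: means.get(district, fallback)


def mean_absolute_error(model, rows):
    """Average absolute gap between predicted and actual yield."""
    return mean(abs(model(d, t) - y) for d, _, t, y in rows)


def compare(rows, first_test_year, fitters):
    """Fit every candidate on the training years and score it on the test years."""
    train, test = split_by_year(rows, first_test_year)
    return {name: mean_absolute_error(fit(train), test) for name, fit in fitters.items()}


scores = compare(
    records,
    first_test_year=2020,
    fitters={"global_mean": fit_global_mean, "district_mean": fit_district_mean},
)
best = min(scores, key=scores.get)  # candidate with the lowest held-out MAE
print(scores, best)
```

Swapping a real regressor into `fitters` only requires a function that takes training rows and returns a callable predictor, so the comparison loop stays the same however many candidates are tried.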

RESULTS AND CONCLUSIONS

The results of the proof-of-concept machine learning pipeline demonstrated that rice yields can be predicted with a reasonable degree of accuracy, with out-of-sample R², MAE, and MAPE performance of up to 0.82, 0.29, and 0.16 respectively. This exceeds the test-set performance reported in related literature on rice yield modelling in other contexts and countries. In addition, SHAP value analysis was conducted to infer both the importance and directional impact of the climate and remote sensing variables included in the model. Important features driving rice yields included temperature, soil water volume, and leaf area index. Notably, higher temperatures in August correlate with increased rice yields, especially when the leaf area index in August is also high.

Building on the results, a proof-of-concept dashboard was developed to allow users to easily explore which districts may experience a rise or fall in yield relative to the previous year. The dashboard shows that the model may perform better in some regions than in others. For instance, the absolute percentage error for predicted versus actual yields ranged from an average of 7.1% in districts in Uttarakhand to an average of 14.7% in Uttar Pradesh.
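For readers unfamiliar with the reported metrics, the sketch below shows the standard definitions of R², MAE, and MAPE (as a fraction, so 0.16 corresponds to 16%), plus a per-state average of absolute percentage error of the kind quoted for Uttarakhand and Uttar Pradesh. The function names, state labels, and numbers are hypothetical. The abstract does not give the authors' exact formulas or aggregation, so these are assumed to be the conventional ones.

```python
from collections import defaultdict
from statistics import mean


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mae(actual, predicted):
    """Mean absolute error, in the same units as the yield data."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error, expressed as a fraction of actual yield."""
    return mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


def mean_ape_by_state(rows):
    """Average absolute percentage error per state.

    rows: iterable of (state, actual_yield, predicted_yield), one per district-year.
    """
    grouped = defaultdict(list)
    for state, actual_yield, predicted_yield in rows:
        grouped[state].append(abs(actual_yield - predicted_yield) / abs(actual_yield))
    return {state: mean(errors) for state, errors in grouped.items()}


# Invented example values, not data from the study.
actual = [2.0, 4.0]
predicted = [2.2, 3.6]
print(r2_score(actual, predicted), mae(actual, predicted), mape(actual, predicted))

rows = [("State X", 2.0, 2.2), ("State X", 4.0, 3.6), ("State Y", 5.0, 4.0)]
print(mean_ape_by_state(rows))
```

MAPE is undefined when an actual yield is zero, so a real pipeline would need to filter or handle such rows before averaging.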

SIGNIFICANCE

This study underscores the potential for policymakers to consider scaling and operationalizing machine learning approaches to rice yield prediction within agricultural early warning systems. Such systems could deliver timely crop yield forecasts on a rolling basis throughout the season, equipping agricultural decision-makers to make informed choices on irrigation scheduling, fertilizer application, and harvest planning that optimize crop output and resource use.

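Source journal

Agricultural Systems (Agricultural and Forestry Sciences: Agriculture, Multidisciplinary)

CiteScore: 13.30

Self-citation rate: 7.60%

Articles published: 174

Review time: 30 days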

Journal description: Agricultural Systems is an international journal that deals with interactions among the components of agricultural systems, among hierarchical levels of agricultural systems, between agricultural and other land use systems, and between agricultural systems and their natural, social and economic environments. The scope includes the development and application of systems analysis methodologies in the following areas. Systems approaches in the sustainable intensification of agriculture: pathways for sustainable intensification; crop-livestock integration; farm-level resource allocation; quantification of benefits and trade-offs at farm to landscape levels; integrative, participatory and dynamic modelling approaches for qualitative and quantitative assessments of agricultural systems and decision making. The interactions between agricultural and non-agricultural landscapes: the multiple services of agricultural systems; food security and the environment. Global change and adaptation science: transformational adaptations as driven by changes in climate, policy, values and attitudes influencing the design of farming systems. Development and application of farming systems design tools and methods for impact, scenario and case study analysis: managing the complexities of dynamic agricultural systems; innovation systems and multi-stakeholder arrangements that support or promote change and (or) inform policy decisions.