Knowledge informed hybrid machine learning in agricultural yield prediction
Authors: Malte von Bloh, David Lobell, Senthold Asseng
Journal: Computers and Electronics in Agriculture, volume 227, Article 109606
Publication date: 2024-11-09
Publication type: Journal Article
DOI: 10.1016/j.compag.2024.109606
URL: https://www.sciencedirect.com/science/article/pii/S0168169924009979
Citation count: 0
Abstract
Research on yield prediction is dominated by two approaches: machine learning and process-based models. Machine learning has shown impressive results in capturing complex relationships but is often limited by data availability in agriculture. Process-based models, by contrast, draw on more than 60 years of research and simulate crop growth processes using biophysical equations. Here, we present a method to transfer domain knowledge from the Nwheat crop simulation model, run within the Decision Support System for Agrotechnology Transfer (DSSAT) framework, into neural networks and random forests for predicting wheat yield at field scale. To expand the feature and distribution space, we simulated crop parameters and synthetic samples from observed and historical weather records as well as future climate projections. We demonstrated that neural networks can learn both general crop growth and yield processes and then effectively adapt to regional, field-specific growth patterns using synthetic and high-resolution field data. This approach boosts overall performance and reduces model error by 8% compared to a purely data-centric model trained solely on observed field data and features, without process-knowledge transfer. Synthetic samples generated from warmer conditions were the greatest driver of improvement, and we showed that the climate scenario used for data generation matters more than the size of the synthetic data set. The proposed method shows the potential of combining process-based and machine-learning models to leverage the strengths of both.
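The pretrain-then-adapt idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: `simulate_yield` is a hypothetical toy function standing in for a process-based simulator (it is not DSSAT-Nwheat), `observe_field_yield` is a hypothetical noisy field response, the two inputs are standardized temperature and rainfall, and a linear model trained by gradient descent replaces the neural network. Because this toy problem is convex, the only benefit shown is a warm start under a small, fixed training budget; the sketch does not reproduce the reported 8% error reduction.

```python
import random

def simulate_yield(temp, rain):
    # Hypothetical stand-in for a process-based crop model (NOT DSSAT-Nwheat).
    return 6.0 - 1.0 * temp + 1.5 * rain

def observe_field_yield(temp, rain, rng):
    # Hypothetical field response: shifted coefficients plus measurement noise.
    return 5.5 - 1.2 * temp + 1.5 * rain + rng.gauss(0.0, 0.05)

def sample(n, target, rng):
    # Draw n samples with standardized temperature and rainfall in [-1, 1].
    data = []
    for _ in range(n):
        temp, rain = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(((1.0, temp, rain), target(temp, rain)))
    return data

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train(w, data, steps, lr=0.1):
    # Full-batch gradient descent on mean squared error, starting from w.
    w = list(w)
    n = len(data)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
synthetic = sample(500, simulate_yield, rng)      # cheap, plentiful simulator output
field = lambda t, r: observe_field_yield(t, r, rng)
field_train = sample(10, field, rng)              # scarce observed field data
field_test = sample(200, field, rng)

# Step 1: learn the general response from synthetic samples.
pretrained = train([0.0, 0.0, 0.0], synthetic, steps=500)
# Step 2: adapt to the field with a small training budget.
hybrid = train(pretrained, field_train, steps=20)
# Baseline: same field data and budget, but no knowledge transfer.
scratch = train([0.0, 0.0, 0.0], field_train, steps=20)

print("hybrid MSE: ", mse(hybrid, field_test))
print("scratch MSE:", mse(scratch, field_test))
```

In this sketch the pretrained weights already encode the simulator's response, so the few field samples only have to correct the gap between the simulated and observed relationships rather than learn them from nothing.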
About the journal:
Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and applications notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.