Enhancing crop yield prediction in Senegal using advanced machine learning techniques and synthetic data

Artificial Intelligence in Agriculture (IF 8.2, Q1, Agriculture, Multidisciplinary) | Pub Date: 2024-12-01 | DOI: 10.1016/j.aiia.2024.11.005
Mohammad Amin Razavi, A. Pouyan Nejadhashemi, Babak Majidi, Hoda S. Razavi, Josué Kpodo, Rasu Eeswaran, Ignacio Ciampitti, P.V. Vara Prasad
Artificial Intelligence in Agriculture, Volume 14, Pages 99-114. Journal Article. https://www.sciencedirect.com/science/article/pii/S2589721724000448

Abstract

In this study, we use data-driven techniques to investigate the relationships between the yields of five major crops and a range of geographical and spatiotemporal features in Senegal, drawing on remotely sensed data to analyze how these features influence yield. Our methodology combines clustering algorithms with correlation matrix analysis to identify significant patterns and dependencies among the factors affecting agricultural productivity in Senegal. To identify the best hyperparameters, we ran a comprehensive grid search across four machine learning regressors: Random Forest, Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Light Gradient-Boosting Machine (LightGBM). Each regressor offers distinct functionality, which broadens the range of model configurations explored. The top-performing models were selected by evaluating multiple performance metrics; XGBoost and CatBoost performed better than Random Forest and LightGBM. To address the challenges posed by limited agricultural datasets, we introduce synthetic crop data generated with a Variational Autoencoder. The synthetic samples achieve high similarity scores against real-world data, and they improve model robustness, mitigate overfitting, and offer a viable remedy for small-dataset problems in agriculture. Our approach is distinguished by a single flexible model that applies to several crops at once: integrating five crop datasets and adding high-quality synthetic data improves model performance, reduces overfitting, and enhances realism. Our findings provide insight into productivity drivers in key cropping systems, supporting robust recommendations and strengthening the decision-making of policymakers and farmers in data-scarce regions.
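The grid search described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' pipeline: the two toy regressors (`knn_predict`, `kernel_predict`), the parameter grids in `SEARCH_SPACE`, and the toy data from `make_toy_data` are all hypothetical stand-ins for Random Forest, XGBoost, CatBoost, and LightGBM and for the Senegal dataset. What it shows is the general pattern: enumerate every hyperparameter combination for every model family, score each one by cross-validation on more than one metric (here RMSE and R²), and rank the candidates.

```python
import itertools
import math
import random


def knn_predict(train, x, k):
    """Average the targets of the k training rows closest to x."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def kernel_predict(train, x, bandwidth):
    """Gaussian-kernel weighted average of all training targets."""
    weights = [math.exp(-(math.dist(f, x) / bandwidth) ** 2) for f, _ in train]
    total = sum(weights)
    if total == 0.0:  # every weight underflowed: fall back to the plain mean
        return sum(y for _, y in train) / len(train)
    return sum(w * y for w, (_, y) in zip(weights, train)) / total


# Hypothetical model families and grids (stand-ins, not the paper's settings).
SEARCH_SPACE = {
    "knn": (knn_predict, {"k": [1, 3, 5, 9]}),
    "kernel": (kernel_predict, {"bandwidth": [0.05, 0.2, 0.5]}),
}


def cross_val_scores(rows, predict, params, n_folds=5):
    """Return (RMSE, R^2) computed over out-of-fold predictions."""
    actual, predicted = [], []
    for fold in range(n_folds):
        test = rows[fold::n_folds]
        train = [r for i, r in enumerate(rows) if i % n_folds != fold]
        for x, y in test:
            actual.append(y)
            predicted.append(predict(train, x, **params))
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return math.sqrt(ss_res / len(actual)), 1.0 - ss_res / ss_tot


def grid_search(rows, search_space, n_folds=5):
    """Score every parameter combination of every family; sort by RMSE."""
    results = []
    for name, (predict, grid) in search_space.items():
        keys = list(grid)
        for values in itertools.product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            rmse, r2 = cross_val_scores(rows, predict, params, n_folds)
            results.append({"model": name, "params": params, "rmse": rmse, "r2": r2})
    return sorted(results, key=lambda r: r["rmse"])


def make_toy_data(n=120, seed=0):
    """Synthetic (features, yield) rows; purely illustrative, not real data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        rain, temp = rng.random(), rng.random()
        yield_t = 2.0 + 3.0 * rain - 1.5 * (temp - 0.5) ** 2 + rng.gauss(0, 0.1)
        rows.append(((rain, temp), yield_t))
    return rows


if __name__ == "__main__":
    for entry in grid_search(make_toy_data(), SEARCH_SPACE)[:3]:
        print(entry)
```

With real gradient-boosting libraries the same loop is usually delegated to a tool such as scikit-learn's `GridSearchCV`; the hand-rolled version is used here only to keep the sketch self-contained.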
Source journal: Artificial Intelligence in Agriculture (Engineering: Engineering (miscellaneous))
CiteScore: 21.60
Self-citation rate: 0.00%
Articles published: 18
Review time: 12 weeks