Enhancing crop yield prediction in Senegal using advanced machine learning techniques and synthetic data

Artificial Intelligence in Agriculture (IF 8.2; JCR Q1, Agriculture, Multidisciplinary) | Pub Date: 2024-12-01 | DOI: 10.1016/j.aiia.2024.11.005

Mohammad Amin Razavi, A. Pouyan Nejadhashemi, Babak Majidi, Hoda S. Razavi, Josué Kpodo, Rasu Eeswaran, Ignacio Ciampitti, P.V. Vara Prasad

Volume 14, Pages 99-114. Full text: https://www.sciencedirect.com/science/article/pii/S2589721724000448

Abstract

In this study, we employ advanced data-driven techniques to investigate the complex relationships between the yields of five major crops and various geographical and spatiotemporal features in Senegal. Using remotely sensed data, we analyze how these features influence crop yields. Our methodology incorporates clustering algorithms and correlation matrix analysis to identify significant patterns and dependencies, offering a comprehensive understanding of the factors affecting agricultural productivity in Senegal. To identify the optimal hyperparameters and maximize model performance, we ran a comprehensive grid search across four machine learning regressors: Random Forest, Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Light Gradient-Boosting Machine (LightGBM). Each regressor offers distinct functionality, which broadens the range of model configurations we explore. The top-performing models were selected by evaluating multiple performance metrics, to support robust and accurate predictions. The results show that XGBoost and CatBoost outperform Random Forest and LightGBM. To address the challenges posed by limited agricultural datasets, we introduce synthetic crop data generated with a Variational Autoencoder (VAE). The synthetic samples achieve high similarity scores with real-world data, and they enhance model robustness, mitigate overfitting, and provide a viable solution to small-dataset problems in agriculture. Our approach is distinguished by a single flexible model that applies to multiple crops together. By integrating five crop datasets and generating high-quality synthetic data, we improve model performance, reduce overfitting, and enhance realism. Our findings provide crucial insights into productivity drivers in key cropping systems, enabling robust recommendations and strengthening the decision-making capabilities of policymakers and farmers in data-scarce regions.
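The grid search mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. The abstract does not state the hyperparameter grids, the cross-validation scheme, or which metrics were used, so everything below is an assumption: a toy k-nearest-neighbour regressor (`knn_predict`) stands in for the four tree-based regressors, the data are randomly generated rather than real yield records, and RMSE and MAE are placeholder metrics. It shows only the general pattern of scoring every hyperparameter combination on held-out folds and choosing among them by more than one metric.

```python
import itertools
import math
import random


def knn_predict(train, x, k, weighting):
    """Toy stand-in regressor: weighted mean target of the k nearest points."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    if weighting == "uniform":
        weights = [1.0] * len(nearest)
    else:  # "distance": closer neighbours count more
        weights = [1.0 / (abs(xi - x) + 1e-9) for xi, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def cross_val_scores(data, params, n_folds=5):
    """Return RMSE and MAE pooled over the held-out folds."""
    errors = []
    for fold in range(n_folds):
        test = [row for i, row in enumerate(data) if i % n_folds == fold]
        train = [row for i, row in enumerate(data) if i % n_folds != fold]
        errors += [knn_predict(train, x, **params) - y for x, y in test]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"rmse": rmse, "mae": mae}


def grid_search(data, grid, n_folds=5):
    """Score every combination in the grid; pick lowest RMSE, MAE breaks ties."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, cross_val_scores(data, params, n_folds)))
    best = min(results, key=lambda r: (r[1]["rmse"], r[1]["mae"]))
    return best, results


# Hypothetical data: one feature, noisy linear target (not real yield data).
rng = random.Random(0)
data = []
for _ in range(60):
    x = rng.uniform(0.0, 10.0)
    data.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))

# Hypothetical grid: 4 values of k times 2 weightings = 8 combinations.
grid = {"k": [1, 3, 5, 9], "weighting": ["uniform", "distance"]}
(best_params, best_scores), results = grid_search(data, grid)
print(best_params, best_scores)
```

With real regressors, the same loop is usually delegated to a library routine such as scikit-learn's `GridSearchCV`, which takes an estimator, a parameter grid, and one or more scoring metrics.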
