Optimizing dataset diversity for a robust deep-learning model in rice blast disease identification to enhance crop health assessment across diverse conditions
Authors: Reuben Alfred, Judith Leo, Shubi Felix Kaijage
Journal: Smart agricultural technology, vol. 10, Article 100726
Publication date: 2024-12-16
DOI: 10.1016/j.atech.2024.100726
URL: https://www.sciencedirect.com/science/article/pii/S2772375524003307
Citations: 0
Abstract
Magnaporthe oryzae, the pathogen that causes rice blast disease, poses a significant global threat to rice production. The disease can cause yield losses exceeding 30 % in susceptible rice varieties. More effective detection is urgently needed because traditional methods, which rely primarily on visual inspection, are time-consuming and error-prone. Deep-learning models offer effective solutions for disease identification because they can analyze large datasets. However, the diversity of the training dataset is critical to a model's performance and generalizability. This study evaluated the impact of dataset diversity on model performance and generalizability by developing two models, referred to here as the High-Diverse Model and the Low-Diverse Model. The High-Diverse Model was trained on a diverse dataset comprising images from different geographical regions, rice species, environmental conditions, plant growth stages, and disease severity levels. In contrast, the Low-Diverse Model was trained on a less diverse dataset with much more limited variability. The High-Diverse Model generalized far better than the Low-Diverse Model: it achieved a training accuracy of 95.26 % and a validation accuracy of 94.43 %. The Low-Diverse Model achieved 98.37 % accuracy on the training data but only 35.38 % on the validation data, indicating severe overfitting associated with limited dataset diversity. These results highlight the importance of dataset diversity in developing effective and scalable deep-learning models for crop health assessment.
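The overfitting argument rests on the difference between training and validation accuracy for each model. The short Python sketch below works through that arithmetic using only the four accuracies reported in the abstract. The term "generalization gap" and the names `REPORTED`, `generalization_gap`, and `gaps` are illustrative assumptions, not terminology or code from the study.

```python
# Accuracies in percent, copied from the abstract.
# The dictionary layout and all names here are hypothetical, chosen for illustration.
REPORTED = {
    "High-Diverse Model": {"train": 95.26, "val": 94.43},
    "Low-Diverse Model": {"train": 98.37, "val": 35.38},
}


def generalization_gap(train_acc: float, val_acc: float) -> float:
    """Return training accuracy minus validation accuracy, in percentage points."""
    # Round to two decimals to match the precision of the reported figures.
    return round(train_acc - val_acc, 2)


def gaps(reported=REPORTED) -> dict:
    """Compute the train/validation gap for every model in `reported`."""
    return {
        name: generalization_gap(acc["train"], acc["val"])
        for name, acc in reported.items()
    }


if __name__ == "__main__":
    for name, gap in gaps().items():
        print(f"{name}: gap = {gap:.2f} percentage points")
```

Under these assumptions the High-Diverse Model shows a gap of under one percentage point, while the Low-Diverse Model shows a gap of roughly 63 points, which is the pattern the abstract describes as severe overfitting.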