Optimizing Strawberry Disease and Quality Detection with Vision Transformers and Attention-Based Convolutional Neural Networks

Foods · Published: 2024-06-14 · DOI: 10.3390/foods13121869
Kimia Aghamohammadesmaeilketabforoosh, Soodeh Nikan, Giorgio Antonini, Joshua M. Pearce
{"title":"Optimizing Strawberry Disease and Quality Detection with Vision Transformers and Attention-Based Convolutional Neural Networks","authors":"Kimia Aghamohammadesmaeilketabforoosh, Soodeh Nikan, Giorgio Antonini, Joshua M. Pearce","doi":"10.3390/foods13121869","DOIUrl":null,"url":null,"abstract":"Machine learning and computer vision have proven to be valuable tools for farmers to streamline their resource utilization to lead to more sustainable and efficient agricultural production. These techniques have been applied to strawberry cultivation in the past with limited success. To build on this past work, in this study, two separate sets of strawberry images, along with their associated diseases, were collected and subjected to resizing and augmentation. Subsequently, a combined dataset consisting of nine classes was utilized to fine-tune three distinct pretrained models: vision transformer (ViT), MobileNetV2, and ResNet18. To address the imbalanced class distribution in the dataset, each class was assigned weights to ensure nearly equal impact during the training process. To enhance the outcomes, new images were generated by removing backgrounds, reducing noise, and flipping them. The performances of ViT, MobileNetV2, and ResNet18 were compared after being selected. Customization specific to the task was applied to all three algorithms, and their performances were assessed. Throughout this experiment, none of the layers were frozen, ensuring all layers remained active during training. Attention heads were incorporated into the first five and last five layers of MobileNetV2 and ResNet18, while the architecture of ViT was modified. The results indicated accuracy factors of 98.4%, 98.1%, and 97.9% for ViT, MobileNetV2, and ResNet18, respectively. Despite the data being imbalanced, the precision, which indicates the proportion of correctly identified positive instances among all predicted positive instances, approached nearly 99% with the ViT. MobileNetV2 and ResNet18 demonstrated similar results. Overall, the analysis revealed that the vision transformer model exhibited superior performance in strawberry ripeness and disease classification. The inclusion of attention heads in the early layers of ResNet18 and MobileNet18, along with the inherent attention mechanism in ViT, improved the accuracy of image identification. These findings offer the potential for farmers to enhance strawberry cultivation through passive camera monitoring alone, promoting the health and well-being of the population.","PeriodicalId":502667,"journal":{"name":"Foods","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Foods","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/foods13121869","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Machine learning and computer vision have proven to be valuable tools for farmers to streamline resource utilization and enable more sustainable and efficient agricultural production. These techniques have been applied to strawberry cultivation in the past with limited success. To build on this past work, in this study two separate sets of strawberry images, along with their associated diseases, were collected and subjected to resizing and augmentation. Subsequently, a combined dataset consisting of nine classes was used to fine-tune three distinct pretrained models: vision transformer (ViT), MobileNetV2, and ResNet18. To address the imbalanced class distribution in the dataset, each class was assigned a weight so that all classes had nearly equal impact during training. To further improve results, additional images were generated by removing backgrounds, reducing noise, and flipping the originals. Task-specific customization was applied to all three models, and their performances were compared. Throughout the experiment, no layers were frozen, so all layers remained trainable. Attention heads were incorporated into the first five and last five layers of MobileNetV2 and ResNet18, while the architecture of ViT was modified. The results showed accuracies of 98.4%, 98.1%, and 97.9% for ViT, MobileNetV2, and ResNet18, respectively. Despite the imbalanced data, precision, which indicates the proportion of correctly identified positive instances among all predicted positive instances, approached 99% with ViT; MobileNetV2 and ResNet18 produced similar results. Overall, the analysis revealed that the vision transformer exhibited superior performance in strawberry ripeness and disease classification. The inclusion of attention heads in the early layers of ResNet18 and MobileNetV2, along with the inherent attention mechanism of ViT, improved image-classification accuracy. These findings offer farmers the potential to enhance strawberry cultivation through passive camera monitoring alone, promoting the health and well-being of the population.
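The training recipe summarized in the abstract (class weighting for the imbalanced nine-class dataset, flip-based augmentation, and fine-tuning a pretrained backbone with no frozen layers) can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration, not the authors' code: it assumes a torchvision ResNet18 backbone, a nine-class ImageFolder dataset at a hypothetical path `strawberry_dataset/train`, and inverse-frequency class weights; the paper's attention-head insertions into the first and last five layers and the ViT modifications are not reproduced here.

```python
# Minimal sketch of class-weighted fine-tuning as described in the abstract.
# Assumptions (not from the paper): torchvision models/transforms, an
# ImageFolder dataset with nine classes, and inverse-frequency class weights.
from collections import Counter

import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Augmentation: resizing and horizontal flipping, as mentioned in the abstract.
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Hypothetical dataset path; one subfolder per ripeness/disease class.
train_ds = datasets.ImageFolder("strawberry_dataset/train", transform=train_tf)

# Inverse-frequency class weights so each class has nearly equal impact.
counts = Counter(train_ds.targets)
num_classes = len(train_ds.classes)  # nine classes in the combined dataset
class_weights = torch.tensor(
    [len(train_ds) / (num_classes * counts[c]) for c in range(num_classes)],
    dtype=torch.float,
)

# Pretrained backbone with a new nine-class head; no layers are frozen,
# so every parameter stays trainable (requires torchvision >= 0.13 for the
# weights enum; older versions use pretrained=True instead).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, num_classes)

criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loader = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

model.train()
for images, labels in loader:  # one illustrative epoch
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```

The same loop applies to MobileNetV2 or a ViT by swapping the backbone and its classification head; the weighted loss is what keeps the minority disease classes from being overwhelmed by the more frequent ripeness classes.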