MSAPVT: a multi-scale attention pyramid vision transformer network for large-scale fruit recognition

IF 2.9 · Zone 3 (Agricultural and Forestry Sciences) · Q2, Food Science & Technology · Journal of Food Measurement and Characterization · Pub Date: 2024-09-26 · DOI: 10.1007/s11694-024-02874-3

Yao Rao, Chaofeng Li, Feiran Xu, Ya Guo

Journal of Food Measurement and Characterization, vol. 18, no. 11, pp. 9233-9251. Published 2024-09-26. https://link.springer.com/article/10.1007/s11694-024-02874-3

Citations: 0

Abstract

Efficient and accurate fruit recognition is critical for applications such as automated fruit-picking systems, quality evaluation, and self-checkout services in supermarkets. Existing vision-based methods, which rely primarily on Convolutional Neural Networks (CNNs), often achieve high performance but are hindered by high computational complexity, making real-time deployment on edge devices challenging. Moreover, the diversity of fruit varieties, the visual similarity among them, and the imbalance of fruit datasets pose significant obstacles to general-purpose deep learning algorithms. To address these challenges, we propose the Multi-Scale Attention Pyramid Vision Transformer (MSAPVT) alongside an enhanced version of the Fru92 dataset. MSAPVT introduces four improvements: attention enhancement, dimension adjustment, multi-scale feature aggregation, and loss function improvement. First, the Hybrid Attention Module (HAM) is designed to better refine the multi-level features of the Pyramid Vision Transformer v2 (PVTv2). Second, the Dimension Adjustment Layer (DAL) is designed to increase the weight of the high-level features. Third, a multi-scale feature aggregation strategy is introduced to fuse complementary multi-scale features. Finally, a KL-divergence loss is added to enhance the difference between multi-scale features. These innovations enable MSAPVT to capture fine-grained details in fruit images and generate highly discriminative representations with relatively low model complexity. Our model achieves the best results on the Fru92 and Fru92s datasets, with Top-1 accuracy of 91.40% and 94.29% and Top-5 accuracy of 98.95% and 99.55%, respectively. In addition, an accessible and efficient fruit classification system based on MSAPVT is developed for potential applications. The improved dataset is available at https://github.com/iamraoyao/MSAPVT-Inference-Demo.
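
The abstract reports Top-1 and Top-5 accuracy and mentions a KL-divergence term, but it does not give the formulas. The sketch below only illustrates the standard definitions of both quantities in plain Python. The function names, the logits, and the scores are hypothetical values chosen for illustration, and how the KL term actually enters the MSAPVT loss (its sign, its weight, and which feature scales are compared) is an assumption that the abstract does not settle.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def top_k_accuracy(score_rows, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for scores, label in zip(score_rows, labels):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

# Hypothetical logits from two feature scales for one image (3 classes).
coarse = softmax([2.0, 1.0, 0.1])
fine = softmax([0.5, 2.5, 0.3])
divergence = kl_divergence(coarse, fine)  # larger when the two scales disagree

# Hypothetical class scores for four images and their true class indices.
scores = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.3, 0.5], [0.5, 0.4, 0.1]]
labels = [1, 0, 1, 2]
top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

With 92 classes, as the dataset name Fru92 suggests, the same `top_k_accuracy` call with `k=5` would give the Top-5 figure. A loss that rewards differences between scales would typically subtract a weighted divergence from the classification loss, though that is one possible design and not a detail confirmed here.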

Source journal

Journal of Food Measurement and Characterization (Agricultural and Biological Sciences: Food Science)

CiteScore: 6.00

Self-citation rate: 11.80%

Articles published: 425

About the journal: This interdisciplinary journal publishes new measurement results, characteristic properties, differentiating patterns, and measurement methods and procedures for purposes such as food process innovation, product development, quality control, and safety assurance. The journal encompasses all topics related to food property measurement and characterization, including all types of measured properties of food and food materials, features and patterns, measurement principles and techniques, development and evaluation of technologies, novel uses and applications, and industrial implementation of systems and procedures.