MSAPVT: a multi-scale attention pyramid vision transformer network for large-scale fruit recognition

Impact factor: 2.9 | Zone 3 (Agricultural and Forestry Sciences) | JCR Q2 (Food Science & Technology) | Journal of Food Measurement and Characterization | Pub Date: 2024-09-26 | DOI: 10.1007/s11694-024-02874-3
Yao Rao, Chaofeng Li, Feiran Xu, Ya Guo
Journal of Food Measurement and Characterization, volume 18, issue 11, pages 9233-9251. Journal article, not open access. Full text: https://link.springer.com/article/10.1007/s11694-024-02874-3
Citations: 0

Abstract

Efficient and accurate fruit recognition is critical for applications such as automated fruit-picking systems, quality evaluation, and self-checkout services in supermarkets. Existing vision-based methods, which rely primarily on Convolutional Neural Networks (CNNs), often achieve high performance but are hindered by high computational complexity, which makes real-time deployment on edge devices challenging. Moreover, the diversity of and similarity among fruit varieties, along with imbalanced fruit datasets, pose significant obstacles to general-purpose deep learning algorithms. To address these challenges, we propose the Multi-Scale Attention Pyramid Vision Transformer (MSAPVT) alongside an enhanced version of the Fru92 dataset. MSAPVT introduces four improvements: attention enhancement, dimension adjustment, multi-scale feature aggregation, and loss function improvement. First, the Hybrid Attention Module (HAM) is designed to better refine the multi-level features of the Pyramid Vision Transformer v2 (PVTv2). Second, the Dimension Adjustment Layer (DAL) is designed to increase the weight of the high-level features. Third, a multi-scale feature aggregation strategy is introduced to fuse complementary features from multiple scales. Finally, a KL-divergence loss is added to enhance the difference between multi-scale features. These innovations enable MSAPVT to capture fine-grained details in fruit images and to generate highly discriminative representations with low model complexity. Our model achieves the best results on the Fru92 and Fru92s datasets, with Top-1 accuracy of 91.40% and 94.29%, and Top-5 accuracy of 98.95% and 99.55%, respectively. In addition, an accessible and efficient fruit classification system based on MSAPVT is devised for potential applications. The improved dataset is available at https://github.com/iamraoyao/MSAPVT-Inference-Demo.
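
The abstract names two quantities that are easy to illustrate in isolation: a KL divergence between predictions from different feature scales, and the Top-k accuracy used to report results. The following is a minimal sketch in plain Python, not the authors' implementation. The toy logits, the averaging of logits as a stand-in for feature aggregation, and all function names are assumptions made for illustration; the abstract does not specify how the KL term is weighted or signed in the actual training loss.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def top_k_accuracy(all_logits, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for logits, label in zip(all_logits, labels):
        ranked = sorted(range(len(logits)), key=lambda c: logits[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

# Hypothetical logits from two feature scales: 3 samples, 4 classes.
logits_scale_a = [[2.0, 0.5, 0.1, -1.0], [0.2, 1.5, 0.3, 0.0], [0.1, 0.2, 0.3, 2.5]]
logits_scale_b = [[1.0, 1.2, 0.0, -0.5], [0.0, 2.0, 0.1, 0.1], [0.5, 0.1, 3.5, 0.9]]
labels = [0, 1, 3]

# Assumed aggregation for this sketch: average the logits of the two scales.
fused = [[(a + b) / 2 for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(logits_scale_a, logits_scale_b)]

# Mean KL divergence between the two scales' predicted class distributions.
mean_kl = sum(kl_divergence(softmax(a), softmax(b))
              for a, b in zip(logits_scale_a, logits_scale_b)) / len(labels)

top1 = top_k_accuracy(fused, labels, 1)
top2 = top_k_accuracy(fused, labels, 2)
print(f"mean KL = {mean_kl:.4f}, top-1 = {top1:.2%}, top-2 = {top2:.2%}")
```

A larger `mean_kl` indicates that the two scales produce more different class distributions, which is the property the abstract says the added loss term is meant to encourage. Top-5 accuracy on a 92-class dataset is computed the same way with `k=5`.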

Source journal
Journal of Food Measurement and Characterization (Agricultural and Biological Sciences: Food Science)
CiteScore: 6.00
Self-citation rate: 11.80%
Articles published: 425
About the journal: This interdisciplinary journal publishes new measurement results, characteristic properties, differentiating patterns, measurement methods and procedures for such purposes as food process innovation, product development, quality control, and safety assurance. The journal encompasses all topics related to food property measurement and characterization, including all types of measured properties of food and food materials, features and patterns, measurement principles and techniques, development and evaluation of technologies, novel uses and applications, and industrial implementation of systems and procedures.