Systematic Review of Hybrid Vision Transformer Architectures for Radiological Image Analysis

Ji Woong Kim, Aisha Urooj Khan, Imon Banerjee
DOI: 10.1101/2024.06.21.24309265
Journal: medRxiv - Radiology and Imaging (preprint)
Published: 2024-06-22
Citations: 0

Abstract

Background: Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs) each possess distinct strengths in medical imaging: ViTs excel at capturing long-range dependencies through self-attention, while CNNs are adept at extracting local features via spatial convolution filters. ViTs may struggle to capture the detailed local spatial information critical for tasks such as anomaly detection in medical imaging, whereas shallow CNNs often fail to effectively abstract global context.

Objective: This study aims to explore and evaluate hybrid architectures that integrate ViT and CNN, leveraging their complementary strengths for enhanced performance in medical vision tasks such as segmentation, classification, and prediction.

Methods: Following the PRISMA guidelines, a systematic review was conducted of 28 articles published between 2020 and 2023 that proposed hybrid ViT-CNN architectures specifically for medical imaging tasks in radiology. The review focused on analyzing architectural variations, strategies for merging ViT and CNN, innovative applications of ViT, and efficiency metrics including parameter count, computational cost (GFLOPs), and performance benchmarks.

Results: The review found that integrating ViT and CNN can help mitigate the limitations of each architecture, offering comprehensive solutions that combine global context understanding with precise local feature extraction. The articles were benchmarked on architectural variations, merging strategies, innovative uses of ViT, and efficiency metrics (number of parameters, computational cost in GFLOPs, performance).

Conclusion: By synthesizing the current literature, this review defines the fundamental concepts of hybrid vision transformers and highlights emerging trends in the field. It provides a clear direction for future research aimed at optimizing the integration of ViT and CNN for effective use in medical imaging, contributing to advances in diagnostic accuracy and image analysis.
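The efficiency metrics the review benchmarks (parameter count and GFLOPs) can be estimated analytically for the two building blocks a hybrid architecture combines. The sketch below is illustrative only: the layer sizes (a 3x3 convolution over a 224x224 RGB input, and a ViT-Base-like attention layer with 768-dimensional tokens over 196 patches) are assumptions chosen for the example, not figures taken from any architecture in the review, and softmax/normalization costs are ignored.

```python
# Back-of-envelope parameter and FLOP counts for the two building blocks
# that hybrid ViT-CNN architectures combine. Illustrative sizes only.

def conv2d_cost(c_in, c_out, k, h_out, w_out):
    """Parameters and FLOPs (2 per multiply-accumulate) of one conv layer with bias."""
    params = (k * k * c_in + 1) * c_out
    flops = 2 * k * k * c_in * c_out * h_out * w_out
    return params, flops

def mhsa_cost(d, n_tokens):
    """Parameters and approximate FLOPs of one multi-head self-attention layer
    (Q/K/V and output projections with bias; softmax cost ignored)."""
    params = 4 * d * d + 4 * d                 # 3 QKV + 1 output projection
    proj_flops = 8 * n_tokens * d * d          # the 4 linear projections
    attn_flops = 4 * n_tokens * n_tokens * d   # QK^T scores + attention x V
    return params, proj_flops + attn_flops

# A 3x3 conv, 3 -> 64 channels, stride 1 over a 224x224 image:
p_conv, f_conv = conv2d_cost(3, 64, 3, 224, 224)

# A ViT-Base-like attention layer: d=768, 14x14 = 196 patch tokens:
p_attn, f_attn = mhsa_cost(768, 196)

print(f"conv: {p_conv:,} params, {f_conv / 1e9:.2f} GFLOPs")
print(f"MHSA: {p_attn:,} params, {f_attn / 1e9:.2f} GFLOPs")
```

Note that the attention term grows quadratically with the number of tokens, while the convolution cost grows only linearly with spatial resolution; this asymmetry is one reason hybrid designs commonly use a CNN stem to shrink the token count before applying self-attention.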