BAE-ViT: An Efficient Multimodal Vision Transformer for Bone Age Estimation.

Tomography, vol. 10, no. 12, pp. 2058-2072. Published 2024-12-13. DOI: 10.3390/tomography10120146
Journal metrics: impact factor 2.2; CAS Zone 4 (Medicine); JCR Q2, Radiology, Nuclear Medicine & Medical Imaging
Jinnian Zhang, Weijie Chen, Tanmayee Joshi, Xiaomin Zhang, Po-Ling Loh, Varun Jog, Richard J Bruce, John W Garrett, Alan B McMillan

Abstract

This research introduces BAE-ViT, a vision transformer model specialized for bone age estimation (BAE). The model is designed to efficiently merge image and sex data, a capability not present in traditional convolutional neural networks (CNNs). BAE-ViT employs a novel data fusion method that enables detailed interactions between visual and non-visual data: non-visual information is tokenized, and all tokens (visual or non-visual) are concatenated to form the model input. The model was trained on a large-scale dataset from the 2017 RSNA Pediatric Bone Age Machine Learning Challenge and performed well, excelling in particular at handling image distortions compared with existing models. Statistical analysis further supported the effectiveness of BAE-ViT, showing a strong correlation between its estimates and the ground-truth labels. This study showcases the potential of vision transformers as a viable option for integrating multimodal data in medical imaging, with particular emphasis on their capacity to incorporate non-visual elements such as sex information. The tokenization method not only demonstrates superior performance on this specific task but also offers a versatile framework for multimodal data integration in medical imaging applications.
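
The abstract describes the fusion step only at a high level: non-visual information is tokenized and concatenated with the visual tokens before entering the transformer. The following is a minimal plain-Python sketch of that idea under stated assumptions, not the authors' implementation. The patch size, embedding dimension, the 0/1 encoding of sex, the random initialization, and the names `MultimodalTokenizer`, `patchify`, and `self_attention` are all hypothetical. The single attention layer uses identity query/key/value projections for brevity, whereas a real vision transformer learns these projections and stacks many layers.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def patchify(image: Matrix, patch: int) -> Matrix:
    """Split an H x W grayscale image into flattened, non-overlapping patches (row-major order)."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch size")
    return [
        [image[r + i][c + j] for i in range(patch) for j in range(patch)]
        for r in range(0, h, patch)
        for c in range(0, w, patch)
    ]


def linear(x: List[float], weight: Matrix) -> List[float]:
    """Bias-free linear layer; weight has shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


class MultimodalTokenizer:
    """Hypothetical tokenizer: one token per image patch plus one learned token for sex."""

    def __init__(self, patch: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.patch = patch
        # Patch embedding: projects each flattened patch (patch * patch values) to `dim`.
        self.proj = [[rng.gauss(0.0, 0.02) for _ in range(patch * patch)] for _ in range(dim)]
        # Embedding table for the non-visual input (assumed encoding: 0 = female, 1 = male).
        self.sex_table = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(2)]

    def __call__(self, image: Matrix, sex: int) -> Matrix:
        visual = [linear(p, self.proj) for p in patchify(image, self.patch)]
        sex_token = list(self.sex_table[sex])
        # Concatenate the non-visual token and the visual tokens into one input sequence.
        return [sex_token] + visual


def self_attention(tokens: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention with identity Q/K/V (a simplification)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Each output token is a weighted mix of ALL tokens, including the sex token.
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


if __name__ == "__main__":
    image = [[(r * 8 + c) / 63 for c in range(8)] for r in range(8)]  # toy 8 x 8 "radiograph"
    tokenizer = MultimodalTokenizer(patch=4, dim=16)
    seq = tokenizer(image, sex=1)
    print(len(seq), len(seq[0]))  # 5 16  (1 sex token + 4 patch tokens, each of width 16)
    mixed = self_attention(seq)
```

Before attention, the patch tokens are identical for either sex; after one attention layer, every patch token mixes in the sex token, which is the kind of visual/non-visual interaction the concatenation approach is meant to enable. A regression head that maps the output sequence to a bone age estimate is omitted.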

Source journal: Tomography (Medicine: Radiology, Nuclear Medicine and Imaging)
CiteScore: 2.70
Self-citation rate: 10.50%
Articles published: 222

About the journal: Tomography™ publishes basic (technical and pre-clinical) and clinical scientific articles that involve the advancement of imaging technologies. Tomography encompasses studies that use single or multiple imaging modalities, including, for example, CT, US, PET, SPECT, MR, and hyperpolarization technologies, as well as optical modalities (e.g., bioluminescence, photoacoustic imaging, endomicroscopy, fiber optic imaging, and optical computed tomography) in basic sciences, engineering, and preclinical and clinical medicine. Tomography also welcomes studies involving the exploration and refinement of contrast mechanisms and image-derived metrics within and across modalities toward the development of novel imaging probes for image-based feedback and intervention. The use of imaging in biology and medicine provides unparalleled opportunities to noninvasively interrogate tissues, obtaining the real-time, dynamic, and quantitative information required for diagnosis and for assessing response to interventions, and to follow evolving pathological conditions. As multi-modal studies and the complexity of imaging technologies themselves continue to increase in order to provide advanced information to scientists and clinicians, Tomography provides a unique publication venue that allows investigators to communicate more precisely integrated findings related to the diverse and heterogeneous features of the underlying anatomical, physiological, functional, metabolic, and molecular genetic activities of normal and diseased tissue. Thus, Tomography publishes peer-reviewed articles that involve the broad use of imaging of any tissue and disease type, including both preclinical and clinical investigations. In addition, advances in hardware/software and in chemical and molecular probes are welcome when they are deemed to contribute significantly toward the long-term goal of improving the overall impact of imaging on scientific and clinical discovery.

Latest articles in this journal:
Enhanced Detection of Residual Breast Cancer Post-Excisional Biopsy: Comparative Analysis of Contrast-Enhanced MRI with and Without Diffusion-Weighted Imaging.
Comparative Sensitivity of MRI Indices for Myelin Assessment in Spinal Cord Regions.
CT Angiography Assessment of Dorsal Pancreatic Artery and Intrapancreatic Arcade Anatomy: Impact on Whipple Surgery Outcomes.
Fast Hadamard-Encoded 7T Spectroscopic Imaging of Human Brain.
Unraveling the Invisible: Topological Data Analysis as the New Frontier in Radiology's Diagnostic Arsenal.