Cross-Modal Augmented Transformer for Automated Medical Report Generation

IEEE Journal of Translational Engineering in Health and Medicine-Jtehm; Impact Factor: 3.7; Zone 3 (Medicine); JCR Q2 (Engineering, Biomedical); Pub Date: 2025-01-29; DOI: 10.1109/JTEHM.2025.3536441
Yuhao Tang;Ye Yuan;Fei Tao;Minghao Tang
Volume 13, pages 33-48. Publication type: Journal Article. Full text: https://ieeexplore.ieee.org/document/10857391/ (PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10857391)
Citations: 0

Abstract

In clinical practice, interpreting medical images and composing diagnostic reports typically involve a significant manual workload. An automated report generation framework that mimics a doctor’s diagnostic process would therefore better meet the requirements of medical scenarios. Prior work often overlooks this aspect, relying primarily on traditional image captioning frameworks originally designed for general-domain images and sentences. Despite some progress, these methods face two main challenges. First, the strong noise in blurred medical images hinders the model from capturing the lesion region. Second, doctors typically rely on terminology for diagnosis when writing reports, a crucial aspect that prior frameworks have neglected. In this paper, we present a novel approach called Cross-modal Augmented Transformer (CAT) for medical report generation. Unlike previous methods that rely on coarse-grained features without human intervention, our method introduces a “locate then generate” pattern, thereby improving the interpretability of the generated reports. During the locate stage, CAT captures crucial representations by pre-aligning significant patches with their corresponding medical terminologies. This pre-alignment helps reduce visual noise by discarding low-ranking content, so that only relevant information is considered during report generation. During the generation stage, CAT uses a multi-modality encoder to reinforce the correlation between generated keywords, retrieved terminologies, and regions. Furthermore, CAT employs a dual-stream decoder that dynamically determines whether the predicted word should be influenced by the retrieved terminology or the preceding sentence. Experimental results demonstrate the effectiveness of the proposed method on two datasets.

Clinical impact: This work aims to design an automated framework for explaining medical images to evaluate the health status of individuals, thereby facilitating its broader application in clinical settings.

Clinical and Translational Impact Statement: In our preclinical research, we develop an automated system for generating diagnostic reports. This system mimics manual diagnostic methods by combining fine-grained semantic alignment with dual-stream decoders.
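
The “locate then generate” pattern described in the abstract can be illustrated with a small sketch. The abstract does not give the scoring function, the number of patches kept, or the form of the decoder's decision, so everything below is an assumption made for illustration: cosine similarity for patch-terminology alignment, a fixed top-k cutoff for discarding low-ranking patches, and a sigmoid gate that blends two next-word distributions. The function names `locate` and `gated_mix` are hypothetical and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def locate(patches, terms, k):
    """Assumed 'locate' step: score each patch by its best match to any
    terminology embedding, keep the k highest-scoring patches, and return
    their indices in original order. Lower-ranked patches are discarded."""
    scores = [max(cosine(p, t) for t in terms) for p in patches]
    ranked = sorted(range(len(patches)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def gated_mix(p_term, p_context, gate_logit):
    """Assumed dual-stream blend: a sigmoid gate g weighs the
    terminology-driven distribution against the sentence-context one."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return [g * a + (1.0 - g) * b for a, b in zip(p_term, p_context)]


# Toy 2-D embeddings: four image patches and one terminology vector.
patches = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
terms = [[1.0, 0.1]]

print(locate(patches, terms, k=2))  # → [0, 2]

# A gate logit of 0 weighs both streams equally.
print(gated_mix([0.9, 0.1], [0.2, 0.8], gate_logit=0.0))
```

In a trained model the gate logit would presumably be predicted from the decoder state at each step rather than supplied by hand; the sketch only shows how such a gate could shift a prediction toward the retrieved terminology or toward the preceding sentence.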
Source journal
CiteScore: 7.40
Self-citation rate: 2.90%
Articles published: 65
Review time: 27 weeks
Journal description: The IEEE Journal of Translational Engineering in Health and Medicine is an open access product that bridges the engineering and clinical worlds, focusing on detailed descriptions of advanced technical solutions to a clinical need along with clinical results and healthcare relevance. The journal provides a platform for state-of-the-art technology directions in the interdisciplinary field of biomedical engineering, embracing engineering, life sciences and medicine. A unique aspect of the journal is its ability to foster collaboration between physicians and engineers in presenting broad and compelling real-world technological and engineering solutions that can be implemented in the interest of improving quality of patient care and treatment outcomes, thereby reducing costs and improving efficiency. The journal provides an active forum for clinical research and relevant state-of-the-art technology for members of all the IEEE societies that have an interest in biomedical engineering, and it reaches out directly to physicians and the medical community through the American Medical Association (AMA) and other clinical societies. The scope of the journal includes, but is not limited to, topics on: medical devices, healthcare delivery systems, global healthcare initiatives, and ICT-based services; technological relevance to healthcare cost reduction; technology affecting healthcare management, decision-making, and policy; advanced technical work that is applied to solving specific clinical needs.