PhraseAug: An Augmented Medical Report Generation Model with Phrasebook.

Xin Mei, Libin Yang, Denghong Gao, Xiaoyan Cai, Junwei Han, Tianming Liu
{"title":"PhraseAug: An Augmented Medical Report Generation Model with Phrasebook.","authors":"Xin Mei, Libin Yang, Denghong Gao, Xiaoyan Cai, Junwei Han, Tianming Liu","doi":"10.1109/TMI.2024.3416190","DOIUrl":null,"url":null,"abstract":"<p><p>Medical report generation is a valuable and challenging task, which automatically generates accurate and fluent diagnostic reports for medical images, reducing workload of radiologists and improving efficiency of disease diagnosis. Fine-grained alignment of medical images and reports facilitates the exploration of close correlations between images and texts, which is crucial for cross-modal generation. However, visual and linguistic biases caused by radiologists' writing styles make cross-modal image-text alignment difficult. To alleviate visual-linguistic bias, this paper discretizes medical reports and introduces an intermediate modality, i.e. phrasebook, consisting of key noun phrases. As discretized representation of medical reports, phrasebook contains both disease-related medical terms, and synonymous phrases representing different writing styles which can identify synonymous sentences, thereby promoting fine-grained alignment between images and reports. In this paper, an augmented two-stage medical report generation model with phrasebook (PhraseAug) is developed, which combines medical images, clinical histories and writing styles to generate diagnostic reports. In the first stage, phrasebook is used to extract semantically relevant important features and predict key phrases contained in the report. In the second stage, medical reports are generated according to the predicted key phrases which contain synonymous phrases, promoting our model to adapt to different writing styles and generating diverse medical reports. Experimental results on two public datasets, IU-Xray and MIMIC-CXR, demonstrate that our proposed PhraseAug outperforms state-of-the-art baselines.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on medical imaging","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TMI.2024.3416190","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Medical report generation is a valuable and challenging task, which automatically generates accurate and fluent diagnostic reports for medical images, reducing workload of radiologists and improving efficiency of disease diagnosis. Fine-grained alignment of medical images and reports facilitates the exploration of close correlations between images and texts, which is crucial for cross-modal generation. However, visual and linguistic biases caused by radiologists' writing styles make cross-modal image-text alignment difficult. To alleviate visual-linguistic bias, this paper discretizes medical reports and introduces an intermediate modality, i.e. phrasebook, consisting of key noun phrases. As discretized representation of medical reports, phrasebook contains both disease-related medical terms, and synonymous phrases representing different writing styles which can identify synonymous sentences, thereby promoting fine-grained alignment between images and reports. In this paper, an augmented two-stage medical report generation model with phrasebook (PhraseAug) is developed, which combines medical images, clinical histories and writing styles to generate diagnostic reports. In the first stage, phrasebook is used to extract semantically relevant important features and predict key phrases contained in the report. In the second stage, medical reports are generated according to the predicted key phrases which contain synonymous phrases, promoting our model to adapt to different writing styles and generating diverse medical reports. Experimental results on two public datasets, IU-Xray and MIMIC-CXR, demonstrate that our proposed PhraseAug outperforms state-of-the-art baselines.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
PhraseAug:带短语集的增强型医疗报告生成模型。
医学报告生成是一项极具价值和挑战性的任务,它能自动生成准确流畅的医学影像诊断报告,减轻放射科医生的工作量,提高疾病诊断的效率。医疗图像和报告的精细对齐有助于探索图像和文本之间的密切关联,这对跨模态生成至关重要。然而,由于放射科医生的写作风格造成的视觉和语言偏差,使得跨模态图像-文本配准变得十分困难。为了减轻视觉语言偏差,本文将医疗报告离散化,并引入了一种中间模态,即由关键名词短语组成的短语集。作为医疗报告的离散化表示,短语集既包含与疾病相关的医学术语,也包含代表不同写作风格的同义短语,可以识别同义句子,从而促进图像和报告之间的精细配准。本文开发了一种带短语集的两阶段医疗报告生成增强模型(PhraseAug),该模型结合了医学图像、临床病历和写作风格来生成诊断报告。在第一阶段,短语集用于提取与语义相关的重要特征,并预测报告中包含的关键短语。第二阶段,根据预测的关键短语生成包含同义短语的医疗报告,从而促进我们的模型适应不同的写作风格,生成多样化的医疗报告。在 IU-Xray 和 MIMIC-CXR 两个公共数据集上的实验结果表明,我们提出的 PhraseAug 优于最先进的基线。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Cohort-Individual Cooperative Learning for Multimodal Cancer Survival Analysis. Self-navigated 3D diffusion MRI using an optimized CAIPI sampling and structured low-rank reconstruction estimated navigator. Low-dose CT image super-resolution with noise suppression based on prior degradation estimator and self-guidance mechanism. Table of Contents LOQUAT: Low-Rank Quaternion Reconstruction for Photon-Counting CT.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1