PhraseAug: An Augmented Medical Report Generation Model with Phrasebook.

IEEE Transactions on Medical Imaging. Pub Date: 2024-06-18. DOI: 10.1109/TMI.2024.3416190

Xin Mei, Libin Yang, Denghong Gao, Xiaoyan Cai, Junwei Han, Tianming Liu



Abstract

Medical report generation, the task of automatically producing accurate and fluent diagnostic reports for medical images, is valuable but challenging: it can reduce radiologists' workload and make disease diagnosis more efficient. Fine-grained alignment between medical images and reports helps uncover close correlations between images and text, which is crucial for cross-modal generation. However, visual and linguistic biases introduced by radiologists' differing writing styles make cross-modal image-text alignment difficult. To alleviate this visual-linguistic bias, this paper discretizes medical reports and introduces an intermediate modality, the phrasebook, which consists of key noun phrases. As a discretized representation of medical reports, the phrasebook contains both disease-related medical terms and synonymous phrases that represent different writing styles. The synonymous phrases make it possible to identify synonymous sentences, thereby promoting fine-grained alignment between images and reports. We develop an augmented two-stage medical report generation model with a phrasebook (PhraseAug), which combines medical images, clinical histories, and writing styles to generate diagnostic reports. In the first stage, the phrasebook is used to extract semantically relevant features and to predict the key phrases contained in the report. In the second stage, medical reports are generated from the predicted key phrases, which include synonymous phrases; this lets our model adapt to different writing styles and generate diverse medical reports. Experimental results on two public datasets, IU-Xray and MIMIC-CXR, demonstrate that the proposed PhraseAug outperforms state-of-the-art baselines.
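
The idea of discretizing reports into a phrasebook can be illustrated with a toy sketch. This is not the PhraseAug implementation: the phrasebook entries, the concept names, and the functions `discretize` and `render` are hypothetical, and simple substring matching and phrase lookup stand in for the learned phrase prediction and report generation stages described in the abstract.

```python
# Toy illustration only: the phrasebook entries below are hand-written
# assumptions, not phrases taken from the paper or its datasets.
PHRASEBOOK = {
    "no_pneumothorax": ["no pneumothorax", "without pneumothorax"],
    "normal_heart_size": ["heart size is normal", "normal cardiac silhouette"],
    "pleural_effusion": ["pleural effusion", "pleural fluid"],
}


def discretize(report: str) -> set[str]:
    """Map a free-text report to the set of phrasebook concepts it mentions."""
    text = report.lower()
    return {
        concept
        for concept, variants in PHRASEBOOK.items()
        if any(variant in text for variant in variants)
    }


def render(concepts: set[str], style: int = 0) -> str:
    """Stand-in for stage two: emit one phrase variant per predicted concept."""
    phrases = [PHRASEBOOK[c][style % len(PHRASEBOOK[c])] for c in sorted(concepts)]
    return "; ".join(phrases) + "."


# Two reports written in different styles but describing the same findings.
report_a = "Heart size is normal. No pneumothorax."
report_b = "Normal cardiac silhouette, without pneumothorax."

concepts_a = discretize(report_a)
concepts_b = discretize(report_b)
print(concepts_a == concepts_b)
print(render(concepts_a, style=1))
```

Both reports reduce to the same set of concepts even though they share almost no wording, which is the property that makes a discretized representation useful for alignment. Substring matching ignores negation and context, so a realistic system would need a far more careful phrase extractor.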


