Large Language Model With Region-Guided Referring and Grounding for CT Report Generation

Zhixuan Chen, Yequan Bie, Haibo Jin, Hao Chen
{"title":"Large Language Model With Region-Guided Referring and Grounding for CT Report Generation","authors":"Zhixuan Chen;Yequan Bie;Haibo Jin;Hao Chen","doi":"10.1109/TMI.2025.3559923","DOIUrl":null,"url":null,"abstract":"Computed tomography (CT) report generation is crucial to assist radiologists in interpreting CT volumes, which can be time-consuming and labor-intensive. Existing methods primarily only consider the global features of the entire volume, making it struggle to focus on specific regions and potentially missing abnormalities. To address this issue, we propose Reg2RG, the first region-guided referring and grounding framework for CT report generation, which enhances diagnostic performance by focusing on anatomical regions within the volume. Specifically, we utilize masks from a universal segmentation module to capture local features for each referring region. A local feature decoupling (LFD) strategy is proposed to preserve the local high-resolution details with little computational overhead. Then the local features are integrated with global features to capture inter-regional relationships within a cohesive context. Moreover, we propose a novel region-report alignment (RRA) training strategy. It leverages the recognition of referring regions to guide the generation of region-specific reports, enhancing the model’s referring and grounding capabilities while also improving the report’s interpretability. A large language model (LLM) is further employed as the language decoder to generate reports from integrated visual features, facilitating region-level comprehension. Extensive experiments on two large-scale chest CT-report datasets demonstrate the superiority of our method, which outperforms several state-of-the-art methods in terms of both natural language generation and clinical efficacy metrics while preserving promising interpretability. The code is available at <uri>https://github.com/zhi-xuan-chen/Reg2RG</uri>.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 8","pages":"3139-3150"},"PeriodicalIF":0.0000,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on medical imaging","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10963672/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Computed tomography (CT) report generation is crucial for assisting radiologists in interpreting CT volumes, a task that can be time-consuming and labor-intensive. Existing methods primarily consider only the global features of the entire volume, which makes it difficult to focus on specific regions and risks missing abnormalities. To address this issue, we propose Reg2RG, the first region-guided referring and grounding framework for CT report generation, which enhances diagnostic performance by focusing on anatomical regions within the volume. Specifically, we utilize masks from a universal segmentation module to capture local features for each referring region. A local feature decoupling (LFD) strategy is proposed to preserve local high-resolution details with little computational overhead. The local features are then integrated with global features to capture inter-regional relationships within a cohesive context. Moreover, we propose a novel region-report alignment (RRA) training strategy, which leverages the recognition of referring regions to guide the generation of region-specific reports, enhancing the model's referring and grounding capabilities while also improving the report's interpretability. A large language model (LLM) is further employed as the language decoder to generate reports from the integrated visual features, facilitating region-level comprehension. Extensive experiments on two large-scale chest CT-report datasets demonstrate the superiority of our method, which outperforms several state-of-the-art methods on both natural language generation and clinical efficacy metrics while preserving promising interpretability. The code is available at https://github.com/zhi-xuan-chen/Reg2RG.
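
To make the pipeline described in the abstract concrete, below is a minimal PyTorch sketch of one plausible reading of it: segmentation masks drive per-region (local) feature pooling, the local features are fused with a global feature, and the result is projected into the LLM's token-embedding space. All module names, tensor shapes, and the pooling scheme are illustrative assumptions, not the authors' implementation; see the official repository linked above for the real code.

```python
# Sketch of mask-guided local/global feature routing (illustrative only;
# names, shapes, and pooling are assumptions, not the paper's implementation).
import torch
import torch.nn as nn


class RegionGuidedEncoder(nn.Module):
    """Pools one local feature per referring region via its segmentation
    mask, prepends a global feature, and projects everything into the
    LLM's token-embedding space."""

    def __init__(self, feat_dim: int = 256, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(feat_dim, llm_dim)

    def forward(self, feat: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
        # feat:  (B, C, D, H, W) feature volume from a 3D visual backbone.
        # masks: (B, R, D, H, W) binary masks from a segmentation module,
        #        one channel per referring anatomical region.
        global_tok = feat.flatten(2).mean(-1)        # (B, C) global average pool
        m = masks.unsqueeze(2)                       # (B, R, 1, D, H, W)
        f = feat.unsqueeze(1)                        # (B, 1, C, D, H, W)
        # Masked average pooling: one C-dim feature per region.
        local = (f * m).flatten(3).sum(-1) / m.flatten(3).sum(-1).clamp(min=1e-6)
        tokens = torch.cat([global_tok.unsqueeze(1), local], dim=1)  # (B, 1+R, C)
        return self.proj(tokens)                     # visual tokens fed to the LLM


# Toy usage with random tensors standing in for real CT features and masks.
enc = RegionGuidedEncoder()
feat = torch.randn(1, 256, 8, 16, 16)
masks = (torch.rand(1, 4, 8, 16, 16) > 0.5).float()
print(enc(feat, masks).shape)  # torch.Size([1, 5, 4096])
```

This sketch only covers the basic feature-routing idea; in the paper, the LFD strategy (preserving high-resolution local detail cheaply) and the RRA training objective (aligning region recognition with region-specific report text) refine this recipe.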