
Latest articles in IEEE transactions on medical imaging

Dual-Source CBCT for Large FoV Imaging Under Short-Scan Trajectories
Pub Date : 2025-07-07 DOI: 10.1109/TMI.2025.3586622
Tianling Lyu;Xusheng Zhang;Xinyun Zhong;Zhan Wu;Yan Xi;Wei Zhao;Yang Chen;Yuanjing Feng;Wentao Zhu
Cone-beam CT is extensively used in medical diagnosis and treatment. Despite its large longitudinal field of view (FoV), the horizontal FoV of CBCT systems is severely limited by the detector width. Certain commercial CBCT systems enlarge the horizontal FoV by employing an offset detector. However, this method necessitates a 360° full circular scanning trajectory, which increases the scanning time and is not compatible with certain CBCT system models. In this paper, we investigate the feasibility of large-FoV imaging under short-scan trajectories with an additional X-ray source. A dual-source CBCT geometry is proposed, along with two corresponding image reconstruction algorithms. The first is based on cone-parallel rebinning and the second employs a modified Parker weighting scheme. Theoretical calculations demonstrate that the proposed geometry achieves a wider horizontal FoV than the 90% detector-offset geometry (radius of 214.83 mm vs. 198.99 mm) with a significantly reduced rotation angle (less than 230° vs. 360°). As demonstrated by experiments, the proposed geometry and reconstruction algorithms achieve imaging quality within the FoV comparable to conventional CBCT imaging techniques. Implementing the proposed geometry is straightforward and does not substantially increase development expenses. It possesses the capacity to expand CBCT applications even further.
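As context for the short-scan reconstruction described above, the sketch below evaluates the classic single-source Parker weights that the paper's modified scheme builds on; the fan-angle grid, array shapes, and the `parker_weights` helper are illustrative choices, not the authors' dual-source implementation.

```python
import numpy as np

def parker_weights(betas, gammas, gamma_m):
    """Classic Parker weights for a fan-beam short scan over [0, pi + 2*gamma_m].

    betas  : projection angles in radians, shape (n_views,)
    gammas : fan angles in radians within [-gamma_m, gamma_m], shape (n_cols,)
    Returns a (n_views, n_cols) weight map; weights ramp up at the start of the
    scan, equal 1 in the fully sampled region, and ramp down at the end.
    """
    B, G = np.meshgrid(betas, gammas, indexing="ij")
    w = np.ones_like(B)

    ramp_up = B <= 2.0 * (gamma_m - G)
    w[ramp_up] = np.sin(np.pi / 4.0 * B[ramp_up]
                        / np.maximum(gamma_m - G[ramp_up], 1e-9)) ** 2

    ramp_down = B >= np.pi - 2.0 * G
    w[ramp_down] = np.sin(np.pi / 4.0 * (np.pi + 2.0 * gamma_m - B[ramp_down])
                          / np.maximum(gamma_m + G[ramp_down], 1e-9)) ** 2
    return w

# Example: with a fan half-angle of about 25 degrees, the short-scan range
# pi + 2*gamma_m is roughly 230 degrees, matching the rotation quoted above.
gamma_m = np.deg2rad(25.0)
betas = np.linspace(0.0, np.pi + 2.0 * gamma_m, 400)
gammas = np.linspace(-gamma_m, gamma_m, 256)
weights = parker_weights(betas, gammas, gamma_m)
```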
Citations: 0
Prompting Vision-Language Model for Nuclei Instance Segmentation and Classification
Pub Date : 2025-06-25 DOI: 10.1109/TMI.2025.3579214
Jieru Yao;Guangyu Guo;Zhaohui Zheng;Qiang Xie;Longfei Han;Dingwen Zhang;Junwei Han
Nuclei instance segmentation and classification are fundamental and challenging tasks in whole slide imaging (WSI) analysis. Most dense nuclei prediction studies rely heavily on crowd-labelled data on high-resolution digital images, leading to a time-consuming and expertise-demanding paradigm. Recently, Vision-Language Models (VLMs) have been intensively investigated; they learn rich cross-modal correlations from large-scale image-text pairs without tedious annotations. Inspired by this, we build a novel framework, called PromptNu, aiming at infusing abundant nuclei knowledge into the training of the nuclei instance recognition model through vision-language contrastive learning and prompt engineering techniques. Specifically, our approach starts with the creation of multifaceted prompts that integrate comprehensive nuclear knowledge, including visual insights from the GPT-4V model, statistical analyses, and expert insights from the pathology field. Then, we propose a novel prompting methodology that consists of two pivotal vision-language contrastive learning components: Prompting Nuclei Representation Learning (PNuRL) and Prompting Nuclei Dense Prediction (PNuDP), which adeptly integrate the expertise embedded in pre-trained VLMs and multifaceted prompts into the feature extraction and prediction processes, respectively. Comprehensive experiments on six datasets with extensive WSI scenarios demonstrate the effectiveness of our method for both nuclei instance segmentation and classification tasks. The code is available at https://github.com/NucleiDet/PromptNu
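The vision-language contrastive component can be pictured as a generic symmetric InfoNCE objective over paired image and prompt embeddings, as in the sketch below; the function name, temperature, and batch-wise pairing are assumptions for illustration and do not reproduce the PNuRL/PNuDP losses.

```python
import torch
import torch.nn.functional as F

def image_text_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/prompt embeddings.

    img_emb, txt_emb : (N, D) tensors from the image encoder and the text/prompt encoder.
    """
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature              # (N, N) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    loss_i2t = F.cross_entropy(logits, targets)       # image -> matching prompt
    loss_t2i = F.cross_entropy(logits.t(), targets)   # prompt -> matching image
    return 0.5 * (loss_i2t + loss_t2i)
```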
Citations: 0
Enhancing Radiology Report Generation via Multi-Phased Supervision
Pub Date : 2025-06-25 DOI: 10.1109/TMI.2025.3580659
Zailong Chen;Yingshu Li;Zhanyu Wang;Peng Gao;Johan Barthelemy;Luping Zhou;Lei Wang
Radiology report generation using large language models has recently produced reports with more realistic styles and better language fluency. However, their clinical accuracy remains inadequate. Considering the significant imbalance between clinical phrases and general descriptions in a report, we argue that using an entire report for supervision is problematic, as it fails to emphasize the crucial clinical phrases that require focused learning. To address this issue, we propose a multi-phased supervision method inspired by the spirit of curriculum learning, where models are trained by gradually increasing task complexity. Our approach organizes the learning process into structured phases at different levels of semantic granularity, each building on the previous one to enhance the model. During the first phase, disease labels are used to supervise the model, equipping it with the ability to identify underlying diseases. The second phase progresses to entity-relation triples to guide the model to describe associated clinical findings. Finally, in the third phase, we introduce conventional whole-report-based supervision to quickly adapt the model for report generation. Throughout the phased training, the model remains the same and consistently operates in generation mode. As experimentally demonstrated, this proposed change in the way of supervision enhances report generation, achieving state-of-the-art performance in both language fluency and clinical accuracy. Our work underscores the importance of training process design in radiology report generation. Our code is available at https://github.com/zailongchen/MultiP-R2Gen
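A minimal way to picture the phased curriculum is a single model trained against progressively richer targets, as in the sketch below; the phase names, data loaders, and the `classification_loss`/`generation_loss` methods are hypothetical placeholders rather than the paper's interface.

```python
# Hypothetical three-phase curriculum: the same generative model, supervised first by
# disease labels, then by entity-relation triples, and finally by whole reports.
PHASES = [
    ("disease_labels", lambda model, batch: model.classification_loss(batch["image"], batch["labels"])),
    ("entity_triples", lambda model, batch: model.generation_loss(batch["image"], batch["triple_text"])),
    ("full_reports",   lambda model, batch: model.generation_loss(batch["image"], batch["report"])),
]

def train_multiphase(model, loaders, optimizer, epochs_per_phase=5):
    """Run the phases in order; the model and its generation mode never change."""
    for phase_name, loss_fn in PHASES:
        for _ in range(epochs_per_phase):
            for batch in loaders[phase_name]:
                optimizer.zero_grad()
                loss = loss_fn(model, batch)
                loss.backward()
                optimizer.step()
    return model
```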
Citations: 0
LMT++: Adaptively Collaborating LLMs With Multi-Specialized Teachers for Continual VQA in Robotic Surgical Videos
Pub Date : 2025-06-20 DOI: 10.1109/TMI.2025.3581108
Yuyang Du;Kexin Chen;Yue Zhan;Chang Han Low;Mobarakol Islam;Ziyu Guo;Yueming Jin;Guangyong Chen;Pheng Ann Heng
Visual question answering (VQA) plays a vital role in advancing surgical education. However, due to privacy concerns around patient data, training a VQA model with previously used data becomes restricted, making it necessary to use an exemplar-free continual learning (CL) approach. Previous CL studies in the surgical field neglected two critical issues: i) significant domain shifts caused by the wide range of surgical procedures collected from various sources, and ii) the data imbalance problem caused by the unequal occurrence of medical instruments or surgical procedures. This paper addresses these challenges with a multimodal large language model (LLM) and an adaptive weight assignment strategy. First, we developed a novel LLM-assisted multi-teacher CL framework (named LMT++), which harnesses the strength of a multimodal LLM as a supplementary teacher. The LLM’s strong generalization ability, as well as its good understanding of the surgical domain, helps to address the knowledge gap arising from domain shifts and data imbalances. To incorporate the LLM in our CL framework, we further proposed an innovative approach to process the training data, which involves converting complex LLM embeddings into logit values used within our CL training framework. Moreover, we design an adaptive weight assignment approach that balances the generalization ability of the LLM and the domain expertise of conventional VQA models obtained in previous model training processes within the CL framework. Finally, we created a new surgical VQA dataset for model evaluation. Comprehensive experimental findings on these datasets show that our approach surpasses state-of-the-art CL methods.
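One way to read the adaptive collaboration is as weighted knowledge distillation from several teachers (earlier task-specific models plus the LLM, once its embeddings have been mapped to logits); the sketch below shows only that generic weighted form, and the weight vector, temperature, and tensor shapes are assumptions rather than the paper's adaptive weighting scheme.

```python
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, teacher_weights, T=2.0):
    """KL distillation from several teachers, mixed by per-teacher weights.

    student_logits      : (N, C) logits of the continually trained student
    teacher_logits_list : list of (N, C) tensors, one per teacher
    teacher_weights     : iterable of scalars that sums to 1 (the adaptive weights)
    """
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    loss = student_logits.new_zeros(())
    for w, t_logits in zip(teacher_weights, teacher_logits_list):
        p_teacher = F.softmax(t_logits / T, dim=-1)
        loss = loss + w * F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
    return loss
```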
Citations: 0
SPHARM-Reg: Unsupervised Cortical Surface Registration Using Spherical Harmonics
Pub Date : 2025-06-20 DOI: 10.1109/TMI.2025.3581605
Seungeun Lee;Seunghwan Lee;Sunghwa Ryu;Ilwoo Lyu
We present a novel learning-based spherical registration method, called SPHARM-Reg, tailored for establishing cortical shape correspondence. SPHARM-Reg aims to reduce warp distortion that can introduce biases in downstream shape analyses. To achieve this, we tackle two critical challenges: (1) joint rigid and non-rigid alignments and (2) rotation-preserving smoothing. Conventional approaches perform rigid alignment only once before a non-rigid alignment. The resulting rotation is potentially sub-optimal, and the subsequent non-rigid alignment may introduce unnecessary distortion. In addition, common velocity encoding schemes on the unit sphere often fail to preserve the rotation component after spatial smoothing of velocity. To address these issues, we propose a diffeomorphic framework that integrates spherical harmonic decomposition of the velocity field with a novel velocity encoding scheme. SPHARM-Reg optimizes harmonic components of the velocity field, enabling joint adjustments for both rigid and non-rigid alignments. Furthermore, the proposed encoding scheme using spherical functions encourages consistent smoothing that preserves the rotation component. In the experiments, we validate SPHARM-Reg on healthy adult datasets. SPHARM-Reg achieves a substantial reduction in warp distortion while maintaining a high level of registration accuracy compared to existing methods. In the clinical analysis, we show that the extent of warp distortion significantly impacts statistical significance.
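To make the harmonic parameterization concrete, the sketch below evaluates a band-limited real-valued field on the sphere from spherical-harmonic coefficients using `scipy.special.sph_harm`; treating each velocity component this way is an illustrative simplification of SPHARM-Reg's velocity encoding, and the coefficient layout is assumed.

```python
import numpy as np
from scipy.special import sph_harm

def sh_scalar_field(coeffs, theta, phi, l_max):
    """Evaluate a real scalar field on the sphere from SH coefficients.

    coeffs : dict {(l, m): float} with real coefficients for the real SH basis
    theta  : azimuth in [0, 2*pi); phi : polar angle in [0, pi] (scipy convention)
    """
    f = np.zeros(np.broadcast(theta, phi).shape, dtype=float)
    for l in range(l_max + 1):
        for m in range(-l, l + 1):
            c = coeffs.get((l, m), 0.0)
            if c == 0.0:
                continue
            Y = sph_harm(abs(m), l, theta, phi)   # complex harmonic of degree l, order |m|
            if m > 0:                             # build real harmonics from the complex ones
                f += c * np.sqrt(2.0) * Y.real
            elif m < 0:
                f += c * np.sqrt(2.0) * Y.imag
            else:
                f += c * Y.real
    return f

# Example: a low-frequency field sampled on a coarse spherical grid.
theta, phi = np.meshgrid(np.linspace(0, 2 * np.pi, 64),
                         np.linspace(0, np.pi, 32), indexing="ij")
field = sh_scalar_field({(0, 0): 1.0, (2, 1): 0.3}, theta, phi, l_max=2)
```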
Citations: 0
PEPC-Net: Progressive Edge Perception and Completion Network for Precise Identification of Safe Resection Margins in Maxillofacial Cysts
Pub Date : 2025-06-19 DOI: 10.1109/TMI.2025.3581200
Nuo Tong;Yuanlin Liu;Yueheng Ding;Tao Wang;Lingnan Hou;Mei Shi;Xiaoyi Hu;Shuiping Gou
Maxillofacial cysts pose significant surgical risks due to their proximity to critical anatomical structures, such as blood vessels and nerves. Precise identification of the safe resection margins is essential for complete lesion removal while minimizing damage to surrounding at-risk tissues, and it relies heavily on accurate segmentation in CT images. However, due to the limited space and complex anatomical structures in the maxillofacial region, along with heterogeneous compositions of bone and soft tissues, accurate segmentation is extremely challenging. Thus, a Progressive Edge Perception and Completion Network (PEPC-Net) is presented in this study, which integrates three novel components: 1) a Progressive Edge Perception Branch, which progressively fuses semantic features from multiple resolution levels in a dual-stream manner, enabling the model to handle the varying forms of maxillofacial cysts at different stages; 2) an Edge Information Completion Module, which captures subtle, differentiated edge features from adjacent layers within the encoding blocks, providing more comprehensive edge information for identifying heterogeneous boundaries; and 3) an Edge-Aware Skip Connection that adaptively fuses multi-scale edge features, preserving detailed edge information to facilitate precise identification of cyst boundaries. Extensive experiments on clinically collected maxillofacial lesion datasets validate the effectiveness of the proposed PEPC-Net, achieving a DSC of 88.71% and an ASD of 0.489 mm. Its generalizability is further assessed using an external validation set, which includes a more diverse range of maxillofacial cyst cases and images of varying quality. These experiments highlight the superior performance of PEPC-Net in delineating the polymorphic edges of heterogeneous lesions, which is critical for safe resection margin decisions.
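The two reported figures (DSC 88.71%, ASD 0.489 mm) follow standard definitions; a minimal sketch of both metrics for binary volumes is given below, with the voxel `spacing` argument assumed to carry the millimetre scale.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice_coefficient(pred, gt):
    """Dice similarity coefficient (DSC) between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)

def average_surface_distance(pred, gt, spacing=(1.0, 1.0, 1.0)):
    """Symmetric average surface distance (ASD, in mm) between two binary volumes."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    pred_surf = pred ^ binary_erosion(pred)           # boundary voxels of each mask
    gt_surf = gt ^ binary_erosion(gt)
    # Distance from every voxel to the nearest surface voxel of the other mask.
    dt_gt = distance_transform_edt(~gt_surf, sampling=spacing)
    dt_pred = distance_transform_edt(~pred_surf, sampling=spacing)
    d_pred_to_gt = dt_gt[pred_surf]
    d_gt_to_pred = dt_pred[gt_surf]
    return (d_pred_to_gt.sum() + d_gt_to_pred.sum()) / (len(d_pred_to_gt) + len(d_gt_to_pred) + 1e-8)
```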
Citations: 0
EchoFM: Foundation Model for Generalizable Echocardiogram Analysis
Pub Date : 2025-06-18 DOI: 10.1109/TMI.2025.3580713
Sekeun Kim;Pengfei Jin;Sifan Song;Cheng Chen;Yiwei Li;Hui Ren;Xiang Li;Tianming Liu;Quanzheng Li
Echocardiography is the first-line non-invasive cardiac imaging modality, providing rich spatio-temporal information on cardiac anatomy and physiology. Recently, foundation models trained on extensive and diverse datasets have shown strong performance in various downstream tasks. However, translating foundation models into the medical imaging domain remains challenging due to domain differences between medical and natural images and the lack of diverse patient and disease datasets. In this paper, we introduce EchoFM, a general-purpose vision foundation model for echocardiography trained on a large-scale dataset of over 20 million echocardiographic images from 6,500 patients. To enable effective learning of rich spatio-temporal representations from periodic videos, we propose a novel self-supervised learning framework based on a masked autoencoder with a spatio-temporally consistent masking strategy and periodic-driven contrastive learning. The learned cardiac representations can be readily adapted and fine-tuned for a wide range of downstream tasks, serving as a strong and flexible backbone model. We validate EchoFM through experiments across key downstream tasks in the clinical echocardiography workflow, leveraging public and multi-center internal datasets. EchoFM consistently outperforms SOTA methods, demonstrating superior generalization capabilities and flexibility. The code and checkpoints are available at: https://github.com/SekeunKim/EchoFM.git
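A spatio-temporally consistent mask can be sketched as "tube" masking, where the same spatial patches are hidden in every frame of a clip; the shapes, mask ratio, and function name below are illustrative and do not reproduce EchoFM's exact masking strategy or its periodic-driven contrastive term.

```python
import torch

def tube_mask(batch, n_frames, n_patches, mask_ratio=0.75, device="cpu"):
    """Spatio-temporally consistent ('tube') mask for a video masked autoencoder.

    The same spatial patches are masked in every frame, so the visible/masked
    pattern stays consistent along time and the temporal signal cannot leak
    through unmasked positions. Returns a boolean mask of shape
    (batch, n_frames, n_patches); True marks a masked patch.
    """
    n_mask = int(mask_ratio * n_patches)
    mask = torch.zeros(batch, n_patches, dtype=torch.bool, device=device)
    for b in range(batch):
        idx = torch.randperm(n_patches, device=device)[:n_mask]
        mask[b, idx] = True
    # Repeat the same spatial pattern across all frames of the clip.
    return mask.unsqueeze(1).expand(batch, n_frames, n_patches)
```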
Citations: 0
3D Distance-Color-Coded Assessment of PCI Stent Apposition via Deep-Learning-Based Three-Dimensional Multi-Object Segmentation
Pub Date : 2025-06-17 DOI: 10.1109/TMI.2025.3580619
Xiaoyang Qin;Hao Huang;Shuaichen Lin;Xinhao Zeng;Kaizhi Cao;Renxiong Wu;Yuming Huang;Junqing Yang;Yong Liu;Gang Li;Guangming Ni
Coronary artery disease poses a significant global health challenge, often necessitating percutaneous coronary intervention (PCI) with stent implantation. Assessing stent apposition is crucial for preventing and identifying PCI complications leading to in-stent restenosis. Here we propose a novel three-dimensional (3D) distance-color-coded assessment (DccA) of PCI stent apposition via deep-learning-based 3D multi-object segmentation in intravascular optical coherence tomography (IV-OCT). Our proposed 3D DccA accurately segments 3D vessel lumens and stents in IV-OCT images using a hybrid-dimensional spatial matching network and dual-layer training with style transfer. It quantifies and maps stent-lumen distances into a 3D color space, achieving a 3D visual assessment of PCI stent apposition. Achieving over 95% segmentation precision for both stent struts and the lumen, and providing 3D color visualization, our proposed 3D DccA improves the clinical evaluation of PCI stent deployment and facilitates personalized treatment planning.
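The core of the distance-color coding can be pictured as a nearest-neighbour query from each segmented stent-strut point to the lumen surface, followed by a mapping of distance to colour; the point-cloud inputs, the `d_max` threshold, and the simple blue-to-red gradient below are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np
from scipy.spatial import cKDTree

def distance_color_code(stent_points, lumen_points, d_max=0.3):
    """Per-point stent-to-lumen distance (mm) mapped into an RGB colour.

    stent_points : (N, 3) strut coordinates in mm from the 3D segmentation
    lumen_points : (M, 3) lumen-surface coordinates in mm
    d_max        : distance mapped to the far end of the colour scale (assumed value)
    """
    tree = cKDTree(lumen_points)
    distances, _ = tree.query(stent_points)          # nearest lumen point per strut point
    normalized = np.clip(distances / d_max, 0.0, 1.0)
    # Simple blue (well apposed) -> red (malapposed) gradient.
    colors = np.stack([normalized, np.zeros_like(normalized), 1.0 - normalized], axis=1)
    return distances, colors
```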
Citations: 0
Robotic Optical Coherence Tomography With Expanded Three-Dimensional Field-of-View Using Point-Cloud-Based Volumetric Montaging
Pub Date : 2025-06-17 DOI: 10.1109/TMI.2025.3580383
Raymond Fang;Pengpeng Zhang;Tingwei Zhang;Zihang Yan;Daniel Kim;Edison Sun;Roman Kuranov;Junghun Kweon;Alex S. Huang;Hao F. Zhang
Imaging complex, non-planar anatomies with optical coherence tomography (OCT) is limited by the optical field of view (FOV) of a single volumetric acquisition. Combining linear mechanical translation with OCT extends the FOV but lacks the flexibility to image non-planar anatomies. We report robotic OCT to fill this gap. To address challenges in volumetric reconstruction arising from robotic movement accuracy being two orders of magnitude worse than the OCT imaging resolution, we developed a volumetric montaging algorithm. To test the robotic OCT, we imaged the entire circumferential aqueous humor outflow pathway, whose imaging has the potential to customize glaucoma surgeries but is typically constrained by the FOV in mice in vivo. We acquired volumetric OCT data at different robotic poses and reconstructed the entire anterior segment of the eye. From the segmented Schlemm’s canal volume, we showed its circumferentially heterogeneous morphology; we also revealed a segmental nature in the circumferential distribution of collector channels with spatial features as small as a few micrometers.
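Because the robot pose is far coarser than the OCT resolution, montaging hinges on refining the pose from the overlapping data; the sketch below is the standard Kabsch least-squares rigid fit typically used inside ICP-style refinement, assuming point correspondences (e.g. nearest neighbours in the overlap) have already been established. It is not the authors' full point-cloud montaging algorithm.

```python
import numpy as np

def rigid_refine(src, dst):
    """Least-squares rigid transform (R, t) aligning corresponding points src -> dst.

    src, dst : (N, 3) corresponding points, e.g. overlapping surface points of two
    OCT volumes placed by the coarse robot pose. Classic Kabsch solution via SVD.
    """
    src_c, dst_c = src.mean(0), dst.mean(0)
    H = (src - src_c).T @ (dst - dst_c)               # cross-covariance of centred clouds
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                          # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t
```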
Citations: 0
Bridging the Semantic Gap in Medical Visual Question Answering With Prompt Learning
Pub Date : 2025-06-17 DOI: 10.1109/TMI.2025.3580561
Zilin Lu;Qingjie Zeng;Mengkang Lu;Geng Chen;Yong Xia
Medical Visual Question Answering (Med-VQA) aims to answer questions regarding the content of medical images, crucial for enhancing diagnostics and education in healthcare. However, progress in this field is hindered by data scarcity due to the resource-intensive nature of medical data annotation. While existing Med-VQA approaches often rely on pre-training to mitigate this issue, bridging the semantic gap between pre-trained models and specific tasks remains a significant challenge. This paper presents the Dynamic Semantic-Adaptive Prompting (DSAP) framework, leveraging prompt learning to enhance model performance in Med-VQA. To this end, we introduce two prompting strategies: Semantic Alignment Prompting (SAP) and Dynamic Question-Aware Prompting (DQAP). SAP prompts multi-modal inputs during fine-tuning, reducing the semantic gap by aligning model outputs with domain-specific contexts. Simultaneously, DQAP enhances answer selection by leveraging grammatical relationships between questions and answers, thereby improving accuracy and relevance. The DSAP framework was pre-trained on three datasets—ROCO, MedICaT, and MIMIC-CXR—and comprehensively evaluated against 15 existing Med-VQA models on three public datasets: VQA-RAD, SLAKE, and PathVQA. Our results demonstrate a substantial performance improvement, with DSAP achieving a 1.9% enhancement in average results across benchmarks. These findings underscore DSAP’s effectiveness in addressing critical challenges in Med-VQA and suggest promising avenues for future developments in medical AI.
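Prompt learning in this setting can be sketched as a small set of learnable tokens prepended to the inputs of a frozen backbone; the `PromptedEncoder` class, prompt count, and embedding dimension below are illustrative assumptions rather than the DSAP/SAP/DQAP implementation.

```python
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    """Prepend learnable prompt tokens to the inputs of a frozen encoder.

    A minimal prompt-learning sketch: only `prompts` (and any task head added
    elsewhere) receive gradients, while the backbone stays frozen.
    """
    def __init__(self, backbone, n_prompts=8, dim=768):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                   # keep the pre-trained weights fixed
        self.prompts = nn.Parameter(torch.randn(1, n_prompts, dim) * 0.02)

    def forward(self, token_embeddings):              # (B, L, dim) multimodal input tokens
        B = token_embeddings.size(0)
        prompts = self.prompts.expand(B, -1, -1)
        return self.backbone(torch.cat([prompts, token_embeddings], dim=1))
```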
Citations: 0