
Latest publications in Medical image analysis

Reliable uncertainty quantification for 2D/3D anatomical landmark localization using multi-output conformal prediction
IF 11.8 · CAS Tier 1 (Medicine) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-27 · DOI: 10.1016/j.media.2026.103953
Jef Jonkers, Frank Coopman, Luc Duchateau, Glenn Van Wallendael, Sofie Van Hoecke
Automatic anatomical landmark localization in medical imaging requires not just accurate predictions but reliable uncertainty quantification for effective clinical decision support. Current uncertainty quantification approaches often fall short, particularly when combined with normality assumptions, systematically underestimating total predictive uncertainty. This paper introduces conformal prediction as a framework for reliable uncertainty quantification in anatomical landmark localization, addressing a critical gap in automatic landmark localization. We present two novel approaches guaranteeing finite-sample validity for multi-output prediction: multi-output regression-as-classification conformal prediction (M-R2CCP) and its variant multi-output regression to classification conformal prediction set to region (M-R2C2R). Unlike conventional methods that produce axis-aligned hyperrectangular or ellipsoidal regions, our approaches generate flexible, non-convex prediction regions that better capture the underlying uncertainty structure of landmark predictions. Through extensive empirical evaluation across multiple 2D and 3D datasets, we demonstrate that our methods consistently outperform existing multi-output conformal prediction approaches in both validity and efficiency. This work represents a significant advancement in reliable uncertainty estimation for anatomical landmark localization, providing clinicians with trustworthy confidence measures for their diagnoses. While developed for medical imaging, these methods show promise for broader applications in multi-output regression problems.
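For readers unfamiliar with conformal prediction, the sketch below shows a plain split-conformal procedure for a single 2D landmark using Euclidean nonconformity scores, which yields circular regions with finite-sample coverage. It is only an illustrative baseline under those assumptions, not the paper's M-R2CCP or M-R2C2R methods, and all names and numbers in it are made up.

```python
# Minimal split conformal prediction for a single 2D landmark, assuming an
# already-fitted point predictor. Euclidean nonconformity scores give circular
# prediction regions with finite-sample coverage; this is a simple baseline,
# not the paper's M-R2CCP / M-R2C2R.
import numpy as np

def calibrate_radius(pred_calib, true_calib, alpha=0.1):
    """Smallest radius covering at least (1 - alpha) of calibration errors."""
    scores = np.linalg.norm(pred_calib - true_calib, axis=1)  # Euclidean errors
    n = len(scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n                    # finite-sample correction
    return np.quantile(scores, min(q, 1.0), method="higher")

def prediction_regions(pred_test, radius):
    """Each test prediction becomes a disk: centre plus conformal radius."""
    return [{"centre": p, "radius": radius} for p in pred_test]

# Synthetic usage: 100 calibration and 100 test landmarks in a 512x512 image.
rng = np.random.default_rng(0)
true = rng.uniform(0, 512, size=(200, 2))
pred = true + rng.normal(0, 3, size=(200, 2))
r = calibrate_radius(pred[:100], true[:100], alpha=0.1)
regions = prediction_regions(pred[100:], r)
coverage = np.mean([np.linalg.norm(p - t) <= r for p, t in zip(pred[100:], true[100:])])
print(f"radius = {r:.2f} px, empirical coverage = {coverage:.2f}, regions = {len(regions)}")
```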
Citations: 0
MIL-Adapter: Coupling multiple instance learning and vision-language adapters for few-shot slide-level classification
IF 11.8 · CAS Tier 1 (Medicine) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-26 · DOI: 10.1016/j.media.2026.103964
Pablo Meseguer, Rocío del Amor, Valery Naranjo
Contrastive language-image pretraining has greatly enhanced visual representation learning and enabled zero-shot classification. Vision-language models (VLMs) have succeeded in few-shot learning by leveraging adaptation modules fine-tuned for specific downstream tasks. In computational pathology (CPath), accurate whole-slide image (WSI) prediction is crucial for aiding in cancer diagnosis, and multiple instance learning (MIL) remains essential for managing the gigapixel scale of WSIs. At the intersection of CPath and VLMs, the literature still lacks specific adapters that handle the particular complexity of the slides. To close this gap, we introduce MIL-Adapter, a novel approach designed to obtain consistent slide-level classification under few-shot learning scenarios. In particular, our framework is the first to combine trainable MIL aggregation functions and lightweight visual-language adapters to improve the performance of histopathological VLMs. MIL-Adapter relies on textual ensemble learning to construct discriminative zero-shot prototypes. It serves as a solid starting point, surpassing MIL models with randomly initialized classifiers in data-constrained settings. With our experimentation, we demonstrate the value of textual ensemble learning and the robust predictive performance of MIL-Adapter across diverse datasets and configurations of few-shot scenarios, while providing crucial insights on model interpretability. The code is publicly accessible at https://github.com/cvblab/MIL-Adapter.
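As a rough illustration of coupling MIL aggregation with a lightweight adapter on top of frozen VLM patch features, the sketch below shows generic attention-MIL pooling followed by a residual bottleneck adapter and prototype scoring. Dimensions, module names, and the random prototypes are illustrative assumptions, not the authors' MIL-Adapter code.

```python
# Generic attention-based MIL pooling over frozen patch embeddings, followed by
# a lightweight residual bottleneck adapter and prototype scoring. Dimensions,
# module names, and the random "prototypes" are illustrative placeholders,
# not the authors' MIL-Adapter implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionMIL(nn.Module):
    """Attention pooling: one slide embedding from many patch embeddings."""
    def __init__(self, dim=512, hidden=128):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, patches):                      # patches: (n_patches, dim)
        w = torch.softmax(self.attn(patches), dim=0) # attention weight per instance
        return (w * patches).sum(dim=0)              # slide-level embedding (dim,)

class Adapter(nn.Module):
    """Residual bottleneck adapter applied to the pooled slide embedding."""
    def __init__(self, dim=512, bottleneck=64, ratio=0.2):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.ratio = ratio

    def forward(self, x):
        return self.ratio * self.up(torch.relu(self.down(x))) + (1 - self.ratio) * x

patches = torch.randn(1000, 512)                       # patch features from a frozen VLM encoder
prototypes = F.normalize(torch.randn(2, 512), dim=-1)  # stand-ins for text-derived class prototypes
slide = Adapter()(AttentionMIL()(patches))
logits = F.normalize(slide, dim=-1) @ prototypes.T     # cosine similarity to each class
print(logits.shape)                                    # torch.Size([2])
```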
Citations: 0
Towards Boundary Confusion for Volumetric Medical Image Segmentation
IF 10.9 · CAS Tier 1 (Medicine) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-25 · DOI: 10.1016/j.media.2026.103961
Xin You, Ming Ding, Minghui Zhang, Hanxiao Zhang, Junyang Wu, Yi Yu, Jie Yang, Yun Gu
Accurate boundary segmentation of volumetric images is a critical task for image-guided diagnosis and computer-assisted intervention. Addressing boundary confusion with explicit constraints is challenging. Existing methods of refining boundaries overemphasize the slender structure while overlooking the dynamic interactions between boundaries and neighboring regions. In this paper, we reconceptualize the mechanism of boundary generation by introducing Pushing and Pulling interactions, and then propose a unified network termed PP-Net to model shape characteristics of the confused boundary region. Specifically, we first propose the semantic difference module (SDM) from the pushing branch to drive the boundary towards the ground truth under diffusion guidance. Additionally, the class clustering module (CCM) from the pulling branch is introduced to stretch the intersected boundary along the opposite direction. Thus, the pushing and pulling branches furnish two adversarial forces to enhance representation capabilities for the faint boundary. Experiments are conducted on four public datasets and one in-house dataset plagued by boundary confusion. The results demonstrate the superiority of PP-Net over other segmentation networks, especially on the evaluation metrics of Hausdorff Distance and Average Symmetric Surface Distance. Moreover, SDM and CCM can serve as plug-and-play modules to enhance classic U-shape baseline models, including recent SAM-based foundation models. Source codes are available at https://github.com/EndoluminalSurgicalVision-IMR/PnPNet.
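The evaluation metrics named above, Hausdorff Distance and Average Symmetric Surface Distance, can be computed from two boundary point sets as in the simplified 2D sketch below; it is independent of PP-Net and assumes the boundaries have already been extracted as coordinate arrays.

```python
# Boundary-distance metrics used in the evaluation (Hausdorff Distance and
# Average Symmetric Surface Distance), computed from two boundary point sets.
# A simplified 2D sketch with SciPy, independent of PP-Net itself.
import numpy as np
from scipy.spatial.distance import cdist

def hd_and_assd(boundary_a, boundary_b):
    """boundary_a, boundary_b: (n, d) arrays of boundary/surface coordinates."""
    d = cdist(boundary_a, boundary_b)      # pairwise Euclidean distances
    a_to_b = d.min(axis=1)                 # each point of A to its nearest point of B
    b_to_a = d.min(axis=0)
    hausdorff = max(a_to_b.max(), b_to_a.max())
    assd = (a_to_b.sum() + b_to_a.sum()) / (len(a_to_b) + len(b_to_a))
    return hausdorff, assd

# Toy example: a predicted and a ground-truth circle of slightly different radii.
t = np.linspace(0, 2 * np.pi, 200)
pred = np.c_[np.cos(t), np.sin(t)] * 10.0
gt = np.c_[np.cos(t), np.sin(t)] * 10.5
print(hd_and_assd(pred, gt))               # both close to the 0.5 radius gap
```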
Citations: 0
Fundus image quality assessment in retinopathy of prematurity via multi-label graph evidential network
IF 11.8 · CAS Tier 1 (Medicine) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-24 · DOI: 10.1016/j.media.2026.103959
Donghan Wu, Wenyue Shen, Lu Yuan, Heng Li, Huaying Hao, Juan Ye, Yitian Zhao
Retinopathy of Prematurity (ROP) is a leading cause of childhood blindness worldwide. In clinical practice, fundus imaging serves as a primary diagnostic tool for ROP, making the accurate quality assessment of these images critically important. However, existing automated methods for evaluating ROP fundus images face significant challenges. First, there is a high degree of visual similarity between lesions and factors that influence quality. Second, there is a paucity of trustworthy outputs and interpretable or clinical-friendly designs, which limit their reliability and effectiveness. In this work, we propose a ROP image quality assessment framework, termed Q-ROP. This framework leverages fine-grained multi-label annotations based on key image factors such as artifacts, illumination, spatial positioning, and structural clarity. Additionally, the integration of a label graph network with evidential learning theory enables the model to explicitly capture the relationships between quality grades and influencing factors, thereby improving both robustness and accuracy. This approach facilitates interpretable analysis by directing the model’s focus toward relevant image features and reducing interference from lesion-like artifacts. Furthermore, the incorporation of evidential learning theory serves to quantify the uncertainty inherent in quality ratings, thereby ensuring the trustworthiness of the assessments. Trained and tested on a dataset of 6677 ROP images across three quality levels (i.e. acceptable, potentially acceptable, and unacceptable), Q-ROP achieved state-of-the-art performance with a 95.82% accuracy. Its effectiveness was further validated in a downstream ROP staging task, where it significantly improved the performance of typical classification models. These results demonstrate Q-ROP’s strong potential as a reliable and robust tool for clinical decision support.
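The uncertainty quantification described here builds on evidential learning; the sketch below shows the standard Dirichlet-based evidential output head (evidence to concentration parameters, expected probabilities, and a scalar uncertainty mass). It follows the common formulation of evidential deep learning, not the Q-ROP architecture itself, and the example logits are invented.

```python
# Standard Dirichlet-based evidential output head: non-negative evidence per
# class, Dirichlet concentration alpha = evidence + 1, expected probabilities,
# and a scalar uncertainty mass K / S. A generic sketch of evidential deep
# learning, not the Q-ROP model itself.
import torch
import torch.nn.functional as F

def evidential_head(logits):
    """logits: (batch, K) raw outputs for K quality grades."""
    evidence = F.softplus(logits)                   # non-negative evidence
    alpha = evidence + 1.0                          # Dirichlet concentration
    strength = alpha.sum(dim=-1, keepdim=True)      # S = sum_k alpha_k
    prob = alpha / strength                         # expected class probabilities
    uncertainty = logits.shape[-1] / strength       # u = K / S
    return prob, uncertainty

# Three grades, e.g. acceptable / potentially acceptable / unacceptable.
logits = torch.tensor([[2.0, 0.1, -1.0]])
prob, u = evidential_head(logits)
print(prob, u)                                      # more evidence -> lower uncertainty
```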
Citations: 0
Fréchet radiomic distance (FRD): A versatile metric for comparing medical imaging datasets
IF 11.8 · CAS Tier 1 (Medicine) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-24 · DOI: 10.1016/j.media.2026.103943
Nicholas Konz, Richard Osuala, Preeti Verma, Yuwen Chen, Hanxue Gu, Haoyu Dong, Yaqian Chen, Andrew Marshall, Lidia Garrucho, Kaisar Kushibar, Daniel M. Lang, Gene S. Kim, Lars J. Grimm, John M. Lewin, James S. Duncan, Julia A. Schnabel, Oliver Diaz, Karim Lekadir, Maciej A. Mazurowski
Determining whether two sets of images belong to the same or different distributions or domains is a crucial task in modern medical image analysis and deep learning; for example, to evaluate the output quality of image generative models. Currently, metrics used for this task either rely on the (potentially biased) choice of some downstream task, such as segmentation, or adopt task-independent perceptual metrics (e.g., Fréchet Inception Distance/FID) from natural imaging, which we show insufficiently capture anatomical features. To this end, we introduce a new perceptual metric tailored for medical images, FRD (Fréchet Radiomic Distance), which utilizes standardized, clinically meaningful, and interpretable image features. We show that FRD is superior to other image distribution metrics for a range of medical imaging applications, including out-of-domain (OOD) detection, the evaluation of image-to-image translation (by correlating more with downstream task performance as well as anatomical consistency and realism), and the evaluation of unconditional image generation. Moreover, FRD offers additional benefits such as stability and computational efficiency at low sample sizes, sensitivity to image corruptions and adversarial attacks, feature interpretability, and correlation with radiologist-perceived image quality. Additionally, we address key gaps in the literature by presenting an extensive framework for the multifaceted evaluation of image similarity metrics in medical imaging—including the first large-scale comparative study of generative models for medical image translation—and release an accessible codebase to facilitate future research. Our results are supported by thorough experiments spanning a variety of datasets, modalities, and downstream tasks, highlighting the broad potential of FRD for medical image analysis.
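FRD applies the Fréchet distance between Gaussian fits of feature distributions (the same closed form as FID) to standardized radiomic features. The sketch below computes that closed form for two generic feature matrices; the radiomic feature extraction step is assumed to happen upstream and is not shown.

```python
# Fréchet distance between Gaussian fits of two feature sets: the same closed
# form used by FID, which FRD applies to standardized radiomic features.
# Radiomic feature extraction is assumed to happen upstream and is not shown.
import numpy as np
from scipy import linalg

def frechet_distance(feats_a, feats_b, eps=1e-6):
    """feats_a, feats_b: (n_samples, n_features) arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False) + eps * np.eye(feats_a.shape[1])
    cov_b = np.cov(feats_b, rowvar=False) + eps * np.eye(feats_b.shape[1])
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):                    # drop tiny imaginary residue
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(500, 16))            # features of dataset A
b = rng.normal(0.3, 1.2, size=(500, 16))            # dataset B, slightly shifted
print(frechet_distance(a, b))                       # larger for more dissimilar datasets
```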
Citations: 0
MADAT: Missing-aware dynamic adaptive transformer model for medical prognosis prediction with incomplete multimodal data
IF 11.8 · CAS Tier 1 (Medicine) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-24 · DOI: 10.1016/j.media.2026.103958
Jianbin He, Guoheng Huang, Xiaochen Yuan, Chi-Man Pun, Guo Zhong, Qi Yang, Ling Guo, Siyu Zhu, Baiying Lei, Haojiang Li
Multimodal medical prognosis prediction has shown great potential in improving diagnostic accuracy by integrating various data types. However, incomplete multimodality, where certain modalities are missing, poses significant challenges to model performance. Current methods, including dynamic adaptation and modality completion, have limitations in handling incomplete multimodality comprehensively. Dynamic adaptation methods fail to fully utilize modality interactions as they only process available modalities. Modality completion methods address inter-modal relationships but risk generating unreliable data, especially when key modalities are missing, since existing modalities cannot replicate unique features of absent ones. This compromises fusion quality and degrades model performance. To address these challenges, we propose the Missing-aware Dynamic Adaptive Transformer (MADAT) model, which integrates two phases: the Decoupling Generalization Completion Phase (DGCP) and the Adaptive Cross-Fusion Phase (ACFP). The DGCP reconstructs missing modalities by generating inter-modal and intra-modal shared information using Progressive Transformation Recursive Gated Convolutions (PTRGC) and Wavelet Alignment Domain Generalization (WADG). The ACFP, which incorporates Cross-Agent Attention (CAA) and Generation Quality Feedback Regulation (GQFR), adaptively fuses the original and generated modality features. CAA ensures thorough integration and alignment of the features, while GQFR dynamically adjusts the model’s reliance on the generated features based on their quality, preventing over-dependence on low-quality data. Experiments on three private nasopharyngeal carcinoma datasets demonstrate that MADAT outperforms existing methods, achieving superior robustness in medical multimodal prediction under conditions of incomplete multimodality.
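As a loose illustration of fusing an available modality with a possibly reconstructed one while limiting reliance on low-quality generated features, the sketch below gates a cross-attention update by a per-sample quality score. It is a generic stand-in under those assumptions, not the ACFP, CAA, or GQFR modules described above, and all shapes and scores are invented.

```python
# Generic gated cross-attention fusion of two modality token sequences, where a
# per-sample quality score down-weights reliance on reconstructed (generated)
# features. An illustrative stand-in, not MADAT's ACFP / CAA / GQFR modules.
import torch
import torch.nn as nn

class GatedCrossFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, query_tokens, context_tokens, context_quality):
        # context_quality: (batch, 1, 1) in [0, 1]; lower when the context was imputed.
        fused, _ = self.attn(query_tokens, context_tokens, context_tokens)
        g = self.gate(fused) * context_quality      # learned gate scaled by quality
        return query_tokens + g * fused             # residual, quality-aware fusion

available = torch.randn(2, 64, 256)                 # tokens of an observed modality
generated = torch.randn(2, 64, 256)                 # tokens that may be reconstructed
quality = torch.tensor([[[1.0]], [[0.3]]])          # second sample's modality was imputed
print(GatedCrossFusion()(available, generated, quality).shape)  # torch.Size([2, 64, 256])
```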
Citations: 0
IUGC: A benchmark of landmark detection in end-to-end intrapartum ultrasound biometry
IF 11.8 · CAS Tier 1 (Medicine) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-23 · DOI: 10.1016/j.media.2026.103960
Jieyun Bai, Yitong Tang, Xiao Liu, Jiale Hu, Yunda Li, Xufan Chen, Yufeng Wang, Chen Ma, Yunshu Li, Bowen Guo, Jing Jiao, Yi Huang, Kun Wang, Lifei Li, Yuzhang Ma, Xiaoxin Han, Haochen Shao, Zi Yang, Qingchen Liu, Yuchen Hu, Shuo Li
Accurate intrapartum biometry plays a crucial role in monitoring labor progression and preventing complications. However, its clinical application is limited by challenges such as the difficulty in identifying anatomical landmarks and the variability introduced by operator dependency. To overcome these challenges, the Intrapartum Ultrasound Grand Challenge (IUGC) 2025, in collaboration with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), was organized to accelerate the development of automatic measurement techniques for intrapartum ultrasound analysis. The challenge featured a large-scale, multi-center dataset comprising over 32,000 images from 24 hospitals and research institutes. These images were annotated with key anatomical landmarks of the pubic symphysis (PS) and fetal head (FH), along with the corresponding biometric parameter: the angle of progression (AoP). Ten participating teams proposed a variety of end-to-end and semi-supervised frameworks, incorporating advanced strategies such as foundation model distillation, pseudo-label refinement, anatomical segmentation guidance, and ensemble learning. A comprehensive evaluation revealed that the winning team achieved superior accuracy, with a Mean Radial Error (MRE) of 6.53 ± 4.38 pixels for the right PS landmark, 8.60 ± 5.06 pixels for the left PS landmark, 19.90 ± 17.55 pixels for the FH tangent landmark, and an absolute AoP difference of 3.81 ± 3.12°. This top-performing method demonstrated accuracy comparable to expert sonographers, emphasizing the clinical potential of automated intrapartum ultrasound analysis. However, challenges remain, such as the trade-off between accuracy and computational efficiency, the lack of segmentation labels and video data, and the need for extensive multi-center clinical validation. IUGC 2025 thus sets the first benchmark for landmark-based intrapartum biometry estimation and provides an open platform for developing and evaluating real-time, intelligent ultrasound analysis solutions for labor management.
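The reported metrics can be reproduced from landmark coordinates as in the sketch below, which computes per-landmark Mean Radial Error and the absolute AoP difference. It assumes AoP is the angle at the distal pubic-symphysis landmark between the PS axis and the line to the fetal-head tangent point; the landmark ordering and coordinates are illustrative, not the challenge's reference implementation.

```python
# Mean Radial Error and angle of progression (AoP) from three landmarks per
# image: two pubic-symphysis (PS) endpoints and one fetal-head (FH) tangent
# point. AoP is taken here as the angle at the distal PS landmark between the
# PS axis and the line to the FH tangent point; ordering and coordinates are
# illustrative assumptions.
import numpy as np

def mean_radial_error(pred, gt):
    """pred, gt: (n_images, n_landmarks, 2) pixel coordinates; returns per-landmark MRE."""
    return np.linalg.norm(pred - gt, axis=-1).mean(axis=0)

def angle_of_progression(ps_proximal, ps_distal, fh_tangent):
    v1 = ps_proximal - ps_distal                    # pubic symphysis long axis
    v2 = fh_tangent - ps_distal                     # line to the fetal-head tangent point
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

gt = np.array([[120.0, 80.0], [150.0, 160.0], [260.0, 210.0]])    # PS prox, PS dist, FH tangent
pred = gt + np.array([[4.0, -3.0], [6.0, 2.0], [-10.0, 12.0]])    # a hypothetical prediction
print(mean_radial_error(pred[None], gt[None]))                    # per-landmark error in pixels
print(abs(angle_of_progression(*pred) - angle_of_progression(*gt)))  # |AoP difference| in degrees
```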
Citations: 0
Generating synthetic MRI scans for improving Alzheimer’s disease diagnosis
IF 11.8 · CAS Tier 1 (Medicine) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-23 · DOI: 10.1016/j.media.2026.103947
Rosanna Turrisi, Giuseppe Patané
Alzheimer’s disease (AD) is a progressive neurodegenerative disorder and the leading cause of dementia. Magnetic Resonance Imaging (MRI) combined with Machine Learning (ML) enables early diagnosis, but ML models often underperform when trained on small, heterogeneous medical datasets. Transfer Learning (TL) helps mitigate this limitation, yet models pre-trained on 2D natural images still fall short of those trained directly on related 3D MRI data. To address this gap, we introduce an intermediate strategy based on synthetic data generation. Specifically, we propose a conditional Denoising Diffusion Probabilistic Model (DDPM) to synthesise 2D projections (axial, coronal, sagittal) of brain MRI scans across three clinical groups: Cognitively Normal (CN), Mild Cognitive Impairment (MCI), and AD. A total of 9000 synthetic images are used for pre-training 2D models, which are subsequently extended to 3D via axial, coronal, and sagittal convolutions and fine-tuned on real-world small datasets. Our method achieves 91.3% accuracy in binary (CN vs. AD) and 74.5% in three-class (CN/MCI/AD) classification on the 3T ADNI dataset, outperforming both models trained from scratch and those pre-trained on ImageNet. Our 2D ADnet achieved state-of-the-art performance on OASIS-2 (59.3% accuracy, 57.6% F1), surpassing all competitor models and confirming the robustness of synthetic data pre-training. These results show synthetic diffusion-based pre-training as a promising bridge between natural image TL and medical MRI data.
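The conditional DDPM at the core of this pipeline trains a noise-prediction network with the standard closed-form forward process; the sketch below shows that forward noising step and the epsilon-prediction loss with a class label (CN/MCI/AD) as conditioning. The linear noise schedule and the placeholder model are assumptions, not the authors' configuration.

```python
# Core of conditional DDPM training: the closed-form forward noising step and
# the epsilon-prediction loss with a class label (CN / MCI / AD) as conditioning.
# The linear noise schedule and the placeholder network are assumptions, not the
# authors' configuration.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # linear beta schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)      # cumulative product of (1 - beta)

def q_sample(x0, t, noise):
    """Sample x_t ~ q(x_t | x_0) in closed form."""
    a = alphas_bar[t].sqrt().view(-1, 1, 1, 1)
    s = (1.0 - alphas_bar[t]).sqrt().view(-1, 1, 1, 1)
    return a * x0 + s * noise

def ddpm_loss(eps_model, x0, labels):
    t = torch.randint(0, T, (x0.shape[0],))         # random timestep per sample
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)
    return torch.mean((noise - eps_model(x_t, t, labels)) ** 2)

# Toy usage with a dummy model that ignores its conditioning.
dummy = lambda x_t, t, y: torch.zeros_like(x_t)
x0 = torch.randn(8, 1, 64, 64)                      # a batch of 2D MRI projections
print(ddpm_loss(dummy, x0, labels=torch.randint(0, 3, (8,))))
```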
Citations: 0
Channel-Wise Joint Disentanglement Representation Learning for B-Mode and Super-Resolution Ultrasound Based CAD of Breast Cancer
IF 10.9 · CAS Tier 1 (Medicine) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-22 · DOI: 10.1016/j.media.2026.103957
Yuhang Zheng, Jiale Xu, Qing Hua, Xiaohong Jia, Xueqin Hou, Yanfeng Yao, Zheng Wei, Yulu Zhang, Fanggang Wu, Wei Guo, Yuan Tian, Jun Wang, Shujun Xia, Yijie Dong, Jun Shi, Jianqiao Zhou
{"title":"Channel-Wise Joint Disentanglement Representation Learning for B-Mode and Super-Resolution Ultrasound Based CAD of Breast Cancer","authors":"Yuhang Zheng, Jiale Xu, Qing Hua, Xiaohong Jia, Xueqin Hou, Yanfeng Yao, Zheng Wei, Yulu Zhang, Fanggang Wu, Wei Guo, Yuan Tian, Jun Wang, Shujun Xia, Yijie Dong, Jun Shi, Jianqiao Zhou","doi":"10.1016/j.media.2026.103957","DOIUrl":"https://doi.org/10.1016/j.media.2026.103957","url":null,"abstract":"","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"193 1","pages":""},"PeriodicalIF":10.9,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146032815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Anatomy-guided prompting with cross-modal self-alignment for whole-body PET-CT breast cancer segmentation
IF 11.8 · CAS Tier 1 (Medicine) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-22 · DOI: 10.1016/j.media.2026.103956
Jiaju Huang, Xiao Yang, Xinglong Liang, Shaobin Chen, Yue Sun, Greta Sp Mok, Shuo Li, Ying Wang, Tao Tan
Accurate segmentation of breast cancer in PET-CT images is crucial for precise staging, monitoring treatment response, and guiding personalized therapy. However, the small size and dispersed nature of metastatic lesions, coupled with the scarcity of annotated data and heterogeneity between modalities that hinders effective information fusion, make this task challenging. This paper proposes a novel anatomy-guided cross-modal learning framework to address these issues. Our approach first generates organ pseudo-labels through a teacher-student learning paradigm, which serve as anatomical prompts to guide cancer segmentation. We then introduce a self-aligning cross-modal pre-training method that aligns PET and CT features in a shared latent space through masked 3D patch reconstruction, enabling effective cross-modal feature fusion. Finally, we initialize the segmentation network’s encoder with the pre-trained encoder weights, and incorporate organ labels through a Mamba-based prompt encoder and Hypernet-Controlled Cross-Attention mechanism for dynamic anatomical feature extraction and fusion. Notably, our method outperforms eight state-of-the-art methods, including CNN-based, transformer-based, and Mamba-based approaches, on two datasets encompassing primary breast cancer, metastatic breast cancer, and other types of cancer segmentation tasks.
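The masked 3D patch reconstruction pre-training described above can be pictured with the toy sketch below: a fraction of one modality's patch tokens is hidden and reconstructed from a naively fused shared latent of both modalities. The encoder, decoder, token counts, and fusion rule are all illustrative placeholders, not the paper's implementation.

```python
# Toy version of a masked-patch, cross-modal pretext task: hide a fraction of
# one modality's patch tokens and reconstruct them from a naively fused shared
# latent of both modalities. Encoder, decoder, token counts, and the fusion rule
# are illustrative placeholders, not the paper's pre-training code.
import torch
import torch.nn as nn

def random_patch_mask(n_patches, mask_ratio=0.6):
    keep = int(n_patches * (1 - mask_ratio))
    idx = torch.randperm(n_patches)
    mask = torch.zeros(n_patches, dtype=torch.bool)
    mask[idx[keep:]] = True                          # True = masked, to be reconstructed
    return mask

encoder = nn.Linear(512, 256)                        # stand-in for a shared patch encoder
decoder = nn.Linear(256, 512)                        # stand-in reconstruction head

pet = torch.randn(216, 512)                          # tokenised 3D patches (e.g. a 6x6x6 grid)
ct = torch.randn(216, 512)
mask = random_patch_mask(216)
ct_visible = torch.where(mask.unsqueeze(-1), torch.zeros_like(ct), ct)  # zero out masked CT tokens
latent = encoder(pet) + encoder(ct_visible)          # naive shared-space fusion
loss = ((decoder(latent)[mask] - ct[mask]) ** 2).mean()  # reconstruct only the masked CT tokens
print(loss)
```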
Citations: 0