
Latest Publications in Medical image analysis

Unlocking 2D/3D+T myocardial mechanics from cine MRI: a mechanically regularized space-time finite element correlation framework
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-10 | DOI: 10.1016/j.media.2026.103944
Haizhou Liu, Xueling Qin, Zhou Liu, Yuxi Jin, Heng Jiang, Yunlong Gao, Jidong Han, Yijia Zheng, Heng Sun, Lingtao Mao, François Hild, Hairong Zheng, Dong Liang, Na Zhang, Jiuping Liang, Dehong Luo, Zhanli Hu
Accurate and biomechanically consistent quantification of cardiac motion remains a major challenge in cine MRI analysis. While classical feature-tracking and recent deep learning methods have improved frame-wise strain estimation, they often lack biomechanical interpretability and temporal coherence. In this study, we propose a space-time-regularized finite-element digital image/volume correlation (FE-DIC/DVC) framework that enables 2D/3D+T myocardial motion tracking and strain analysis using only routine cine MRI. The method unifies multi-view alignment and 2D/3D+T motion estimation into a coherent pipeline, combining region-specific biomechanical regularization with data-driven temporal decomposition to promote spatial fidelity and temporal consistency. A correlation-based multi-view alignment module further enhances anatomical consistency across short- and long-axis views. We evaluate the approach on one synthetic dataset (with ground-truth motion and strain fields), three public datasets (with ground-truth landmarks or myocardial masks), and a clinical dataset (with ground-truth myocardial masks). 2D+T motion and strain are evaluated across all datasets, whereas multi-view alignment and 3D+T motion estimation are assessed only on the clinical dataset. Compared with two classical feature-tracking methods and four state-of-the-art deep-learning baselines, the proposed method improves 2D+T motion and strain estimation accuracy as well as temporal consistency on the synthetic data, achieving a displacement RMSE of 0.35 pixels (vs. 0.73 pixels), an equivalent-strain RMSE of 0.05 (vs. 0.097), and a temporal consistency of 0.97 (vs. 0.91). On public and clinical data, it achieves superior performance in terms of a landmark error of 1.96 mm (vs. 3.15 mm), a boundary-tracking Dice of 0.80–0.87 (a 2–4% improvement over the best-performing baseline), and overall registration quality that consistently ranks among the top two methods. By leveraging only standard cine MRI, this work enables 2D/3D+T myocardial mechanics quantification and provides a practical route toward 4D cardiac function assessment.
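To make the framework's ingredients concrete, below is a minimal, hedged PyTorch sketch of a space-time regularized image-correlation objective: a DIC-style intensity-residual data term, a strain-smoothness (mechanical) regularizer in space, and an acceleration penalty in time. It is an illustration only: a dense per-pixel displacement field stands in for the paper's finite-element nodal basis, random tensors stand in for cine MRI frames, and the weights 0.1 are arbitrary.

```python
# Minimal space-time regularized image-correlation sketch (not the paper's FE-DIC code).
# A dense per-pixel displacement field stands in for an FE nodal basis.
import torch
import torch.nn.functional as F

T, H, W = 8, 64, 64
frames = torch.rand(T, 1, H, W)                  # stand-in for a cine MRI sequence
ref = frames[0:1]                                # reference (e.g., end-diastolic) frame
u = torch.zeros(T, 2, H, W, requires_grad=True)  # per-frame displacement field (dx, dy)

# base sampling grid in normalized [-1, 1] coordinates, shape (1, H, W, 2)
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
base = torch.stack([xs, ys], dim=-1).unsqueeze(0)

def warp(img, disp):
    # displacements are kept in normalized units here for simplicity
    grid = base + disp.permute(0, 2, 3, 1)
    return F.grid_sample(img, grid, mode="bilinear", align_corners=True)

opt = torch.optim.Adam([u], lr=1e-2)
for it in range(200):
    warped = warp(frames, u)                     # warp each frame toward the reference
    data = ((warped - ref) ** 2).mean()          # DIC intensity-residual term
    # mechanical regularizer: penalize spatial displacement gradients (strain smoothness)
    dux = u[:, :, :, 1:] - u[:, :, :, :-1]
    duy = u[:, :, 1:, :] - u[:, :, :-1, :]
    mech = (dux ** 2).mean() + (duy ** 2).mean()
    # temporal regularizer: penalize acceleration of the displacement field
    temp = ((u[2:] - 2 * u[1:-1] + u[:-2]) ** 2).mean()
    loss = data + 0.1 * mech + 0.1 * temp
    opt.zero_grad()
    loss.backward()
    opt.step()
```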
Citations: 0
Knowledge-guided multi-geometric window transformer for cardiac cine MRI reconstruction
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-09 | DOI: 10.1016/j.media.2026.103936
Jun Lyu, Guangming Wang, Yunqi Wang, Jing Qin, Chengyan Wang
Magnetic resonance imaging (MRI) plays a crucial role in clinical diagnosis, yet traditional MR image acquisition often requires a prolonged duration, potentially causing patient discomfort and image artifacts. Faster and more accurate image reconstruction may alleviate patient discomfort during MRI examinations and enhance diagnostic accuracy and efficiency. In recent years, significant advances in deep learning have shown promise for improving MR image quality and accelerating acquisition. Addressing the demand for cardiac cine MRI reconstruction, we propose KGMgT, a novel knowledge-guided MRI reconstruction network. The KGMgT model leverages adaptive spatiotemporal attention mechanisms to infer motion trajectories across adjacent cardiac frames, thereby better extracting complementary information. Additionally, we employ Transformer-driven dynamic feature aggregation to establish long-range dependencies, facilitating global information integration. Experimental results demonstrate that the KGMgT model achieves state-of-the-art performance on multiple benchmark datasets, offering an efficient solution for cardiac cine MRI reconstruction. This collaborative approach, in which artificial intelligence assists medical professionals in clinical decision-making, holds promise for significantly improving diagnostic efficiency, optimizing treatment plans, and enhancing the patient treatment experience. The code and trained models are available at https://github.com/MICV-Lab/KGMgT.
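The abstract's adaptive spatiotemporal attention can be illustrated with a small cross-frame attention block in which tokens of the current cardiac frame attend to tokens of its temporal neighbors. This is a hedged sketch of the general mechanism, not the released KGMgT module; the class name and dimensions are hypothetical.

```python
# Hedged sketch of cross-frame spatiotemporal attention (not the released KGMgT code):
# tokens of the current cardiac frame attend to tokens of its temporal neighbors,
# one simple way to aggregate complementary information across frames.
import torch
import torch.nn as nn

class NeighborFrameAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cur, prev, nxt):
        # cur/prev/nxt: (B, N, dim) patch tokens of adjacent frames
        ctx = torch.cat([prev, nxt], dim=1)          # temporal context tokens
        out, _ = self.attn(query=cur, key=ctx, value=ctx)
        return self.norm(cur + out)                  # residual connection + norm

B, N, D = 2, 196, 64
block = NeighborFrameAttention(D)
fused = block(torch.randn(B, N, D), torch.randn(B, N, D), torch.randn(B, N, D))
print(fused.shape)  # torch.Size([2, 196, 64])
```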
Citations: 0
NeuroDetour: A neural pathway transformer for uncovering structural-functional coupling mechanisms in human connectome
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-08 | DOI: 10.1016/j.media.2025.103931
Ziquan Wei, Tingting Dan, Jiaqi Ding, Paul J. Laurienti, Guorong Wu
Although modern imaging methods enable in-vivo examination of connections between distinct brain areas, we still lack a comprehensive understanding of how anatomical structure underpins brain function and how spontaneous fluctuations in neural activity give rise to cognition. At the same time, many efforts in machine learning have focused on modeling the complex, nonlinear relationships between neuroimaging signals and observable traits. Yet, current machine learning techniques often overlook fundamental neuroscience insights, making it difficult to interpret transient neural dynamics in terms of cognitive behavior. To bridge this gap, we turn our attention to the interplay between structural connectivity (SC) and functional connectivity (FC), reframing this open question in network neuroscience as a graph representation learning task centered on neural pathways. In particular, we introduce the notion of a “topological detour” to describe how a given instance of FC (i.e., a direct functional connection) is physically supported by underlying SC pathways (the detour), forming a feedback loop between brain structure and function. By considering these multi-hop detour routes that mediate SC-FC coupling, we design a novel multi-head self-attention mechanism within a Transformer architecture. Building on these ideas, we present a biologically inspired deep-learning framework, NeuroDetour, that extracts connectomic feature representations from large-scale neuroimaging datasets and can be applied to downstream tasks such as task classification and disease prediction. We validated NeuroDetour on extensive public cohorts, including the Human Connectome Project (HCP) and UK Biobank (UKB), using both supervised learning and zero-shot settings. In all scenarios, NeuroDetour achieves state-of-the-art results.
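One way to read the "topological detour" idea is as an attention bias: count multi-hop structural-connectivity walks between region pairs and add them to functional self-attention scores. The sketch below implements that reading under stated assumptions (binary SC adjacency, walk counts via matrix powers, log compression); it is an interpretation, not the authors' mechanism.

```python
# Hedged sketch of the "topological detour" idea (one reading, not the authors' code):
# count multi-hop structural paths between region pairs and use them as an additive
# bias on functional self-attention scores between those regions.
import torch

def detour_bias(sc_adj, max_hops=3):
    # sc_adj: (R, R) binary structural-connectivity adjacency
    a = sc_adj.float()
    paths, power = torch.zeros_like(a), torch.eye(a.shape[0])
    for _ in range(max_hops):
        power = power @ a                  # number of walks of this length
        paths = paths + power
    return torch.log1p(paths)              # compress path counts into a bias

R, D = 90, 32                              # e.g., 90 brain regions, feature dim 32
x = torch.randn(1, R, D)                   # per-region functional features
sc = (torch.rand(R, R) > 0.9).float()      # toy structural connectivity
q = k = v = x
scores = (q @ k.transpose(-2, -1)) / D ** 0.5
scores = scores + detour_bias(sc)          # SC detours modulate FC attention
attn = torch.softmax(scores, dim=-1) @ v
```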
Citations: 0
Rapid spatio-temporal MR fingerprinting using physics-informed implicit neural representation
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-08 | DOI: 10.1016/j.media.2026.103935
Chaoguang Gong, Lixian Zou, Peng Li, Xingyang Wu, Yangzi Qiao, Zhanqi Hu, Xiaoyan Wang, Yihang Zhou, Kai Wang, Yue Hu, Haifeng Wang
The potential of Magnetic Resonance Fingerprinting (MRF), which allows for rapid and simultaneous multi-parametric quantitative MRI, is often limited by severe aliasing artifacts caused by aggressive undersampling. Conventional MRF approaches typically treat these artifacts as detrimental noise and focus on their removal, often at the cost of either reduced reconstruction speed or increased reliance on large training datasets. Building on the insight that structured aliasing can be leveraged as an informative spatial encoding mechanism, we propose to extend MRF’s encoding capacity to the global spatio-temporal domain by introducing a novel Physics-informed implicit neural MRF (πMRF) framework. πMRF integrates physics-informed spatio-temporal fingerprint modeling with implicit neural representations (INRs), enabling unsupervised, gradient-driven joint estimation of quantitative tissue parameters and coil sensitivity maps (CSMs) with enhanced accuracy and robustness. Specifically, πMRF leverages a scalable component based on physics-informed neural networks (PINNs) to facilitate accurate high-dimensional signal modeling and memory-efficient optimization. In addition, a subspace-guided sensitivity regularization is developed to improve the robustness of CSM estimation in highly undersampled scenarios. Experimental results on simulated, phantom, and in vivo datasets demonstrate that πMRF achieves improved quantitative accuracy and robustness even under highly accelerated acquisitions, outperforming state-of-the-art MRF methods.
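A physics-informed INR of this kind can be sketched as a coordinate MLP that outputs tissue parameters, followed by a differentiable signal model whose output is matched to measured fingerprints. The sketch below uses a toy closed-form relaxation model purely as a stand-in: real MRF fingerprints come from Bloch/EPG simulation, and the data term here ignores coil sensitivities and k-space undersampling.

```python
# Hedged sketch of a physics-informed INR for MRF (a toy stand-in: real MRF
# fingerprints come from Bloch/EPG simulation, not this closed-form signal model).
import torch
import torch.nn as nn

class QuantINR(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Softplus(),   # PD, T1, T2 kept positive
        )

    def forward(self, xy):                          # xy: (N, 2) spatial coordinates
        return self.net(xy)

def toy_signal(params, t, tr=0.01):
    # simplified relaxation model standing in for the Bloch equations
    pd, t1, t2 = params[:, 0:1], params[:, 1:2] + 0.1, params[:, 2:3] + 0.01
    return pd * (1 - torch.exp(-tr / t1)) * torch.exp(-t / t2)

model = QuantINR()
coords = torch.rand(1024, 2) * 2 - 1               # pixels in [-1, 1]^2
t = torch.linspace(0.005, 0.2, 40).unsqueeze(0)    # 40 readout times
measured = torch.rand(1024, 40)                    # stand-in measured fingerprints
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    pred = toy_signal(model(coords), t)            # physics layer: params -> signal
    loss = ((pred - measured) ** 2).mean()         # data-consistency loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```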
Citations: 0
Latent diffusion autoencoders: Toward efficient and meaningful unsupervised representation learning in medical imaging - a case study on Alzheimer’s disease
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-07 | DOI: 10.1016/j.media.2026.103932
Gabriele Lozupone, Alessandro Bria, Francesco Fontanella, Frederick J.A. Meijer, Claudio De Stefano, Henkjan Huisman
This study presents Latent Diffusion Autoencoder (LDAE), a novel encoder-decoder diffusion-based framework for efficient and meaningful unsupervised learning in medical imaging, focusing on Alzheimer’s disease (AD) using brain MRI from the ADNI database as a case study. Unlike conventional diffusion autoencoders operating in image space, LDAE applies the diffusion process in a compressed latent representation, improving computational efficiency and making 3D medical imaging representation learning tractable. To validate the proposed approach, we explore two key hypotheses: (i) LDAE effectively captures meaningful semantic representations on 3D brain MRI associated with AD and ageing, and (ii) LDAE achieves high-quality image generation and reconstruction while being computationally efficient. Experimental results support both hypotheses: (i) linear-probe evaluations demonstrate promising diagnostic performance for AD (AUROC: 90%, ACC: 84%) and age prediction (MAE: 4.1 years, RMSE: 5.2 years); (ii) the learned semantic representations enable attribute manipulation, yielding anatomically plausible modifications; (iii) semantic interpolation experiments show strong reconstruction of missing scans, with SSIM of 0.969 (MSE: 0.0019) for a 6-month gap. Even for longer gaps (24 months), the model maintains robust performance (SSIM > 0.93, MSE < 0.004), indicating an ability to capture temporal progression trends; (iv) compared to conventional diffusion autoencoders, LDAE significantly increases inference throughput (20× faster) while also enhancing reconstruction quality. These findings position LDAE as a promising framework for scalable medical imaging applications, with the potential to serve as a foundation model for medical image analysis. Code is publicly available at https://github.com/GabrieleLozupone/LDAE.
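The training objective of a latent diffusion autoencoder can be sketched as standard DDPM noise prediction carried out in the compressed latent space, with the denoiser conditioned on a semantic code. The sketch below is schematic, not the released LDAE code: the MLP denoiser, crude timestep embedding, and dimensions are all placeholders.

```python
# Hedged sketch of an LDAE-style training step (schematic, not the released code):
# diffuse a compressed latent z0 and train a denoiser, conditioned on a semantic
# code, to predict the added noise, as in standard DDPM training.
import torch
import torch.nn as nn

latent_dim, sem_dim, T = 256, 64, 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

denoiser = nn.Sequential(                        # toy epsilon-predictor MLP
    nn.Linear(latent_dim + sem_dim + 1, 512), nn.SiLU(),
    nn.Linear(512, latent_dim),
)

def training_step(z0, z_sem):
    # z0: (B, latent_dim) compressed scan latent; z_sem: (B, sem_dim) semantic code
    B = z0.shape[0]
    t = torch.randint(0, T, (B,))
    ab = alphas_bar[t].unsqueeze(1)
    eps = torch.randn_like(z0)
    zt = ab.sqrt() * z0 + (1 - ab).sqrt() * eps          # forward diffusion
    t_emb = (t.float() / T).unsqueeze(1)                 # crude timestep embedding
    eps_hat = denoiser(torch.cat([zt, z_sem, t_emb], dim=1))
    return ((eps_hat - eps) ** 2).mean()                 # epsilon-prediction loss

loss = training_step(torch.randn(8, latent_dim), torch.randn(8, sem_dim))
loss.backward()
```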
Citations: 0
UDV-Net: A hybrid CNN and transformer vein segmentation network with vascular prior and spatial awareness
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | DOI: 10.1016/j.media.2025.103929
Bowei Shen, Xiaoquan Huang, Yuli Li, Xinghuan Li, Lili Ma, Yonghong Shi, Shiyao Chen
CNNs handling multi-scale variations and Transformers modeling long-range dependencies are crucial for vascular segmentation. The fusion of these two models effectively combines the multi-scale local features extracted by CNNs with the global information modeled by Transformers, significantly enhancing the accuracy of blood vessel segmentation. However, even such powerful models face challenges when dealing with the gradually formed, extensive collateral vessels in the upper digestive system veins of patients with cirrhotic portal hypertension, leading to numerous false negative and false positive segmentation results. To this end, this paper proposes UDV-Net, a fusion network combining CNN and Transformer with vessel priors and spatial awareness for upper digestive system vein segmentation. First, a CNN with an encoder-decoder architecture is employed to create a multi-scale representation of blood vessels from the image. The representation is further refined by a blood vessel attention module at the corresponding scale to address tubular structures, thereby reducing false positive results. Second, a Transformer bridge with three-dimensional voxel position encoding is proposed to connect the corresponding encoder-decoder layers, effectively perceiving widely distributed blood vessels with diverse shapes, improving blood vessel connectivity, and avoiding false negative results. We collected and annotated abdominal contrast-enhanced CT images of 191 patients with liver cirrhosis, constituting the PHCT dataset. Our method achieves state-of-the-art validation results on this dataset. When evaluated on the publicly available 3D-IRCADb dataset as an unseen external validation set for PHCT, the model demonstrated satisfactory performance. Additionally, our method achieves optimal performance on the public MSD hepatic vessel dataset.
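The Transformer-bridge idea can be sketched as follows: flatten a 3D CNN feature map into voxel tokens, add a positional encoding over the (D, H, W) grid, and self-attend so that distant vessel segments can exchange information. This is a schematic reading with a learned positional embedding standing in for the paper's three-dimensional voxel position encoding; the class name and grid size are hypothetical.

```python
# Hedged sketch of a Transformer bridge over 3D CNN features (schematic reading,
# not the authors' implementation): voxel positions are encoded and the flattened
# feature map self-attends to capture widely separated vessel segments.
import torch
import torch.nn as nn

class TransformerBridge3D(nn.Module):
    def __init__(self, dim=96, heads=4, grid=(4, 8, 8)):
        super().__init__()
        d, h, w = grid
        self.pos = nn.Parameter(torch.zeros(1, d * h * w, dim))  # learned 3D voxel PE
        nn.init.trunc_normal_(self.pos, std=0.02)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, feat):
        # feat: (B, C, D, H, W) feature map from the CNN encoder
        B, C, D, H, W = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)        # (B, D*H*W, C) voxel tokens
        tokens = self.encoder(tokens + self.pos)
        return tokens.transpose(1, 2).reshape(B, C, D, H, W)

bridge = TransformerBridge3D()
out = bridge(torch.randn(1, 96, 4, 8, 8))
print(out.shape)  # torch.Size([1, 96, 4, 8, 8])
```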
Citations: 0
SupReMix: Supervised contrastive learning for medical imaging regression with mixup
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-31 | DOI: 10.1016/j.media.2025.103909
Yilei Wu, Zijian Dong, Chongyao Chen, Wangchunshu Zhou, Juan Helen Zhou
In medical image analysis, regression plays a critical role in computer-aided diagnosis. It enables quantitative measurements such as age prediction from structural imaging, cardiac function quantification, and molecular measurement from PET scans. While deep learning has shown promise for these tasks, most approaches focus solely on optimizing regression loss or model architecture, neglecting the quality of learned feature representations which are crucial for robust clinical predictions. Directly applying representation learning techniques designed for classification to regression often results in fragmented representations in the latent space, yielding sub-optimal performance. In this paper, we argue that the potential of contrastive learning for medical image regression has been overshadowed due to the neglect of two crucial aspects: ordinality-awareness and hardness. To address these challenges, we propose Supervised Contrastive Learning for Medical Imaging Regression with Mixup (SupReMix). It takes anchor-inclusive mixtures (mixup of the anchor and a distinct negative sample) as hard negative pairs and anchor-exclusive mixtures (mixup of two distinct negative samples) as hard positive pairs at the embedding level. This strategy formulates harder contrastive pairs by integrating richer ordinal information. Through theoretical analysis and extensive experiments on six datasets spanning MRI, X-ray, ultrasound, and PET modalities, we demonstrate that SupReMix fosters continuous ordered representations, significantly improving regression performance.
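The pair-construction rule described in the abstract translates directly into code: an anchor-inclusive mixture (anchor mixed with a distinct sample) serves as a hard negative, and an anchor-exclusive mixture of two other samples, with the mixing weight chosen so their labels interpolate to the anchor's label, serves as a hard positive. A minimal sketch, assuming continuous labels and unit-normalized embeddings (the function name and the single-pair InfoNCE form are simplifications):

```python
# Hedged sketch of SupReMix-style pair construction (following the abstract's
# description, not the released code): mixups in embedding space create hard
# negatives (anchor + a distinct sample) and hard positives (two other samples
# whose labels interpolate to the anchor's label).
import torch
import torch.nn.functional as F

def supremix_pairs(z, y, anchor, i, j, tau=0.1):
    # z: (N, D) embeddings, y: (N,) continuous labels; requires y[i] != y[j]
    za, ya = z[anchor], y[anchor]
    lam_neg = torch.rand(1)
    hard_neg = lam_neg * za + (1 - lam_neg) * z[i]        # anchor-inclusive mixture
    # choose lambda so that lam*y[i] + (1-lam)*y[j] == y[anchor] (clamped to [0, 1])
    lam_pos = ((ya - y[j]) / (y[i] - y[j])).clamp(0, 1)
    hard_pos = lam_pos * z[i] + (1 - lam_pos) * z[j]      # anchor-exclusive mixture
    # InfoNCE-style contrast: pull the hard positive in, push the hard negative away
    sims = torch.stack([
        F.cosine_similarity(za, hard_pos, dim=0),
        F.cosine_similarity(za, hard_neg, dim=0),
    ]) / tau
    return F.cross_entropy(sims.unsqueeze(0), torch.tensor([0]))

z = F.normalize(torch.randn(16, 128), dim=1)
y = torch.rand(16) * 80                                   # e.g., ages in years
loss = supremix_pairs(z, y, anchor=0, i=1, j=2)
```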
Citations: 0
DVAP-Reg: Dual-view anatomical prior-driven cross-dimensional registration for spinal surgery navigation
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-31 | DOI: 10.1016/j.media.2025.103930
Zhengyang Wu, Wenjie Zheng, Yingjie Hao, Jing Ling, Maodan Nie, Rui Zuo, Minghan Liu, Zegang Shi, Wen Xia, Fayuan Zhou, Zhuojun Cao, Jingjing Xiao, Weisheng Li, Guifeng Xia, Changqing Li, Yucheng Shu, Chao Zhang
2D-3D cross-dimensional registration serves as a critical technology in spinal surgery navigation, with profound implications for enhancing surgical precision, reducing radiation exposure and mitigating surgical risks. Its core objective is to achieve visual navigation by accurately aligning preoperative high-resolution 3D vertebrae with intraoperative 2D X-rays. However, this technology remains constrained by prominent challenges, primarily arising from the inherent semantic and dimensional discrepancies. Traditional registration methods, typically relying on classical iterative optimization strategies, suffer from low computational efficiency. Meanwhile, deep learning-based approaches struggle to accommodate the spatial randomness inherent in intraoperative 2D X-rays and 3D vertebrae. In this paper, we propose a dual-view anatomical prior-driven cross-dimensional registration method for spinal surgery navigation. First, a direct regression network based on dual-view X-rays, integrated with a proposed spatial correlation mechanism, is employed to enhance geometric consistency constraints and mitigate inter-patient anatomical variability. Then, corresponding vertebrae’s anatomical priors, extracted via the proposed Face-GCN module as conditional information enhancement for high-level, generalized spatial perception, are fused into the regression network for spatial pose alignment guidance. Finally, a clinical cross-dimensional image dataset is released using the developed interactive registration platform. The proposed network has been validated in real-world spinal surgical navigation scenarios across diverse lumbar spine pathologies, utilizing authentic intraoperative 2D X-rays and preoperative 3D CT images. In these clinical settings, our method achieves real-time 2D-3D image-assisted navigation with rotational and translational accuracy of 2.81° and 1.82 mm, demonstrating its ability to keep pace with the time-sensitive, ever-changing nature of intraoperative workflows. Our project’s dataset and source code are available at: https://github.com/TMMU-KLPOP/DVAP-Reg.
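At a high level, direct dual-view pose regression can be sketched as two shared-weight encoders, a cross-view interaction term standing in for the proposed spatial correlation mechanism, and a head that regresses a 6-DoF pose (three rotations, three translations). The sketch below is a schematic reading only; it omits the Face-GCN anatomical priors, and the elementwise-product "correlation" is a placeholder for the paper's mechanism.

```python
# Hedged sketch of dual-view pose regression (a schematic reading of the pipeline,
# not the released DVAP-Reg code): two X-ray views are encoded, a cross-view
# interaction enforces geometric consistency, and a head regresses a 6-DoF pose.
import torch
import torch.nn as nn

class DualViewPoseNet(nn.Module):
    def __init__(self, feat=64):
        super().__init__()
        self.enc = nn.Sequential(                       # shared per-view encoder
            nn.Conv2d(1, feat, 7, stride=4, padding=3), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(
            nn.Linear(3 * feat, 128), nn.ReLU(),
            nn.Linear(128, 6),                          # 3 rotations + 3 translations
        )

    def forward(self, ap, lat):
        fa = self.enc(ap).flatten(1)                    # anterior-posterior view
        fl = self.enc(lat).flatten(1)                   # lateral view
        corr = fa * fl                                  # toy cross-view correlation
        return self.head(torch.cat([fa, fl, corr], dim=1))

net = DualViewPoseNet()
pose = net(torch.randn(2, 1, 256, 256), torch.randn(2, 1, 256, 256))
print(pose.shape)  # torch.Size([2, 6])
```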
Citations: 0
FDA-Recon: Feature and data alignment reconstruction for sparse-view CBCT
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-30 | DOI: 10.1016/j.media.2025.103928
Yikun Zhang, Yao Wang, Xian Wu, Dianlin Hu, Tianling Lyu, Yan Xi, Xu Ji, Jian Yang, Yang Chen
Cone-beam computed tomography (CBCT) enables real-time three-dimensional imaging for patients, which is of great significance in improving the precision of radiotherapy and interventional procedures. Sparse-view CBCT, which can relax the readout rate of flat-panel detectors and reduce the radiation dose of X-rays, is a promising technology. However, sparse sampling can lead to streak artifacts in the reconstructed images, while the low-power X-ray source of the CBCT scanner can produce low SNR measurements. These degradations hinder accurate guidance for subsequent treatment procedures. To address these issues, this study proposes a learning-based reconstruction algorithm for sparse-view CBCT. To ensure the robustness of the proposed method in real-world scenarios, this study first constructs a large-scale simulated dataset whose distribution is close to the real data based on X-ray imaging physics and CT system characteristics, achieving coarse alignment at the data level. However, simple coarse alignment alone cannot completely bridge the gaps between simulated and real data. Therefore, this study further employs an unsupervised domain adaptation strategy to achieve deeper alignment in the feature space, ensuring the model trained on simulated data maintains its performance on real data. We refer to this strategy as FDA-Recon (Feature and Data Alignment Reconstruction). To achieve high performance in noise suppression and artifact removal, a deep neural network incorporating the novel Vision-LSTM mechanism is developed to fully exploit both local features and global dependencies in images. Results on real data from two different CBCT systems demonstrate the promising performance of the proposed image restoration neural network in artifact removal, noise suppression, and image restoration, as well as the potential of FDA-Recon in addressing sparse-view CBCT reconstruction in practical scenarios.
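The abstract does not name the exact domain-adaptation mechanism, so the sketch below shows one standard instantiation of unsupervised feature-space alignment: a domain classifier trained through a gradient-reversal layer, so the encoder learns features that do not distinguish simulated from real inputs. All names and dimensions are illustrative.

```python
# Hedged sketch of unsupervised feature-level domain alignment (the abstract does
# not name the exact mechanism; gradient reversal with a domain classifier is one
# standard choice, shown here as an illustration).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None                    # flip gradients into the encoder

encoder = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
domain_clf = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))

sim = torch.randn(8, 1, 64, 64)                         # simulated sparse-view inputs
real = torch.randn(8, 1, 64, 64)                        # real-scanner inputs (no labels)
feats = torch.cat([encoder(sim), encoder(real)], dim=0)
domain = torch.cat([torch.zeros(8), torch.ones(8)]).long()
logits = domain_clf(GradReverse.apply(feats, 1.0))
adv_loss = nn.functional.cross_entropy(logits, domain)  # encoder learns to confuse it
adv_loss.backward()
```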
Citations: 0
Multi-cancer framework with cancer-aware attention and adversarial mutual-information minimization for whole slide image classification
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-29 | DOI: 10.1016/j.media.2025.103927
Sharon Peled, Yosef E. Maruvka, Moti Freiman
Whole Slide Images (WSIs) are crucial in modern pathology, offering high-resolution data for accurate diagnosis, treatment planning, and research. Deep learning methods have recently been proposed to harness this data by extracting and interpreting complex patterns. However, these approaches often focus on specific tumor types, limiting their generalizability across diverse pathological conditions and restricting scalability. This relatively narrow focus ultimately stems from the inherent heterogeneity in histopathology and the diverse morphological and molecular characteristics of different tumors. To this end, we propose a novel approach for multi-cancer WSI analysis, designed to leverage the diversity of different tumor types. We introduce a Cancer-Aware Attention module that models both shared patterns across cancers and cancer-specific variations to address heterogeneity and enhance cross-tumor generalization. Furthermore, we construct an adversarial cancer regularization mechanism to minimize cancer-specific biases through mutual information minimization. Additionally, we develop a hierarchical sample balancing strategy to mitigate data imbalances and promote unbiased learning. Together, these form a cohesive framework for unbiased multi-cancer WSI analysis. Extensive experiments on a uniquely constructed multi-cancer dataset demonstrate significant improvements in generalization, providing a scalable solution for WSI classification across diverse cancer types.
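One plausible instantiation of attention that models "shared patterns across cancers and cancer-specific variations" is a slide-level attention pooling whose query is a shared token plus a learned per-cancer offset. The sketch below implements that reading; it is an assumption about the module's form, not the authors' implementation.

```python
# Hedged sketch of cancer-aware attention pooling (one plausible instantiation of
# "shared patterns plus cancer-specific variations", not the authors' module):
# a shared slide-level query is offset by a learned per-cancer query embedding.
import torch
import torch.nn as nn

class CancerAwareAttnPool(nn.Module):
    def __init__(self, dim=128, n_cancers=10, heads=4):
        super().__init__()
        self.shared_q = nn.Parameter(torch.zeros(1, 1, dim))      # shared across cancers
        self.cancer_q = nn.Embedding(n_cancers, dim)              # cancer-specific offset
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, patches, cancer_id):
        # patches: (B, N, dim) patch embeddings of one WSI; cancer_id: (B,) type index
        q = self.shared_q + self.cancer_q(cancer_id).unsqueeze(1)
        slide, _ = self.attn(query=q, key=patches, value=patches)
        return slide.squeeze(1)                                   # (B, dim) slide feature

pool = CancerAwareAttnPool()
slide_feat = pool(torch.randn(2, 500, 128), torch.tensor([3, 7]))
print(slide_feat.shape)  # torch.Size([2, 128])
```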
Citations: 0