Pub Date: 2025-07-16  DOI: 10.1109/TMI.2025.3589543
Shixuan Leslie Gu;Jason Ken Adhinarta;Mikhail Bessmeltsev;Jiancheng Yang;Yongjie Jessica Zhang;Wenjie Yin;Daniel Berger;Jeff W. Lichtman;Hanspeter Pfister;Donglai Wei
Accurate segmentation of anatomical substructures within 3D curvilinear structures in medical imaging remains challenging due to their complex geometry and the scarcity of diverse, large-scale datasets for algorithm development and evaluation. In this paper, we use dendritic spine segmentation as a case study and address these challenges by introducing a novel Frenet-Serret Frame-based Decomposition, which decomposes 3D curvilinear structures into a globally smooth continuous curve that captures the overall shape, and a cylindrical primitive that encodes local geometric properties. This approach leverages Frenet-Serret Frames and arc length parameterization to preserve essential geometric features while reducing representational complexity, facilitating data-efficient learning, improved segmentation accuracy, and generalization on 3D curvilinear structures. To rigorously evaluate our method, we introduce two datasets: CurviSeg, a synthetic dataset for 3D curvilinear structure segmentation that validates our method’s key properties, and DenSpineEM, a benchmark for dendritic spine segmentation, which comprises 4,476 manually annotated spines from 70 dendrites across three public electron microscopy datasets, covering multiple brain regions and species. Our experiments on DenSpineEM demonstrate exceptional cross-region and cross-species generalization: models trained on the mouse somatosensory cortex subset achieve 94.43% Dice, maintaining strong performance in zero-shot segmentation on both mouse visual cortex (95.61% Dice) and human frontal lobe (86.63% Dice) subsets. Moreover, we test the generalizability of our method on the IntrA dataset, where it achieves 77.08% Dice (5.29% higher than prior arts) on intracranial aneurysm segmentation from entire artery models. These findings demonstrate the potential of our approach for accurately analyzing complex curvilinear structures across diverse medical imaging fields. Our dataset, code, and models are available at https://github.com/VCG/FFD4DenSpineEM to support future research.
{"title":"Frenet–Serret Frame-Based Decomposition for Part Segmentation of 3-D Curvilinear Structures","authors":"Shixuan Leslie Gu;Jason Ken Adhinarta;Mikhail Bessmeltsev;Jiancheng Yang;Yongjie Jessica Zhang;Wenjie Yin;Daniel Berger;Jeff W. Lichtman;Hanspeter Pfister;Donglai Wei","doi":"10.1109/TMI.2025.3589543","DOIUrl":"10.1109/TMI.2025.3589543","url":null,"abstract":"Accurate segmentation of anatomical substructures within 3D curvilinear structures in medical imaging remains challenging due to their complex geometry and the scarcity of diverse, large-scale datasets for algorithm development and evaluation. In this paper, we use dendritic spine segmentation as a case study and address these challenges by introducing a novel Frenet-Serret Frame-based Decomposition, which decomposes 3D curvilinear structures into a globally smooth continuous curve that captures the overall shape, and a cylindrical primitive that encodes local geometric properties. This approach leverages Frenet-Serret Frames and arc length parameterization to preserve essential geometric features while reducing representational complexity, facilitating data-efficient learning, improved segmentation accuracy, and generalization on 3D curvilinear structures. To rigorously evaluate our method, we introduce two datasets: CurviSeg, a synthetic dataset for 3D curvilinear structure segmentation that validates our method’s key properties, and DenSpineEM, a benchmark for dendritic spine segmentation, which comprises 4,476 manually annotated spines from 70 dendrites across three public electron microscopy datasets, covering multiple brain regions and species. Our experiments on DenSpineEM demonstrate exceptional cross-region and cross-species generalization: models trained on the mouse somatosensory cortex subset achieve 94.43% Dice, maintaining strong performance in zero-shot segmentation on both mouse visual cortex (95.61% Dice) and human frontal lobe (86.63% Dice) subsets. Moreover, we test the generalizability of our method on the IntrA dataset, where it achieves 77.08% Dice (5.29% higher than prior arts) on intracranial aneurysm segmentation from entire artery models. These findings demonstrate the potential of our approach for accurately analyzing complex curvilinear structures across diverse medical imaging fields. Our dataset, code, and models are available at <uri>https://github.com/VCG/FFD4DenSpineEM</uri> to support future research.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5319-5331"},"PeriodicalIF":0.0,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144645751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-07-15  DOI: 10.1109/TMI.2025.3589399
Boxiang Yun;Shitian Zhao;Qingli Li;Alex Kot;Yan Wang
With the assistance of large language models, which offer universal medical prior knowledge via text prompts, state-of-the-art Universal Models (UMs) have demonstrated considerable potential in medical image segmentation. Semantically detailed text prompts, on the one hand, convey comprehensive knowledge; on the other hand, they introduce biases that may not apply to specific cases involving heterogeneous organs or rare cancers. To this end, we propose a Debiased Universal Model (DUM) that considers instance-level context information and removes knowledge biases in text prompts from a causal perspective. We are the first to identify and mitigate the bias introduced by universal knowledge. Specifically, we propose to extract organ-level text prompts via language models and instance-level context prompts from the visual features of each image, emphasizing factual instance-level information while mitigating organ-level knowledge bias. This process is derived from and theoretically supported by a causal graph, and is instantiated by designing a standard UM (SUM) and a biased UM. The debiased output is obtained by subtracting the likelihood distribution produced by the biased UM from that of the SUM. Experiments on three large-scale multi-center external datasets and the MSD internal tumor datasets show that our method enhances generalization across diverse medical scenarios and reduces potential biases, with an improvement of 4.16% over a popular universal model on the AbdomenAtlas dataset. The code is publicly available at https://github.com/DeepMed-Lab-ECNU/DUM
{"title":"Debiasing Medical Knowledge for Prompting Universal Model in CT Image Segmentation","authors":"Boxiang Yun;Shitian Zhao;Qingli Li;Alex Kot;Yan Wang","doi":"10.1109/TMI.2025.3589399","DOIUrl":"10.1109/TMI.2025.3589399","url":null,"abstract":"With the assistance of large language models, which offer universal medical prior knowledge via text prompts, state-of-the-art Universal Models (UM) have demonstrated considerable potential in the field of medical image segmentation. Semantically detailed text prompts, on the one hand, indicate comprehensive knowledge; on the other hand, they bring biases that may not be applicable to specific cases involving heterogeneous organs or rare cancers. To this end, we propose a Debiased Universal Model (DUM) to consider instance-level context information and remove knowledge biases in text prompts from the causal perspective. We are the first to discover and mitigate the bias introduced by universal knowledge. Specifically, we propose to extract organ-level text prompts via language models and instance-level context prompts from the visual features of each image. We aim to highlight more on factual instance-level information and mitigate organ-level’s knowledge bias. This process can be derived and theoretically supported by a causal graph, and instantiated by designing a standard UM (SUM) and a biased UM. The debiased output is finally obtained by subtracting the likelihood distribution output by biased UM from that of the SUM. Experiments on three large-scale multi-center external datasets and MSD internal tumor datasets show that our method enhances the model’s generalization ability in handling diverse medical scenarios and reducing the potential biases, even with an improvement of 4.16% compared with popular universal model on the AbdomenAtlas dataset, showing the strong generalizability. The code is publicly available at <uri>https://github.com/DeepMed-Lab-ECNU/DUM</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5142-5154"},"PeriodicalIF":0.0,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144639749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-07-15  DOI: 10.1109/TMI.2025.3587733
Zheang Huai;Hui Tang;Yi Li;Zhuangzhuang Chen;Xiaomeng Li
Source-free domain adaptation (SFDA) for segmentation aims at adapting a model trained in the source domain to perform well in the target domain, given only the source model and unlabeled target data. Inspired by the recent success of the Segment Anything Model (SAM), which can segment images of various modalities and domains given human-annotated prompts such as bounding boxes or points, we explore for the first time the potential of SAM for SFDA by automatically finding an accurate bounding box prompt. We find that the bounding boxes directly generated with existing SFDA approaches are defective due to the domain gap. To tackle this issue, we propose a novel Dual Feature Guided (DFG) auto-prompting approach to search for the box prompt. Specifically, the source model is first trained in a feature aggregation phase, which not only preliminarily adapts the source model to the target domain but also builds a feature distribution well prepared for box prompt search. In the second phase, based on two observations of the feature distribution, we gradually expand the box prompt under the guidance of the target model features and the SAM features to handle class-wise clustered target features and class-wise dispersed target features, respectively. To remove potentially enlarged false-positive regions caused by over-confident predictions of the target model, the refined pseudo-labels produced by SAM are further post-processed based on connectivity analysis. Experiments on 3D and 2D datasets indicate that our approach yields superior performance compared to conventional methods. Code is available at https://github.com/xmed-lab/DFG.
{"title":"Leveraging Segment Anything Model for Source-Free Domain Adaptation via Dual Feature Guided Auto-Prompting","authors":"Zheang Huai;Hui Tang;Yi Li;Zhuangzhuang Chen;Xiaomeng Li","doi":"10.1109/TMI.2025.3587733","DOIUrl":"10.1109/TMI.2025.3587733","url":null,"abstract":"Source-free domain adaptation (SFDA) for segmentation aims at adapting a model trained in the source domain to perform well in the target domain with only the source model and unlabeled target data. Inspired by the recent success of Segment Anything Model (SAM) which exhibits the generality of segmenting images of various modalities and in different domains given human-annotated prompts like bounding boxes or points, we for the first time explore the potentials of Segment Anything Model for SFDA via automatedly finding an accurate bounding box prompt. We find that the bounding boxes directly generated with existing SFDA approaches are defective due to the domain gap. To tackle this issue, we propose a novel Dual Feature Guided (DFG) auto-prompting approach to search for the box prompt. Specifically, the source model is first trained in a feature aggregation phase, which not only preliminarily adapts the source model to the target domain but also builds a feature distribution well-prepared for box prompt search. In the second phase, based on two feature distribution observations, we gradually expand the box prompt with the guidance of the target model feature and the SAM feature to handle the class-wise clustered target features and the class-wise dispersed target features, respectively. To remove the potentially enlarged false positive regions caused by the over-confident prediction of the target model, the refined pseudo-labels produced by SAM are further postprocessed based on connectivity analysis. Experiments on 3D and 2D datasets indicate that our approach yields superior performance compared to conventional methods. Code is available at <uri>https://github.com/xmed-lab/DFG</uri>.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5077-5088"},"PeriodicalIF":0.0,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144639870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Colorectal cancer (CRC) is a significant global health concern, and early detection through screening plays a critical role in reducing mortality. While deep learning models have shown promise in improving polyp detection, classification, and segmentation, their generalization across diverse clinical environments, particularly with out-of-distribution (OOD) data, remains a challenge. Multi-center datasets like PolypGen have been developed to address these issues, but their collection is costly and time-consuming. Traditional data augmentation techniques provide limited variability, failing to capture the complexity of medical images. Diffusion models have emerged as a promising solution for generating synthetic polyp images, but the image generation process in current models mainly relies on segmentation masks as the condition, limiting their ability to capture the full clinical context. To overcome these limitations, we propose a Progressive Spectrum Diffusion Model (PSDM) that integrates diverse clinical annotations–such as segmentation masks, bounding boxes, and colonoscopy reports–by transforming them into compositional prompts. These prompts are organized into coarse and fine components, allowing the model to capture both broad spatial structures and fine details, generating clinically accurate synthetic images. By augmenting training data with PSDM-generated samples, our model significantly improves polyp detection, classification, and segmentation. For instance, on the PolypGen dataset, PSDM increases the F1 score by 2.12% and the mean average precision by 3.09%, demonstrating superior performance in OOD scenarios and enhanced generalization.
{"title":"Robust Polyp Detection and Diagnosis Through Compositional Prompt-Guided Diffusion Models","authors":"Jia Yu;Yan Zhu;Peiyao Fu;Tianyi Chen;Junbo Huang;Quanlin Li;Pinghong Zhou;Zhihua Wang;Fei Wu;Shuo Wang;Xian Yang","doi":"10.1109/TMI.2025.3589456","DOIUrl":"10.1109/TMI.2025.3589456","url":null,"abstract":"Colorectal cancer (CRC) is a significant global health concern, and early detection through screening plays a critical role in reducing mortality. While deep learning models have shown promise in improving polyp detection, classification, and segmentation, their generalization across diverse clinical environments, particularly with out-of-distribution (OOD) data, remains a challenge. Multi-center datasets like PolypGen have been developed to address these issues, but their collection is costly and time-consuming. Traditional data augmentation techniques provide limited variability, failing to capture the complexity of medical images. Diffusion models have emerged as a promising solution for generating synthetic polyp images, but the image generation process in current models mainly relies on segmentation masks as the condition, limiting their ability to capture the full clinical context. To overcome these limitations, we propose a Progressive Spectrum Diffusion Model (PSDM) that integrates diverse clinical annotations–such as segmentation masks, bounding boxes, and colonoscopy reports–by transforming them into compositional prompts. These prompts are organized into coarse and fine components, allowing the model to capture both broad spatial structures and fine details, generating clinically accurate synthetic images. By augmenting training data with PSDM-generated samples, our model significantly improves polyp detection, classification, and segmentation. For instance, on the PolypGen dataset, PSDM increases the F1 score by 2.12% and the mean average precision by 3.09%, demonstrating superior performance in OOD scenarios and enhanced generalization.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5245-5257"},"PeriodicalIF":0.0,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11080481","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144639753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lumbar disc degeneration, the progressive structural wear of the lumbar intervertebral discs, is regarded as a major contributor to low back pain, a significant global health concern. Automated lumbar spine geometry reconstruction from MR images enables fast measurement of medical parameters for evaluating lumbar status and determining a suitable treatment. Existing image segmentation-based techniques often generate erroneous segments or unstructured point clouds, unsuitable for medical parameter measurement. In this work, we present UNet-DeformSA and TransDeformer: novel attention-based deep neural networks that reconstruct the geometry of the lumbar spine with high spatial accuracy and mesh correspondence across patients, and we also present a variant of TransDeformer for error estimation. Specifically, we devise new attention modules with a new attention formula, which integrate tokenized image features and tokenized shape features to predict the displacements of the points on a shape template. The deformed template reveals the lumbar spine geometry in an image. Experimental results show that our networks generate artifact-free geometry outputs, and the variant of TransDeformer can predict the errors of a reconstructed geometry. Our code is available at https://github.com/linchenq/TransDeformer-Mesh.
{"title":"Attention-Based Shape-Deformation Networks for Artifact-Free Geometry Reconstruction of Lumbar Spine From MR Images","authors":"Linchen Qian;Jiasong Chen;Linhai Ma;Timur Urakov;Weiyong Gu;Liang Liang","doi":"10.1109/TMI.2025.3588831","DOIUrl":"10.1109/TMI.2025.3588831","url":null,"abstract":"Lumbar disc degeneration, a progressive structural wear and tear of lumbar intervertebral disc, is regarded as an essential role on low back pain, a significant global health concern. Automated lumbar spine geometry reconstruction from MR images will enable fast measurement of medical parameters to evaluate the lumbar status, in order to determine a suitable treatment. Existing image segmentation-based techniques often generate erroneous segments or unstructured point clouds, unsuitable for medical parameter measurement. In this work, we present UNet-DeformSA and TransDeformer: novel attention-based deep neural networks that reconstruct the geometry of the lumbar spine with high spatial accuracy and mesh correspondence across patients, and we also present a variant of TransDeformer for error estimation. Specially, we devise new attention modules with a new attention formula, which integrate tokenized image features and tokenized shape features to predict the displacements of the points on a shape template. The deformed template reveals the lumbar spine geometry in an image. Experiment results show that our networks generate artifact-free geometry outputs, and the variant of TransDeformer can predict the errors of a reconstructed geometry. Our code is available at <uri>https://github.com/linchenq/TransDeformer-Mesh</uri>.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5258-5277"},"PeriodicalIF":0.0,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144639750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Positron Emission Tomography (PET) is a valuable imaging method for studying molecular-level processes in the body, such as hyperphosphorylated tau (p-tau) protein aggregates, a hallmark of several neurodegenerative diseases including Alzheimer’s disease. P-tau density and cerebral perfusion can be quantified from dynamic PET images using tracer kinetic modeling techniques. However, noise in PET images leads to uncertainty in the estimated kinetic parameters, which can be quantified by estimating the posterior distribution of kinetic parameters using Bayesian inference (BI). Markov Chain Monte Carlo (MCMC) techniques are commonly used for posterior estimation but carry significant computational cost. This work proposes an Improved Denoising Diffusion Probabilistic Model (iDDPM)-based method to estimate the posterior distribution of kinetic parameters in dynamic PET, leveraging the high computational efficiency of deep learning. The performance of the proposed method was evaluated on a [18F]MK6240 study and compared to a Conditional Variational Autoencoder with dual decoder (CVAE-DD)-based method and a Wasserstein GAN with gradient penalty (WGAN-GP)-based method. Posterior distributions inferred from Metropolis-Hastings MCMC were used as the reference. Our approach consistently outperformed the CVAE-DD and WGAN-GP methods and offered a significant reduction in computation time compared to the MCMC method (over 230 times faster), inferring accurate (<0.67% mean error) and precise (<7.23% standard deviation error) posterior distributions.
{"title":"Bayesian Posterior Distribution Estimation of Kinetic Parameters in Dynamic Brain PET Using Generative Deep Learning Models","authors":"Yanis Djebra;Xiaofeng Liu;Thibault Marin;Amal Tiss;Maeva Dhaynaut;Nicolas Guehl;Keith Johnson;Georges El Fakhri;Chao Ma;Jinsong Ouyang","doi":"10.1109/TMI.2025.3588859","DOIUrl":"10.1109/TMI.2025.3588859","url":null,"abstract":"Positron Emission Tomography (PET) is a valuable imaging method for studying molecular-level processes in the body, such as hyperphosphorylated tau (p-tau) protein aggregates, a hallmark of several neurodegenerative diseases including Alzheimer’s disease. P-tau density and cerebral perfusion can be quantified from dynamic PET images using tracer kinetic modeling techniques. However, noise in PET images leads to uncertainty in the estimated kinetic parameters, which can be quantified by estimating the posterior distribution of kinetic parameters using Bayesian inference (BI). Markov Chain Monte Carlo (MCMC) techniques are commonly used for posterior estimation but with significant computational needs. This work proposes an Improved Denoising Diffusion Probabilistic Model (iDDPM)-based method to estimate the posterior distribution of kinetic parameters in dynamic PET, leveraging the high computational efficiency of deep learning. The performance of the proposed method was evaluated on a [18F]MK6240 study and compared to a Conditional Variational Autoencoder with dual decoder (CVAE-DD)-based method and a Wasserstein GAN with gradient penalty (WGAN-GP)-based method. Posterior distributions inferred from Metropolis-Hasting MCMC were used as reference. Our approach consistently outperformed the CVAE-DD and WGAN-GP methods and offered significant reduction in computation time than the MCMC method (over 230 times faster), inferring accurate (<inline-formula> <tex-math>$lt {0}.{67},%$ </tex-math></inline-formula> mean error) and precise (<inline-formula> <tex-math>$lt {7}.{23},%$ </tex-math></inline-formula> standard deviation error) posterior distributions.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5089-5102"},"PeriodicalIF":0.0,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144639748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-07-14  DOI: 10.1109/TMI.2025.3589058
Kai Han;Shuhui Wang;Jun Chen;Chengxuan Qian;Chongwen Lyu;Siqi Ma;Chengjian Qiu;Victor S. Sheng;Qingming Huang;Zhe Liu
The success of deep learning in 3D medical image segmentation hinges on training with large datasets of fully annotated 3D volumes, which are difficult and time-consuming to acquire. Although recent foundation models (e.g., the Segment Anything Model, SAM) can utilize sparse annotations to reduce annotation costs, segmentation tasks involving organs and tissues with blurred boundaries remain challenging. To address this issue, we propose a region uncertainty estimation framework for Computed Tomography (CT) image segmentation using noisy labels. Specifically, we propose a sample-stratified training strategy that stratifies samples according to their varying label quality, prioritizing confident and fine-grained information at each training stage. This sample-to-voxel level processing enables more reliable supervision to propagate to noisy-label data, effectively mitigating the impact of noisy annotations. Moreover, we design a boundary-guided regional uncertainty estimation module that adapts sample-hierarchical training to assist in evaluating sample confidence. Experiments conducted across multiple CT datasets demonstrate the superiority of our proposed method over several competitive approaches under various noise conditions. Our reliable label propagation strategy not only significantly reduces the cost of medical image annotation and of training robust models, but also improves segmentation performance in scenarios with imperfect annotations, paving the way for applying medical segmentation foundation models in low-resource and remote scenarios. Code will be available at https://github.com/KHan-UJS/NoisyLabel
{"title":"Region Uncertainty Estimation for Medical Image Segmentation With Noisy Labels","authors":"Kai Han;Shuhui Wang;Jun Chen;Chengxuan Qian;Chongwen Lyu;Siqi Ma;Chengjian Qiu;Victor S. Sheng;Qingming Huang;Zhe Liu","doi":"10.1109/TMI.2025.3589058","DOIUrl":"10.1109/TMI.2025.3589058","url":null,"abstract":"The success of deep learning in 3D medical image segmentation hinges on training with a large dataset of fully annotated 3D volumes, which are difficult and time-consuming to acquire. Although recent foundation models (e.g., segment anything model, SAM) can utilize sparse annotations to reduce annotation costs, segmentation tasks involving organs and tissues with blurred boundaries remain challenging. To address this issue, we propose a region uncertainty estimation framework for Computed Tomography (CT) image segmentation using noisy labels. Specifically, we propose a sample-stratified training strategy that stratifies samples according to their varying quality labels, prioritizing confident and fine-grained information at each training stage. This sample-to-voxel level processing enables more reliable supervision information to propagate to noisy label data, thus effectively mitigating the impact of noisy annotations. Moreover, we further design a boundary-guided regional uncertainty estimation module that adapts sample hierarchical training to assist in evaluating sample confidence. Experiments conducted across multiple CT datasets demonstrate the superiority of our proposed method over several competitive approaches under various noise conditions. Our proposed reliable label propagation strategy not only significantly reduces the cost of medical image annotation and robust model training but also improves the segmentation performance in scenarios with imperfect annotations, thus paving the way towards the application of medical segmentation foundation models under low-resource and remote scenarios. Code will be available at <uri>https://github.com/KHan-UJS/NoisyLabel</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5197-5207"},"PeriodicalIF":0.0,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144629831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-07-14  DOI: 10.1109/TMI.2025.3588503
Alpha Alimamy Kamara;Shiwen He;Abdul Joseph Fofanah;Rong Xu;Yuehan Chen
Colorectal cancer (CRC) is the most common malignant neoplasm in the digestive system and a primary cause of cancer-related mortality in the United States, exceeded only by lung and prostate cancers. The American Cancer Society estimates that in 2024, there will be approximately 152,810 new cases of colorectal cancer and 53,010 deaths in the United States, highlighting the critical need for early diagnosis and prevention. Precise polyp segmentation is crucial for early detection, as it improves treatability and survival rates. However, existing methods, such as the UNet architecture, struggle to capture long-range dependencies and manage the variability in polyp shapes and sizes, and the low contrast between polyps and the surrounding background. We propose a multiscale dynamic polyp-focus network (MDPNet) to solve these problems. It has three modules: dynamic polyp-focus (DPfocus), non-local multiscale attention pooling (NMAP), and learnable multiscale attention pooling (LMAP). DPfocus captures global pixel-to-polyp dependencies, preserving high-level semantics and emphasizing polyp-specific regions. NMAP stabilizes the model under varying polyp shapes, sizes, and contrasts by dynamically aggregating multiscale features with minimal data loss. LMAP enhances spatial representation by learning multiscale attention across different regions. This enables MDPNet to understand long-range dependencies and combine information from different levels of context, boosting the segmentation accuracy. Extensive experiments on four publicly available datasets demonstrate that MDPNet is effective and outperforms current state-of-the-art segmentation methods by 2–5% in overall accuracy across all datasets. This demonstrates that our method improves polyp segmentation accuracy, aiding early detection and treatment of colorectal cancer.
{"title":"MDPNet: Multiscale Dynamic Polyp-Focus Network for Enhancing Medical Image Polyp Segmentation","authors":"Alpha Alimamy Kamara;Shiwen He;Abdul Joseph Fofanah;Rong Xu;Yuehan Chen","doi":"10.1109/TMI.2025.3588503","DOIUrl":"10.1109/TMI.2025.3588503","url":null,"abstract":"Colorectal cancer (CRC) is the most common malignant neoplasm in the digestive system and a primary cause of cancer-related mortality in the United States, exceeded only by lung and prostate cancers. The American Cancer Society estimates that in 2024, there will be approximately 152,810 new cases of colorectal cancer and 53,010 deaths in the United States, highlighting the critical need for early diagnosis and prevention. Precise polyp segmentation is crucial for early detection, as it improves treatability and survival rates. However, existing methods, such as the UNet architecture, struggle to capture long-range dependencies and manage the variability in polyp shapes and sizes, and the low contrast between polyps and the surrounding background. We propose a multiscale dynamic polyp-focus network (MDPNet) to solve these problems. It has three modules: dynamic polyp-focus (DPfocus), non-local multiscale attention pooling (NMAP), and learnable multiscale attention pooling (LMAP). DPfocus captures global pixel-to-polyp dependencies, preserving high-level semantics and emphasizing polyp-specific regions. NMAP stabilizes the model under varying polyp shapes, sizes, and contrasts by dynamically aggregating multiscale features with minimal data loss. LMAP enhances spatial representation by learning multiscale attention across different regions. This enables MDPNet to understand long-range dependencies and combine information from different levels of context, boosting the segmentation accuracy. Extensive experiments on four publicly available datasets demonstrate that MDPNet is effective and outperforms current state-of-the-art segmentation methods by 2–5% in overall accuracy across all datasets. This demonstrates that our method improves polyp segmentation accuracy, aiding early detection and treatment of colorectal cancer.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5208-5220"},"PeriodicalIF":0.0,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144630145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-07-14  DOI: 10.1109/TMI.2025.3588836
Ting Jin;Xingran Xie;Qingli Li;Xinxing Li;Yan Wang
Histology analysis of the tumor micro-environment, integrated with genomic assays, is widely regarded as the cornerstone of cancer analysis and survival prediction. This paper jointly incorporates genomics and Whole Slide Images (WSIs), and focuses on the primary challenges of multi-modality prognosis analysis: 1) high-order relevance is difficult to model from dimensionally imbalanced gigapixel WSIs and tens of thousands of genetic sequences, and 2) the lack of medical expertise and clinical knowledge hampers the effectiveness of prognosis-oriented multi-modal fusion. Due to the nature of the prognosis task, statistical priors and clinical knowledge are essential for estimating the likelihood of survival over time, which, however, has been under-studied. To this end, we propose a prognosis-oriented image-omics fusion framework, dubbed Clinical Stage Prompt induced Multimodal Prognosis (CiMP). Concretely, we leverage the capabilities of an advanced LLM to generate descriptions derived from structured clinical records, and use the generated clinical staging prompts to deliberately query critical prognosis-related information from each modality. In addition, we propose a Group Multi-Head Self-Attention module to capture structured group-specific features within cohorts of genomic data. Experimental results on five TCGA datasets show the superiority of our proposed method, achieving state-of-the-art performance compared to previous multi-modal prognostic models. Furthermore, the clinical interpretability and discussion also highlight the immense potential for further medical applications. Our code will be released at https://github.com/DeepMed-Lab-ECNU/CiMP/
{"title":"Clinical Stage Prompt Induced Multi-Modal Prognosis","authors":"Ting Jin;Xingran Xie;Qingli Li;Xinxing Li;Yan Wang","doi":"10.1109/TMI.2025.3588836","DOIUrl":"10.1109/TMI.2025.3588836","url":null,"abstract":"Histology analysis of the tumor micro-environment integrated with genomic assays is widely regarded as the cornerstone for cancer analysis and survival prediction. This paper jointly incorporates genomics and Whole Slide Images (WSIs), and focuses on addressing the primary challenges involved in multi-modality prognosis analysis: 1) the high-order relevance is difficult to be modeled from dimensional imbalanced gigapixel WSIs and tens of thousands of genetic sequences, and 2) the lack of medical expertise and clinical knowledge hampers the effectiveness of prognosis-oriented multi-modal fusion. Due to the nature of the prognosis task, statistical priors and clinical knowledge are essential factors to provide the likelihood of survival over time, which, however, has been under-studied. To this end, we propose a prognosis-oriented image-omics fusion framework, dubbed Clinical Stage Prompt induced Multimodal Prognosis (CiMP). Concretely, we leverage the capabilities of the advanced LLM to generate descriptions derived from structured clinical records and utilize the generated clinical staging prompts to inquire critical prognosis-related information from each modality intentionally. In addition, we propose a Group Multi-Head Self-Attention module to capture structured group-specific features within cohorts of genomic data. Experimental results on five TCGA datasets show the superiority of our proposed method, achieving state-of-the-art performance compared to previous multi-modal prognostic models. Furthermore, the clinical interpretability and discussion also highlight the immense potential for further medical applications. Our code will be released at <uri>https://github.com/DeepMed-Lab-ECNU/CiMP/</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5065-5076"},"PeriodicalIF":0.0,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144629830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-07-14  DOI: 10.1109/TMI.2025.3588789
Kexin Deng;Yan Luo;Hongzhi Zuo;Yuwen Chen;Liujie Gu;Mingyuan Liu;Hengrong Lan;Jianwen Luo;Cheng Ma
Photoacoustic computed tomography (PACT) is an emerging hybrid imaging modality with potential applications in biomedicine. A major roadblock to the widespread adoption of PACT is the limited number of detectors, which gives rise to spatial aliasing and manifests as streak artifacts in the reconstructed image. A brute-force solution to the problem is to increase the number of detectors, which, however, is often undesirable due to escalated costs. In this study, we present a novel self-supervised learning approach, to overcome this long-standing challenge. We found that small blocks of PACT channel data show similarity at various downsampling rates. Based on this observation, a neural network trained on downsampled data can reliably perform accurate interpolation without requiring densely-sampled ground truth data, which is typically unavailable in real practice. Our method has undergone validation through numerical simulations, controlled phantom experiments, as well as ex vivo and in vivo animal tests, across multiple PACT systems. We have demonstrated that our technique provides an effective and cost-efficient solution to address the under-sampling issue in PACT, thereby enhancing the capabilities of this imaging technology.
{"title":"Self-Supervised Upsampling for Reconstructions With Generalized Enhancement in Photoacoustic Computed Tomography","authors":"Kexin Deng;Yan Luo;Hongzhi Zuo;Yuwen Chen;Liujie Gu;Mingyuan Liu;Hengrong Lan;Jianwen Luo;Cheng Ma","doi":"10.1109/TMI.2025.3588789","DOIUrl":"10.1109/TMI.2025.3588789","url":null,"abstract":"Photoacoustic computed tomography (PACT) is an emerging hybrid imaging modality with potential applications in biomedicine. A major roadblock to the widespread adoption of PACT is the limited number of detectors, which gives rise to spatial aliasing and manifests as streak artifacts in the reconstructed image. A brute-force solution to the problem is to increase the number of detectors, which, however, is often undesirable due to escalated costs. In this study, we present a novel self-supervised learning approach, to overcome this long-standing challenge. We found that small blocks of PACT channel data show similarity at various downsampling rates. Based on this observation, a neural network trained on downsampled data can reliably perform accurate interpolation without requiring densely-sampled ground truth data, which is typically unavailable in real practice. Our method has undergone validation through numerical simulations, controlled phantom experiments, as well as ex vivo and in vivo animal tests, across multiple PACT systems. We have demonstrated that our technique provides an effective and cost-efficient solution to address the under-sampling issue in PACT, thereby enhancing the capabilities of this imaging technology.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5117-5127"},"PeriodicalIF":0.0,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144629835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}