Lumbar disc degeneration, the progressive structural wear and tear of the lumbar intervertebral discs, is regarded as a major contributor to low back pain, a significant global health concern. Automated reconstruction of lumbar spine geometry from MR images would enable fast measurement of medical parameters for evaluating lumbar status and determining a suitable treatment. Existing segmentation-based techniques often generate erroneous segments or unstructured point clouds that are unsuitable for such measurements. In this work, we present UNet-DeformSA and TransDeformer, novel attention-based deep neural networks that reconstruct lumbar spine geometry with high spatial accuracy and mesh correspondence across patients, together with a variant of TransDeformer for error estimation. Specifically, we devise new attention modules, built on a new attention formula, that integrate tokenized image features and tokenized shape features to predict the displacements of the points of a shape template; the deformed template then reveals the lumbar spine geometry in an image. Experimental results show that our networks generate artifact-free geometry outputs and that the TransDeformer variant can predict the errors of a reconstructed geometry. Our code is available at https://github.com/linchenq/TransDeformer-Mesh.
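As a rough illustration of the shape-deformation idea described in this abstract (not the authors' new attention formula), the sketch below uses standard PyTorch cross-attention to let template-point tokens query image-feature tokens and regress per-point displacements; the module names, dimensions, and the image encoder producing the tokens are all hypothetical.

```python
import torch
import torch.nn as nn

class TemplateDeformer(nn.Module):
    """Hypothetical sketch: template points attend to image tokens to predict displacements."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.point_embed = nn.Linear(2, dim)           # embed 2D template coordinates as tokens
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.disp_head = nn.Linear(dim, 2)             # per-point displacement (dx, dy)

    def forward(self, template_pts, image_tokens):
        # template_pts: (B, N, 2); image_tokens: (B, M, dim) from any image encoder
        q = self.point_embed(template_pts)
        attended, _ = self.cross_attn(q, image_tokens, image_tokens)
        disp = self.disp_head(attended)
        return template_pts + disp                     # deformed template = reconstructed geometry

# toy usage: a 100-point template and 196 image tokens
model = TemplateDeformer()
pts = torch.rand(1, 100, 2)
feats = torch.rand(1, 196, 128)
print(model(pts, feats).shape)                         # torch.Size([1, 100, 2])
```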
{"title":"Attention-Based Shape-Deformation Networks for Artifact-Free Geometry Reconstruction of Lumbar Spine From MR Images","authors":"Linchen Qian;Jiasong Chen;Linhai Ma;Timur Urakov;Weiyong Gu;Liang Liang","doi":"10.1109/TMI.2025.3588831","DOIUrl":"10.1109/TMI.2025.3588831","url":null,"abstract":"Lumbar disc degeneration, a progressive structural wear and tear of lumbar intervertebral disc, is regarded as an essential role on low back pain, a significant global health concern. Automated lumbar spine geometry reconstruction from MR images will enable fast measurement of medical parameters to evaluate the lumbar status, in order to determine a suitable treatment. Existing image segmentation-based techniques often generate erroneous segments or unstructured point clouds, unsuitable for medical parameter measurement. In this work, we present UNet-DeformSA and TransDeformer: novel attention-based deep neural networks that reconstruct the geometry of the lumbar spine with high spatial accuracy and mesh correspondence across patients, and we also present a variant of TransDeformer for error estimation. Specially, we devise new attention modules with a new attention formula, which integrate tokenized image features and tokenized shape features to predict the displacements of the points on a shape template. The deformed template reveals the lumbar spine geometry in an image. Experiment results show that our networks generate artifact-free geometry outputs, and the variant of TransDeformer can predict the errors of a reconstructed geometry. Our code is available at <uri>https://github.com/linchenq/TransDeformer-Mesh</uri>.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5258-5277"},"PeriodicalIF":0.0,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144639750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Positron Emission Tomography (PET) is a valuable imaging method for studying molecular-level processes in the body, such as hyperphosphorylated tau (p-tau) protein aggregates, a hallmark of several neurodegenerative diseases including Alzheimer’s disease. P-tau density and cerebral perfusion can be quantified from dynamic PET images using tracer kinetic modeling techniques. However, noise in PET images leads to uncertainty in the estimated kinetic parameters, which can be quantified by estimating the posterior distribution of the kinetic parameters using Bayesian inference (BI). Markov Chain Monte Carlo (MCMC) techniques are commonly used for posterior estimation but at a significant computational cost. This work proposes an Improved Denoising Diffusion Probabilistic Model (iDDPM)-based method to estimate the posterior distribution of kinetic parameters in dynamic PET, leveraging the high computational efficiency of deep learning. The performance of the proposed method was evaluated on a [18F]MK6240 study and compared to a Conditional Variational Autoencoder with dual decoder (CVAE-DD)-based method and a Wasserstein GAN with gradient penalty (WGAN-GP)-based method. Posterior distributions inferred from Metropolis-Hastings MCMC were used as the reference. Our approach consistently outperformed the CVAE-DD and WGAN-GP methods and offered a significant reduction in computation time compared to the MCMC method (over 230 times faster), inferring accurate (<0.67% mean error) and precise (<7.23% standard deviation error) posterior distributions.
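To make the generative posterior-sampling idea concrete, here is a minimal, hypothetical sketch of conditional DDPM ancestral sampling repeated many times to build an empirical posterior over kinetic parameters; the denoiser `eps_model`, its conditioning on a measured time-activity curve, and the noise schedule are placeholders, not the paper's iDDPM.

```python
import torch

def sample_posterior(eps_model, tac, n_samples=1000, T=1000, param_dim=4):
    """Draw posterior samples of kinetic parameters conditioned on a time-activity curve (tac)."""
    betas = torch.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(n_samples, param_dim)                 # start every chain from pure noise
    cond = tac.expand(n_samples, -1)                      # condition every sample on the same measurement
    for t in reversed(range(T)):
        eps = eps_model(x, torch.full((n_samples,), t), cond)
        mean = (x - (1 - alphas[t]) / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x                                              # (n_samples, param_dim) empirical posterior

# toy usage with an untrained stand-in for the trained denoiser
dummy = lambda x, t, c: torch.zeros_like(x)
samples = sample_posterior(dummy, torch.rand(60), n_samples=200, T=50)
print(samples.mean(dim=0), samples.std(dim=0))            # posterior mean / std per parameter
```

The per-parameter mean and standard deviation of the returned samples are the posterior summaries that the abstract compares against Metropolis-Hastings MCMC.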
{"title":"Bayesian Posterior Distribution Estimation of Kinetic Parameters in Dynamic Brain PET Using Generative Deep Learning Models","authors":"Yanis Djebra;Xiaofeng Liu;Thibault Marin;Amal Tiss;Maeva Dhaynaut;Nicolas Guehl;Keith Johnson;Georges El Fakhri;Chao Ma;Jinsong Ouyang","doi":"10.1109/TMI.2025.3588859","DOIUrl":"10.1109/TMI.2025.3588859","url":null,"abstract":"Positron Emission Tomography (PET) is a valuable imaging method for studying molecular-level processes in the body, such as hyperphosphorylated tau (p-tau) protein aggregates, a hallmark of several neurodegenerative diseases including Alzheimer’s disease. P-tau density and cerebral perfusion can be quantified from dynamic PET images using tracer kinetic modeling techniques. However, noise in PET images leads to uncertainty in the estimated kinetic parameters, which can be quantified by estimating the posterior distribution of kinetic parameters using Bayesian inference (BI). Markov Chain Monte Carlo (MCMC) techniques are commonly used for posterior estimation but with significant computational needs. This work proposes an Improved Denoising Diffusion Probabilistic Model (iDDPM)-based method to estimate the posterior distribution of kinetic parameters in dynamic PET, leveraging the high computational efficiency of deep learning. The performance of the proposed method was evaluated on a [18F]MK6240 study and compared to a Conditional Variational Autoencoder with dual decoder (CVAE-DD)-based method and a Wasserstein GAN with gradient penalty (WGAN-GP)-based method. Posterior distributions inferred from Metropolis-Hasting MCMC were used as reference. Our approach consistently outperformed the CVAE-DD and WGAN-GP methods and offered significant reduction in computation time than the MCMC method (over 230 times faster), inferring accurate (<inline-formula> <tex-math>$lt {0}.{67},%$ </tex-math></inline-formula> mean error) and precise (<inline-formula> <tex-math>$lt {7}.{23},%$ </tex-math></inline-formula> standard deviation error) posterior distributions.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5089-5102"},"PeriodicalIF":0.0,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144639748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-07-14. DOI: 10.1109/TMI.2025.3589058
Kai Han;Shuhui Wang;Jun Chen;Chengxuan Qian;Chongwen Lyu;Siqi Ma;Chengjian Qiu;Victor S. Sheng;Qingming Huang;Zhe Liu
The success of deep learning in 3D medical image segmentation hinges on training with a large dataset of fully annotated 3D volumes, which are difficult and time-consuming to acquire. Although recent foundation models (e.g., the segment anything model, SAM) can utilize sparse annotations to reduce annotation costs, segmentation tasks involving organs and tissues with blurred boundaries remain challenging. To address this issue, we propose a region uncertainty estimation framework for Computed Tomography (CT) image segmentation with noisy labels. Specifically, we propose a sample-stratified training strategy that stratifies samples according to their label quality, prioritizing confident and fine-grained information at each training stage. This sample-to-voxel level processing enables more reliable supervision to propagate to noisily labeled data, effectively mitigating the impact of noisy annotations. Moreover, we design a boundary-guided regional uncertainty estimation module that complements the stratified training by helping to evaluate sample confidence. Experiments conducted across multiple CT datasets demonstrate the superiority of our proposed method over several competitive approaches under various noise conditions. Our reliable label propagation strategy not only significantly reduces the cost of medical image annotation and robust model training but also improves segmentation performance in scenarios with imperfect annotations, paving the way for applying medical segmentation foundation models in low-resource and remote scenarios. Code will be available at https://github.com/KHan-UJS/NoisyLabel
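As a loose illustration of the sample-stratification idea (not the authors' exact framework), the sketch below ranks training samples by a per-sample loss used as a proxy for label quality and down-weights the noisiest stratum; the quantile thresholds and weights are invented for the example.

```python
import torch

def stratified_weights(per_sample_loss, quantiles=(0.5, 0.8), weights=(1.0, 0.5, 0.1)):
    """Assign a confidence weight to each sample from its current loss (high loss ~ noisier label)."""
    q_lo, q_hi = torch.quantile(per_sample_loss, torch.tensor(quantiles))
    w = torch.full_like(per_sample_loss, weights[0])     # confident stratum: full supervision
    w[per_sample_loss > q_lo] = weights[1]               # mid stratum: weaker supervision
    w[per_sample_loss > q_hi] = weights[2]               # likely-noisy stratum: weakest supervision
    return w

# usage inside a training step: losses is a (B,) tensor of per-volume segmentation losses
losses = torch.tensor([0.12, 0.35, 0.90, 0.20])
weighted_loss = (stratified_weights(losses) * losses).mean()
print(weighted_loss)
```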
{"title":"Region Uncertainty Estimation for Medical Image Segmentation With Noisy Labels","authors":"Kai Han;Shuhui Wang;Jun Chen;Chengxuan Qian;Chongwen Lyu;Siqi Ma;Chengjian Qiu;Victor S. Sheng;Qingming Huang;Zhe Liu","doi":"10.1109/TMI.2025.3589058","DOIUrl":"10.1109/TMI.2025.3589058","url":null,"abstract":"The success of deep learning in 3D medical image segmentation hinges on training with a large dataset of fully annotated 3D volumes, which are difficult and time-consuming to acquire. Although recent foundation models (e.g., segment anything model, SAM) can utilize sparse annotations to reduce annotation costs, segmentation tasks involving organs and tissues with blurred boundaries remain challenging. To address this issue, we propose a region uncertainty estimation framework for Computed Tomography (CT) image segmentation using noisy labels. Specifically, we propose a sample-stratified training strategy that stratifies samples according to their varying quality labels, prioritizing confident and fine-grained information at each training stage. This sample-to-voxel level processing enables more reliable supervision information to propagate to noisy label data, thus effectively mitigating the impact of noisy annotations. Moreover, we further design a boundary-guided regional uncertainty estimation module that adapts sample hierarchical training to assist in evaluating sample confidence. Experiments conducted across multiple CT datasets demonstrate the superiority of our proposed method over several competitive approaches under various noise conditions. Our proposed reliable label propagation strategy not only significantly reduces the cost of medical image annotation and robust model training but also improves the segmentation performance in scenarios with imperfect annotations, thus paving the way towards the application of medical segmentation foundation models under low-resource and remote scenarios. Code will be available at <uri>https://github.com/KHan-UJS/NoisyLabel</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5197-5207"},"PeriodicalIF":0.0,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144629831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-07-14. DOI: 10.1109/TMI.2025.3588503
Alpha Alimamy Kamara;Shiwen He;Abdul Joseph Fofanah;Rong Xu;Yuehan Chen
Colorectal cancer (CRC) is the most common malignant neoplasm of the digestive system and a primary cause of cancer-related mortality in the United States, exceeded only by lung and prostate cancers. The American Cancer Society estimates that in 2024 there will be approximately 152,810 new cases of colorectal cancer and 53,010 deaths in the United States, highlighting the critical need for early diagnosis and prevention. Precise polyp segmentation is crucial for early detection, as it improves treatability and survival rates. However, existing methods, such as the UNet architecture, struggle to capture long-range dependencies, to manage the variability in polyp shapes and sizes, and to handle the low contrast between polyps and the surrounding background. We propose a multiscale dynamic polyp-focus network (MDPNet) to solve these problems. It has three modules: dynamic polyp-focus (DPfocus), non-local multiscale attention pooling (NMAP), and learnable multiscale attention pooling (LMAP). DPfocus captures global pixel-to-polyp dependencies, preserving high-level semantics and emphasizing polyp-specific regions. NMAP stabilizes the model under varying polyp shapes, sizes, and contrasts by dynamically aggregating multiscale features with minimal information loss. LMAP enhances spatial representation by learning multiscale attention across different regions. This enables MDPNet to capture long-range dependencies and combine information from different levels of context, boosting segmentation accuracy. Extensive experiments on four publicly available datasets demonstrate that MDPNet is effective and outperforms current state-of-the-art segmentation methods by 2–5% in overall accuracy across all datasets, improving polyp segmentation accuracy and aiding early detection and treatment of colorectal cancer.
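To illustrate multiscale attention pooling in generic terms (this is not the paper's NMAP/LMAP design), the following sketch pools a feature map at several scales, upsamples each result back to the input resolution, and fuses the scales with learned softmax weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiscaleAttentionPool(nn.Module):
    """Hypothetical sketch: fuse features pooled at several scales with learned attention weights."""
    def __init__(self, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.scale_logits = nn.Parameter(torch.zeros(len(scales)))   # one learnable weight per scale

    def forward(self, x):                      # x: (B, C, H, W) feature map from any backbone
        h, w = x.shape[-2:]
        pooled = [x if s == 1 else
                  F.interpolate(F.avg_pool2d(x, s), size=(h, w),
                                mode="bilinear", align_corners=False)
                  for s in self.scales]
        attn = torch.softmax(self.scale_logits, dim=0)
        return sum(a * p for a, p in zip(attn, pooled))

print(MultiscaleAttentionPool()(torch.rand(1, 32, 64, 64)).shape)    # torch.Size([1, 32, 64, 64])
```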
{"title":"MDPNet: Multiscale Dynamic Polyp-Focus Network for Enhancing Medical Image Polyp Segmentation","authors":"Alpha Alimamy Kamara;Shiwen He;Abdul Joseph Fofanah;Rong Xu;Yuehan Chen","doi":"10.1109/TMI.2025.3588503","DOIUrl":"10.1109/TMI.2025.3588503","url":null,"abstract":"Colorectal cancer (CRC) is the most common malignant neoplasm in the digestive system and a primary cause of cancer-related mortality in the United States, exceeded only by lung and prostate cancers. The American Cancer Society estimates that in 2024, there will be approximately 152,810 new cases of colorectal cancer and 53,010 deaths in the United States, highlighting the critical need for early diagnosis and prevention. Precise polyp segmentation is crucial for early detection, as it improves treatability and survival rates. However, existing methods, such as the UNet architecture, struggle to capture long-range dependencies and manage the variability in polyp shapes and sizes, and the low contrast between polyps and the surrounding background. We propose a multiscale dynamic polyp-focus network (MDPNet) to solve these problems. It has three modules: dynamic polyp-focus (DPfocus), non-local multiscale attention pooling (NMAP), and learnable multiscale attention pooling (LMAP). DPfocus captures global pixel-to-polyp dependencies, preserving high-level semantics and emphasizing polyp-specific regions. NMAP stabilizes the model under varying polyp shapes, sizes, and contrasts by dynamically aggregating multiscale features with minimal data loss. LMAP enhances spatial representation by learning multiscale attention across different regions. This enables MDPNet to understand long-range dependencies and combine information from different levels of context, boosting the segmentation accuracy. Extensive experiments on four publicly available datasets demonstrate that MDPNet is effective and outperforms current state-of-the-art segmentation methods by 2–5% in overall accuracy across all datasets. This demonstrates that our method improves polyp segmentation accuracy, aiding early detection and treatment of colorectal cancer.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5208-5220"},"PeriodicalIF":0.0,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144630145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-07-14. DOI: 10.1109/TMI.2025.3588836
Ting Jin;Xingran Xie;Qingli Li;Xinxing Li;Yan Wang
Histology analysis of the tumor micro-environment integrated with genomic assays is widely regarded as the cornerstone of cancer analysis and survival prediction. This paper jointly incorporates genomics and Whole Slide Images (WSIs) and focuses on the primary challenges in multi-modality prognosis analysis: 1) high-order relevance is difficult to model from dimensionally imbalanced gigapixel WSIs and tens of thousands of genetic sequences, and 2) the lack of medical expertise and clinical knowledge hampers the effectiveness of prognosis-oriented multi-modal fusion. Owing to the nature of the prognosis task, statistical priors and clinical knowledge are essential for estimating the likelihood of survival over time, which, however, has been under-studied. To this end, we propose a prognosis-oriented image-omics fusion framework, dubbed Clinical Stage Prompt induced Multimodal Prognosis (CiMP). Concretely, we leverage the capabilities of an advanced LLM to generate descriptions derived from structured clinical records and use the generated clinical staging prompts to purposefully query critical prognosis-related information from each modality. In addition, we propose a Group Multi-Head Self-Attention module to capture structured group-specific features within cohorts of genomic data. Experimental results on five TCGA datasets show the superiority of our proposed method, which achieves state-of-the-art performance compared to previous multi-modal prognostic models. Furthermore, the clinical interpretability and discussion highlight its immense potential for further medical applications. Our code will be released at https://github.com/DeepMed-Lab-ECNU/CiMP/
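As a simplified, hypothetical sketch of group-wise self-attention over genomic tokens (not the paper's exact module), the code below reshapes gene embeddings into predefined groups, e.g., pathways, and runs standard multi-head self-attention independently within each group.

```python
import torch
import torch.nn as nn

class GroupSelfAttention(nn.Module):
    """Hypothetical sketch: multi-head self-attention applied within each gene group separately."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, genes):                # genes: (B, G, N, dim) = G groups of N gene tokens
        b, g, n, d = genes.shape
        flat = genes.reshape(b * g, n, d)    # treat every group as an independent token sequence
        out, _ = self.attn(flat, flat, flat)
        return out.reshape(b, g, n, d)

tokens = torch.rand(2, 5, 20, 64)            # e.g., 5 pathway groups of 20 gene tokens each
print(GroupSelfAttention()(tokens).shape)    # torch.Size([2, 5, 20, 64])
```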
{"title":"Clinical Stage Prompt Induced Multi-Modal Prognosis","authors":"Ting Jin;Xingran Xie;Qingli Li;Xinxing Li;Yan Wang","doi":"10.1109/TMI.2025.3588836","DOIUrl":"10.1109/TMI.2025.3588836","url":null,"abstract":"Histology analysis of the tumor micro-environment integrated with genomic assays is widely regarded as the cornerstone for cancer analysis and survival prediction. This paper jointly incorporates genomics and Whole Slide Images (WSIs), and focuses on addressing the primary challenges involved in multi-modality prognosis analysis: 1) the high-order relevance is difficult to be modeled from dimensional imbalanced gigapixel WSIs and tens of thousands of genetic sequences, and 2) the lack of medical expertise and clinical knowledge hampers the effectiveness of prognosis-oriented multi-modal fusion. Due to the nature of the prognosis task, statistical priors and clinical knowledge are essential factors to provide the likelihood of survival over time, which, however, has been under-studied. To this end, we propose a prognosis-oriented image-omics fusion framework, dubbed Clinical Stage Prompt induced Multimodal Prognosis (CiMP). Concretely, we leverage the capabilities of the advanced LLM to generate descriptions derived from structured clinical records and utilize the generated clinical staging prompts to inquire critical prognosis-related information from each modality intentionally. In addition, we propose a Group Multi-Head Self-Attention module to capture structured group-specific features within cohorts of genomic data. Experimental results on five TCGA datasets show the superiority of our proposed method, achieving state-of-the-art performance compared to previous multi-modal prognostic models. Furthermore, the clinical interpretability and discussion also highlight the immense potential for further medical applications. Our code will be released at <uri>https://github.com/DeepMed-Lab-ECNU/CiMP/</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5065-5076"},"PeriodicalIF":0.0,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144629830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-07-14. DOI: 10.1109/TMI.2025.3588789
Kexin Deng;Yan Luo;Hongzhi Zuo;Yuwen Chen;Liujie Gu;Mingyuan Liu;Hengrong Lan;Jianwen Luo;Cheng Ma
Photoacoustic computed tomography (PACT) is an emerging hybrid imaging modality with potential applications in biomedicine. A major roadblock to the widespread adoption of PACT is the limited number of detectors, which gives rise to spatial aliasing and manifests as streak artifacts in the reconstructed image. A brute-force solution is to increase the number of detectors, which, however, is often undesirable due to escalated costs. In this study, we present a novel self-supervised learning approach to overcome this long-standing challenge. We found that small blocks of PACT channel data show similarity at various downsampling rates. Based on this observation, a neural network trained on downsampled data can reliably perform accurate interpolation without requiring densely sampled ground-truth data, which is typically unavailable in practice. Our method has been validated through numerical simulations, controlled phantom experiments, and ex vivo and in vivo animal tests across multiple PACT systems. We have demonstrated that our technique provides an effective and cost-efficient solution to the under-sampling issue in PACT, thereby enhancing the capabilities of this imaging technology.
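The self-supervision scheme can be sketched roughly as follows (a hypothetical illustration, not the authors' pipeline): the acquired detector channels are sub-sampled by a factor of two to form input/target pairs for training an interpolation network, which is then applied to the originally acquired data to double its channel count.

```python
import numpy as np

def make_selfsup_pair(sinogram):
    """sinogram: (n_channels, n_samples) PACT channel data as actually acquired (sparse)."""
    inp = sinogram[::2]      # drop every other detector channel -> network input
    target = sinogram        # the original sampling serves as the 'dense' training target
    return inp, target

# toy usage: 128 acquired channels, 1024 time samples per channel
data = np.random.randn(128, 1024).astype(np.float32)
x, y = make_selfsup_pair(data)
print(x.shape, y.shape)      # (64, 1024) (128, 1024)
# a network trained to map x -> y can then be applied to `data` itself,
# predicting a 256-channel version, i.e., 2x channel upsampling at test time
```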
{"title":"Self-Supervised Upsampling for Reconstructions With Generalized Enhancement in Photoacoustic Computed Tomography","authors":"Kexin Deng;Yan Luo;Hongzhi Zuo;Yuwen Chen;Liujie Gu;Mingyuan Liu;Hengrong Lan;Jianwen Luo;Cheng Ma","doi":"10.1109/TMI.2025.3588789","DOIUrl":"10.1109/TMI.2025.3588789","url":null,"abstract":"Photoacoustic computed tomography (PACT) is an emerging hybrid imaging modality with potential applications in biomedicine. A major roadblock to the widespread adoption of PACT is the limited number of detectors, which gives rise to spatial aliasing and manifests as streak artifacts in the reconstructed image. A brute-force solution to the problem is to increase the number of detectors, which, however, is often undesirable due to escalated costs. In this study, we present a novel self-supervised learning approach, to overcome this long-standing challenge. We found that small blocks of PACT channel data show similarity at various downsampling rates. Based on this observation, a neural network trained on downsampled data can reliably perform accurate interpolation without requiring densely-sampled ground truth data, which is typically unavailable in real practice. Our method has undergone validation through numerical simulations, controlled phantom experiments, as well as ex vivo and in vivo animal tests, across multiple PACT systems. We have demonstrated that our technique provides an effective and cost-efficient solution to address the under-sampling issue in PACT, thereby enhancing the capabilities of this imaging technology.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5117-5127"},"PeriodicalIF":0.0,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144629835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Skin lesion segmentation is vital for the early detection, diagnosis, and treatment of melanoma, yet it remains challenging due to significant variations in lesion attributes (e.g., color, size, shape), ambiguous boundaries, and noise interference. Recent advancements have focused on capturing contextual information and incorporating boundary priors to handle challenging lesions. However, explicit analysis of the inherent patterns of skin lesions, a crucial aspect of the knowledge-driven decision-making process used by clinical experts, has received limited exploration. In this work, we introduce a novel approach called Probabilistic Attribute Learning (PAL), which leverages knowledge of lesion patterns to achieve enhanced performance on challenging lesions. Recognizing that the lesion patterns exhibited in each image can be properly depicted by disentangled attributes, we begin by explicitly estimating the distributions of these attributes as distinct Gaussian distributions, with the mean and variance indicating the most likely pattern of each attribute and its variation. Using Monte Carlo sampling, we iteratively draw multiple samples from these distributions to capture various potential patterns for each attribute. These samples are then merged through an effective attribute fusion technique, resulting in diverse representations that comprehensively depict the lesion class. By performing pixel-class proximity matching between each pixel-wise representation and the diverse class-wise representations, we significantly enhance the model’s robustness. Extensive experiments on two public skin lesion datasets and one unified polyp lesion dataset demonstrate the effectiveness and strong generalization ability of our method. Codes are available at https://github.com/IsYuchenYuan/PAL
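A minimal, hypothetical sketch of the Monte Carlo step described above: each lesion attribute is modeled by a learned mean and log-variance, several reparameterized samples are drawn, the samples are fused (here by simple averaging across attributes, standing in for the paper's fusion technique) into class-wise representations, and a pixel feature is matched to them by cosine proximity.

```python
import torch
import torch.nn.functional as F

def sample_class_representations(mu, logvar, n_samples=8):
    """mu, logvar: (n_attributes, dim) learned Gaussian parameters per lesion attribute."""
    std = torch.exp(0.5 * logvar)
    eps = torch.randn(n_samples, *mu.shape)              # (S, A, dim)
    samples = mu + eps * std                             # reparameterization trick
    return samples.mean(dim=1)                           # naive fusion over attributes -> (S, dim)

mu, logvar = torch.zeros(3, 32), torch.zeros(3, 32)      # e.g., color / size / shape attributes
reps = sample_class_representations(mu, logvar)          # diverse class-wise representations
pixel_feat = torch.rand(32)
proximity = F.cosine_similarity(reps, pixel_feat.unsqueeze(0), dim=1)
print(proximity.shape)                                   # torch.Size([8]): pixel-class proximity scores
```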
{"title":"PAL: Boosting Skin Lesion Segmentation via Probabilistic Attribute Learning","authors":"Yuchen Yuan;Xi Wang;Jinpeng Li;Guangyong Chen;Pheng-Ann Heng","doi":"10.1109/TMI.2025.3588167","DOIUrl":"10.1109/TMI.2025.3588167","url":null,"abstract":"Skin lesion segmentation is vital for the early detection, diagnosis, and treatment of melanoma, yet it remains challenging due to significant variations in lesion attributes (e.g., color, size, shape), ambiguous boundaries, and noise interference. Recent advancements have focused on capturing contextual information and incorporating boundary priors to handle challenging lesions. However, there has been limited exploration on the explicit analysis of the inherent patterns of skin lesions, a crucial aspect of the knowledge-driven decision-making process used by clinical experts. In this work, we introduce a novel approach called Probabilistic Attribute Learning (PAL), which leverages knowledge of lesion patterns to achieve enhanced performance on challenging lesions. Recognizing that the lesion patterns exhibited in each image can be properly depicted by disentangled attributes, we begin by explicitly estimating the distributions of these attributes as distinct Gaussian distributions, with mean and variance indicating the most likely pattern of that attribute and its variation. Using Monte Carlo Sampling, we iteratively draw multiple samples from these distributions to capture various potential patterns for each attribute. These samples are then merged through an effective attribute fusion technique, resulting in diverse representations that comprehensively depict the lesion class. By performing pixel-class proximity matching between each pixel-wise representation and the diverse class-wise representations, we significantly enhance the model’s robustness. Extensive experiments on two public skin lesion datasets and one unified polyp lesion dataset demonstrate the effectiveness and strong generalization ability of our method. Codes are available at <uri>https://github.com/IsYuchenYuan/PAL</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5183-5196"},"PeriodicalIF":0.0,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11078393","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144611278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-07-11. DOI: 10.1109/TMI.2025.3588157
Xiaolong Deng;Huisi Wu
Accurate segmentation of the left ventricle in echocardiography is critical for diagnosing and treating cardiovascular diseases. However, accurate segmentation remains challenging due to the limitations of ultrasound imaging. Although numerous image and video segmentation methods have been proposed, existing methods still fail to effectively solve this task, which is further constrained by sparse annotations. To address this problem, we propose a novel semi-supervised segmentation framework named NCM-Net for echocardiography. We first propose the neighborhood correlation mining (NCM) module, which sufficiently mines the correlations between query features and their spatiotemporal neighborhoods to resist the influence of noise. The module also captures cross-scale contextual correlations between pixels spatially to further refine features, thus alleviating the impact of noise on echocardiography segmentation. To further improve segmentation accuracy, we propose unreliable-pixels masked attention (UMA): by masking reliable pixels, it pays extra attention to unreliable pixels to refine segmentation boundaries. Further, we apply cross-frame boundary constraints on the final predictions to optimize their temporal consistency. Through extensive experiments on two publicly available datasets, CAMUS and EchoNet-Dynamic, we demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance and outstanding temporal consistency. Codes are available at https://github.com/dengxl0520/NCMNet
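As a rough, hypothetical illustration of the unreliable-pixels masking idea (not the authors' UMA module): pixels whose predicted foreground probability is already confident are masked out of the attention keys, so that refinement concentrates on the uncertain, typically boundary, pixels.

```python
import torch
import torch.nn as nn

def unreliable_pixel_attention(feat, prob, attn, thresh=0.9):
    """feat: (B, N, C) pixel tokens; prob: (B, N) predicted foreground probability."""
    confidence = torch.maximum(prob, 1 - prob)           # near 0 or 1 -> the pixel is reliable
    reliable = confidence > thresh                        # True entries are dropped from the keys
    out, _ = attn(feat, feat, feat, key_padding_mask=reliable)
    return out

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
feat, prob = torch.rand(2, 100, 64), torch.rand(2, 100)
print(unreliable_pixel_attention(feat, prob, attn).shape)    # torch.Size([2, 100, 64])
```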
{"title":"Echocardiography Video Segmentation via Neighborhood Correlation Mining","authors":"Xiaolong Deng;Huisi Wu","doi":"10.1109/TMI.2025.3588157","DOIUrl":"10.1109/TMI.2025.3588157","url":null,"abstract":"Accurate segmentation of the left ventricle in echocardiography is critical for diagnosing and treating cardiovascular diseases. However, accurate segmentation remains challenging due to the limitations of ultrasound imaging. Although numerous image and video segmentation methods have been proposed, existing methods still fail to effectively solve this task, which is limited by sparsity annotations. To address this problem, we propose a novel semi-supervised segmentation framework named NCM-Net for echocardiography. We first propose the neighborhood correlation mining (NCM) module, which sufficiently mines the correlations between query features and their spatiotemporal neighborhoods to resist noise influence. The module also captures cross-scale contextual correlations between pixels spatially to further refine features, thus alleviating the impact of noise on echocardiography segmentation. To further improve segmentation accuracy, we propose using unreliable-pixels masked attention (UMA). By masking reliable pixels, it pays extra attention to unreliable pixels to refine the boundary of segmentation. Further, we use cross-frame boundary constraints on the final predictions to optimize their temporal consistency. Through extensive experiments on two publicly available datasets, CAMUS and EchoNet-Dynamic, we demonstrate the effectiveness of the proposed, which achieves state-of-the-art performance and outstanding temporal consistency. Codes are available at <uri>https://github.com/dengxl0520/NCMNet</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5172-5182"},"PeriodicalIF":0.0,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144611122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-07-10. DOI: 10.1109/TMI.2025.3587636
Jihun Kang;Eui Cheol Jung;Hyun Jung Koo;Dong Hyun Yang;Hojin Ha
In this study, we present enhanced physics-informed neural networks (PINNs) designed to address flow field errors in four-dimensional flow magnetic resonance imaging (4D Flow MRI). Flow field errors, typically occurring in high-velocity regions, lead to inaccuracies in velocity fields and underestimation of flow rates. We propose incorporating flow rate constraints to ensure physical consistency across cross-sections. The framework includes optimization strategies to improve convergence, stability, and accuracy: artificial viscosity modeling, projecting conflicting gradients (PCGrad), and Euclidean norm scaling are applied to balance the loss functions during training. The performance was validated using 2D computational fluid dynamics (CFD) with synthetic error, in vitro 4D flow MRI mimicking the aortic valve, and in vivo 4D flow MRI from patients with aortic regurgitation and aortic stenosis. The study demonstrates considerable improvements in flow field error correction, denoising, and super-resolution. Notably, the proposed PINNs provide accurate flow rate reconstruction in stenotic and high-velocity regions. This approach extends the applicability of 4D flow MRI by providing reliable hemodynamics in the post-processing stage.
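To make the flow-rate constraint concrete, here is a hedged sketch (not the authors' implementation): the through-plane velocity predicted by the network is integrated over each cross-section, and deviations of the per-section flow rates from their shared mean are penalized, enforcing consistent flow along the vessel. This term would be combined with the PINN's physics and data-fidelity losses.

```python
import torch

def flow_rate_constraint(v_normal, pixel_area):
    """v_normal: (n_sections, n_pixels) through-plane velocity sampled on each cross-section."""
    q = (v_normal * pixel_area).sum(dim=1)     # flow rate per section: Q = sum(v_n * dA)
    return ((q - q.mean()) ** 2).mean()        # every section should carry the same flow

# toy usage: 5 cross-sections along the vessel, 400 grid points each, 1 mm^2 per point
v = torch.rand(5, 400, requires_grad=True)     # stands in for the PINN's velocity prediction
loss = flow_rate_constraint(v, pixel_area=1.0)
loss.backward()                                # combined with the PINN's other loss terms
```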
{"title":"Flow-Rate-Constrained Physics-Informed Neural Networks for Flow Field Error Correction in 4-D Flow Magnetic Resonance Imaging","authors":"Jihun Kang;Eui Cheol Jung;Hyun Jung Koo;Dong Hyun Yang;Hojin Ha","doi":"10.1109/TMI.2025.3587636","DOIUrl":"10.1109/TMI.2025.3587636","url":null,"abstract":"In this study, we present enhanced physics-informed neural networks (PINNs), which were designed to address flow field errors in four-dimensional flow magnetic resonance imaging (4D Flow MRI). Flow field errors, typically occurring in high-velocity regions, lead to inaccuracies in velocity fields and flow rate underestimation. We proposed incorporating flow rate constraints to ensure physical consistency across cross-sections. The proposed framework included optimization strategies to improve convergence, stability, and accuracy. Artificial viscosity modeling, projecting conflicting gradients (PCGrad), and Euclidean norm scaling were applied to balance loss functions during training. The performance was validated using 2D computational fluid dynamics (CFD) with synthetic error, in-vitro 4D flow MRI mimicking aortic valve, and in-vivo 4D flow MRI from patients with aortic regurgitation and aortic stenosis. This study demonstrated considerable improvements in correcting flow field errors, denoising, and super-resolution. Notably, the proposed PINNs provided accurate flow rate reconstruction in stenotic and high-velocity regions. This approach extends the applicability of 4D flow MRI by providing reliable hemodynamics in the post-processing stage.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5155-5171"},"PeriodicalIF":0.0,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144603140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Metal dental implants may introduce metal artifacts (MA) during CBCT imaging, causing significant interference in subsequent diagnosis. In recent years, many deep learning methods for metal artifact reduction (MAR) have been proposed. Due to the large gap between synthetic and clinical MA, supervised MAR methods may perform poorly in clinical settings, while many existing unsupervised MAR methods trained on clinical data often produce incorrect dental morphology. To alleviate these problems, in this paper we propose a new MAR method, Coupled Diffusion Models (CDM), for clinical dental CBCT images. Specifically, we train two diffusion models separately, one on clinical MA-degraded images and one on clean clinical images, to obtain the respective priors. During the denoising process, the variances of the noise levels are calculated from the MA images and the diffusion priors. We then develop a noise transformation module between the two diffusion models that transforms the MA noise image into a new initial value for the denoising process. These designs effectively exploit the inherent transformation between misaligned MA-degraded images and clean images. Additionally, we introduce an MA-adaptive inference technique to better accommodate the MA degradation in different areas of an MA-degraded image. Experiments on our clinical dataset demonstrate that CDM outperforms comparison methods in both objective metrics and visual quality, especially for severe MA degradation. We will publicly release our code.
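A loose, hypothetical sketch of the coupled-diffusion intuition, simplified to an SDEdit-style re-noising rather than the paper's noise transformation module: the artifact-degraded image is pushed to an intermediate noise level with the forward process and then denoised by a diffusion model trained on clean CBCT images; `clean_eps_model`, the noise schedule, and `t_start` are placeholders.

```python
import torch

def diffuse_then_denoise(ma_image, clean_eps_model, t_start=400, T=1000):
    """Re-noise a metal-artifact image, then denoise it with a clean-image diffusion model."""
    betas = torch.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    # forward-diffuse the MA image to level t_start, partially absorbing artifacts into noise
    x = torch.sqrt(alpha_bars[t_start]) * ma_image + \
        torch.sqrt(1 - alpha_bars[t_start]) * torch.randn_like(ma_image)

    # reverse (denoising) process driven by the prior learned from clean CBCT images
    for t in reversed(range(t_start)):
        eps = clean_eps_model(x, torch.tensor([t]))
        mean = (x - (1 - alphas[t]) / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x
```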
{"title":"Coupled Diffusion Models for Metal Artifact Reduction of Clinical Dental CBCT Images","authors":"Zhouzhuo Zhang;Juncheng Yan;Yuxuan Shi;Zhiming Cui;Jun Xu;Dinggang Shen","doi":"10.1109/TMI.2025.3587131","DOIUrl":"10.1109/TMI.2025.3587131","url":null,"abstract":"Metal dental implants may introduce metal artifacts (MA) during the CBCT imaging process, causing significant interference in subsequent diagnosis. In recent years, many deep learning methods for metal artifact reduction (MAR) have been proposed. Due to the huge difference between synthetic and clinical MA, supervised learning MAR methods may perform poorly in clinical settings. Many existing unsupervised MAR methods trained on clinical data often suffer from incorrect dental morphology. To alleviate the above problems, in this paper, we propose a new MAR method of Coupled Diffusion Models (CDM) for clinical dental CBCT images. Specifically, we separately train two diffusion models on clinical MA-degraded images and clinical clean images to obtain prior information, respectively. During the denoising process, the variances of noise levels are calculated from MA images and the prior of diffusion models. Then we develop a noise transformation module between the two diffusion models to transform the MA noise image into a new initial value for the denoising process. Our designs effectively exploit the inherent transformation between the misaligned MA-degraded images and clean images. Additionally, we introduce an MA-adaptive inference technique to better accommodate the MA degradation in different areas of an MA-degraded image. Experiments on our clinical dataset demonstrate that our CDM outperforms the comparison methods on both objective metrics and visual quality, especially for severe MA degradation. We will publicly release our code.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5103-5116"},"PeriodicalIF":0.0,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144578618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}