Few-Shot Contrastive Learning for Cross-Task Stroke Prognosis Prediction with Multimodal Data
Haoran Peng, Yuxiang Dai, Rencheng Zheng, Mingming Wang, Chengyan Wang, Yu Luo, He Wang
Pub Date: 2026-02-06 | DOI: 10.1109/TMI.2026.3661971
Predicting stroke outcome remains challenging due to inherent heterogeneity, misalignment of multimodal clinical data, and the limited availability of well-annotated longitudinal datasets. Current methodologies often lack robustness and generalizability across these tasks. We propose a few-shot contrastive learning framework that integrates brain MRI and structured clinical records for cross-task prognosis prediction, addressing both morphological and functional outcomes. Our method combines Model-Agnostic Meta-Learning (MAML) with a two-step contrastive learning strategy comprising self-awareness learning, which captures task-specific features, and domain learning, which facilitates cross-dataset generalization. To handle inconsistencies in tabular data, we adopt a Misalignment Separation technique. The framework jointly trains a domain encoder on multimodal inputs, capturing shared and task-specific prior knowledge to enhance predictive robustness. Evaluations on 309 patients for morphological outcome and 341 patients for functional outcome, as well as on external validation datasets, demonstrated that our approach outperformed SimCLR and conventional supervised methods and could effectively integrate cross-task datasets. This framework highlights the potential of multimodal few-shot learning for robust stroke prognosis prediction on small-sample datasets.
AnatoDiff: Synthesizing Anatomically Truthful Radiographs With Limited Training Images
Ka-Wai Yung, Jayaram Sivaraj, Lodovico di Giura, Simon Eaton, Paolo De Coppi, Danail Stoyanov, Stavros Loukogeorgakis, Evangelos B Mazomenos
Pub Date: 2026-02-06 | DOI: 10.1109/TMI.2026.3661433
Rapid advancements in diffusion models have enabled the synthesis of realistic and anonymized imagery in radiography. However, due to their complexity, these models typically require large training volumes, often exceeding 10,000 images. Pre-training on natural images can partly mitigate this issue but often fails to generate anatomically accurate shapes due to the significant domain gap. This prohibits applications in specialized medical conditions with limited data. We propose AnatoDiff, a diffusion model that synthesizes high-quality X-ray images with accurate anatomical shapes using only 500 to 1,000 training samples. AnatoDiff incorporates a Shape Prototype Module and an Anatomical Fidelity loss, allowing for smaller training volumes through targeted supervision. We extensively validate AnatoDiff across three open-source datasets from distinct anatomical regions: Neonatal Abdomen (1,000 images), Adult Chest (500 images), and Humerus (500 images). Results demonstrate significant benefits, with an average improvement of 14.9% in Fréchet Inception Distance, 9.7% in Improved Precision, and 2.3% in Improved Recall compared to state-of-the-art (SOTA) few-shot and data-limited natural image synthesis methods. Unlike other models, AnatoDiff consistently generates anatomically correct images with accurate shapes. Additionally, a ResNet-50 classifier trained on AnatoDiff-generated images shows a 2.1% to 5.3% increase in F1-score, compared to being trained on SOTA diffusion images, across 500 to 10,000 samples. A survey of 10 medical professionals reveals that images generated by AnatoDiff are challenging to distinguish from real ones, with a Matthews correlation coefficient of 0.277 and a Fleiss' kappa of 0.126, highlighting the effectiveness of AnatoDiff in generating high-quality, anatomically accurate radiographs. Our code is available at https://github.com/KawaiYung/AnatoDiff.
Few-Shot Pulmonary Vessel Segmentation Based on Tubular-Aware Prompt-Tuning
Zijian Gao, Lai Jiang, Yichen Guo, Sukun Tian, Yuchun Sun, Mai Xu, Liyuan Tao
Pub Date: 2026-02-06 | DOI: 10.1109/TMI.2026.3662001
Segmentation of pulmonary vessels from computed tomography (CT) images plays a crucial role in the diagnosis and treatment of various lung diseases. Although deep learning-based approaches have shown remarkable progress in recent years, their performance is often hindered by the lack of high-quality annotated datasets, because the complex anatomy and morphology of pulmonary vessels make manual annotation challenging, time-consuming, and error-prone. To address this, we propose PV25, the first dataset that features finely paired annotations of both pulmonary vessels and airways. Moreover, we propose TPNet, a novel tubular-aware prompt-tuning framework for pulmonary vessel segmentation under few-shot training with limited annotations. TPNet is built in an encoder-decoder manner around an advanced, frozen segmentation backbone, with tunable encoding and decoding networks that learn tubular structures as transfer learning priors, bridging the gap between the source and target pulmonary vessel domains. In the encoding stage, a Morphology-Driven Region Growing (MDRG) module leverages the tubular connectivity of vessels to guide the network in capturing fine-grained features of pulmonary vessels. In the decoding stage, a Cross-Correlation Guidance (CCG) module integrates multi-scale correlations between airway and vessel structures in a coarse-to-fine manner. Extensive experiments conducted on multiple datasets demonstrate that TPNet achieves state-of-the-art performance in pulmonary vessel segmentation under limited training data. Besides, TPNet shows strong performance in related tasks such as airway segmentation and artery-vein classification, highlighting its robustness and versatility.
Laboratory Test-Guided Medical Image Generation for Multi-Modal Disease Prediction
Jingwen Xu, Fei Lyu, Ye Zhu, Pong C Yuen
Pub Date: 2026-02-06 | DOI: 10.1109/TMI.2026.3660978
The integration of laboratory tests and medical images is crucial for accurate disease prediction. However, imaging data exhibit temporal sparsity compared to frequently collected laboratory tests. This temporal sparsity limits effective multi-modal interaction, which in turn degrades prediction accuracy. We address this issue by generating additional medical images at more time points, conditioned on the laboratory tests. Inspired by the pivotal role of organs in mediating laboratory tests and imaging abnormalities, we propose an Organ-Centric Modal-Shared Image Generator. It converts laboratory tests into imaging abnormalities through two key components: (1) an Organ-Centric Graph, which positions organs as central nodes connecting laboratory tests and imaging abnormalities; and (2) a Knowledge-Guided Modal-Shared Trajectory Module, which binds multi-modal features across time into a unified organ state trajectory. Experimental results demonstrate that our method improves multi-modal prediction performance across various diseases. Code is available at https://github.com/LyapunovStability/Lab_Guide_Med_Image_Gen.
Moving Beyond Functional Connectivity: Time-Series Modeling for fMRI-Based Brain Disorder Classification
Guoqi Yu, Xiaowei Hu, Angelica I Aviles-Rivero, Anqi Qiu, Shujun Wang
Pub Date: 2026-02-06 | DOI: 10.1109/TMI.2026.3662157
Functional magnetic resonance imaging (fMRI) enables non-invasive brain disorder classification by capturing blood-oxygen-level-dependent (BOLD) signals. However, most existing methods rely on functional connectivity (FC) via Pearson correlation, which reduces 4D BOLD signals to static 2D matrices, discarding temporal dynamics and capturing only linear inter-regional relationships. In this work, we benchmark state-of-the-art temporal models (e.g., the time-series models PatchTST, TimesNet, and TimeMixer) on raw BOLD signals across five public datasets. Results show these models consistently outperform traditional FC-based approaches, highlighting the value of directly modeling temporal information such as cycle-like oscillatory fluctuations and drift-like slow baseline trends. Building on this insight, we propose DeCI, a simple yet effective framework that integrates two key principles: (i) Cycle and Drift Decomposition, to disentangle cycle and drift within each ROI (region of interest); and (ii) Channel Independence, to model each ROI separately, improving robustness and reducing overfitting. Extensive experiments demonstrate that DeCI achieves superior classification accuracy and generalization compared to both FC-based and temporal baselines. Our findings advocate for a shift toward end-to-end temporal modeling in fMRI analysis to better capture complex brain dynamics. The code is available at https://github.com/Levi-Ackman/DeCI.
LGFFM: A Localized and Globalized Frequency Fusion Model for Ultrasound Image Segmentation
Xiling Luo, Yi Wang, Le Ou-Yang
Pub Date: 2026-02-01 | DOI: 10.1109/TMI.2025.3600327 | pp. 515-527
Accurate segmentation of ultrasound images plays a critical role in disease screening and diagnosis. Recently, neural network-based methods have garnered significant attention for their potential in improving ultrasound image segmentation. However, these methods still face significant challenges, primarily due to inherent issues in ultrasound images, such as low resolution, speckle noise, and artifacts. Additionally, ultrasound image segmentation encompasses a wide range of scenarios, including organ segmentation (e.g., cardiac and fetal head) and lesion segmentation (e.g., breast cancer and thyroid nodules), making the task highly diverse and complex. Existing methods are often designed for specific segmentation scenarios, which limits their flexibility and ability to meet the diverse needs across various scenarios. To address these challenges, we propose a novel Localized and Globalized Frequency Fusion Model (LGFFM) for ultrasound image segmentation. Specifically, we first design a Parallel Bi-Encoder (PBE) architecture that integrates Local Feature Blocks (LFB) and Global Feature Blocks (GLB) to enhance feature extraction. Additionally, we introduce a Frequency Domain Mapping Module (FDMM) to capture texture information, particularly high-frequency details such as edges. Finally, a Multi-Domain Fusion (MDF) method is developed to effectively integrate features across different domains. We conduct extensive experiments on eight representative public ultrasound datasets across four different types. The results demonstrate that LGFFM outperforms current state-of-the-art methods in both segmentation accuracy and generalization performance.
EndoRD-GS: Robust Deformable Endoscopic Scene Reconstruction via Gaussian Splatting
Bingchen Gao, Jun Zhou, Jing Zou, Jing Qin
Pub Date: 2026-02-01 | DOI: 10.1109/TMI.2025.3600253 | pp. 528-541
Real-time and realistic reconstruction of 3D dynamic surgical scenes from surgical videos is a novel and unique tool for surgical planning and intraoperative guidance. 3D Gaussian splatting (GS), with its high rendering speed and reconstruction fidelity, has recently emerged as a promising technique for surgical scene reconstruction. However, existing GS-based methods still have two obvious shortcomings for realistic reconstruction. First, they largely struggle to capture localized yet intricate soft tissue deformations caused by complex instrument-tissue interactions. Second, they fail to model spatiotemporal coupling among Gaussian primitives for global adjustments during rapid perspective transformations, resulting in unstable reconstruction outputs. In this paper, we propose EndoRD-GS, an innovative approach that overcomes these two limitations through two core techniques: 1) periodic modulated Gaussian functions and 2) a new Biplane module. Specifically, our periodic modulated Gaussian functions incorporate meticulously designed modulations, significantly enhancing the representation of complex local tissue deformations. On the other hand, our Biplane module constructs spatiotemporal interactions among Gaussian primitives, enabling global adjustments and ensuring reliable scene reconstruction during rapid perspective transformations. Extensive experiments on three datasets demonstrate that our EndoRD-GS achieves superior performance in endoscopic scene reconstruction compared to state-of-the-art methods. The code is available at EndoRD-GS.
PAINT: Prior-Aided Alternate Iterative NeTwork for Ultra-Low-Dose CT Imaging Using Diffusion Model-Restored Sinogram
Kaile Chen, Weikang Zhang, Ziheng Deng, Yufu Zhou, Jun Zhao
Pub Date: 2026-02-01 | DOI: 10.1109/TMI.2025.3599508 | pp. 434-447
Obtaining multiple CT scans from the same patient is required in many clinical scenarios, such as lung nodule screening and image-guided radiation therapy. Repeated scans expose patients to higher radiation dose and increase the risk of cancer. In this study, we aim to achieve ultra-low-dose imaging for subsequent scans by collecting extremely undersampled sinograms via regional few-view scanning, while preserving image quality by utilizing the preceding fully sampled scan as prior. To fully exploit prior information, we propose a two-stage framework consisting of diffusion model-based sinogram restoration and deep learning-based unrolled iterative reconstruction. Specifically, the undersampled sinogram is first restored by a conditional diffusion model with sinogram-domain prior guidance. Then, we formulate the undersampled data reconstruction problem as an optimization problem combining fidelity terms for both the undersampled and restored data, along with a regularization term based on an image-domain prior. Next, we propose the Prior-Aided Alternate Iterative NeTwork (PAINT) to solve the optimization problem. PAINT alternately updates the undersampled or restored data fidelity term, and unrolls the iterations to integrate neural network-based prior regularization. In simulated data experiments with a 112 mm field of view, our proposed framework achieved superior performance in terms of CT value accuracy and image detail preservation. Clinical data experiments also demonstrated that our proposed framework outperformed the comparison methods in artifact reduction and structure recovery.
Ultrasound Autofocusing: Common Midpoint Phase Error Optimization via Differentiable Beamforming
Walter Simson, Louise Zhuang, Benjamin N Frey, Sergio J Sanabria, Jeremy J Dahl, Dongwoon Hyun
Pub Date: 2026-02-01 | DOI: 10.1109/TMI.2025.3607875 | pp. 681-692
In ultrasound imaging, propagation of an acoustic wavefront through heterogeneous media causes phase aberrations that degrade the coherence of the reflected wavefront, leading to reduced image resolution and contrast. Adaptive imaging techniques attempt to correct this phase aberration and restore coherence, leading to improved focusing of the image. We propose an autofocusing paradigm for aberration correction in ultrasound imaging by fitting an acoustic velocity field to pressure measurements, via optimization of the common midpoint phase error (CMPE), using a straight-ray wave propagation model for beamforming in diffusely scattering media. We show that CMPE induced by heterogeneous acoustic velocity is a robust measure of phase aberration that can be used for acoustic autofocusing. CMPE is optimized iteratively using a differentiable beamforming approach to simultaneously improve the image focus while estimating the acoustic velocity field of the interrogated medium. The approach relies solely on wavefield measurements using a straight-ray integral solution of the two-way time-of-flight without explicit numerical time-stepping models of wave propagation. We demonstrate method performance through in silico simulations, in vitro phantom measurements, and in vivo mammalian models, showing practical applications in distributed aberration quantification, correction, and velocity estimation for medical ultrasound autofocusing.
Ring Artifact Reduction in Photon Counting CT Using Redundant Sampling and Autocalibration
Scott S Hsieh, James Day, Xinchen Deng, Magdalena Bazalova-Carter
Pub Date: 2026-01-26 | DOI: 10.1109/TMI.2026.3658004
Ring artifacts in CT are caused by uncalibrated variations in detector pixels and are especially prevalent with emerging photon counting detectors (PCDs). Control of ring artifacts is conventionally accomplished by improving either hardware manufacturing or software correction algorithms. An alternative solution is detector autocalibration, in which two redundant samples of each line integral are acquired and used to dynamically calibrate the PCD. Autocalibration was first proposed by Hounsfield in 1977 and was demonstrated on the EMI Topaz prototype scanner in 1980, but details surrounding this implementation are sparse. We investigate a form of autocalibration that requires just two redundant acquisitions, which could be acquired using flying focal spot on a clinical scanner but is demonstrated here with a detector shift. We formulated autocalibration as an optimization problem to determine the relative gain factor of each pixel and tested it on scans of a chicken thigh specimen, a resolution phantom, and a cylindrical phantom. Ring artifacts were significantly reduced. Some residual artifacts remained but could not be discriminated from the intrinsic temporal instability of our PCD modules. Autocalibration could facilitate the widespread adoption of photon counting CT by reducing the ring artifacts, thermal management requirements, or stability requirements that are present today. Demonstration of autocalibration on a rotating gantry with flying focal spot remains future work.