Pub Date: 2025-07-14 | DOI: 10.1109/TMI.2025.3588836
Ting Jin;Xingran Xie;Qingli Li;Xinxing Li;Yan Wang
Histology analysis of the tumor micro-environment integrated with genomic assays is widely regarded as the cornerstone of cancer analysis and survival prediction. This paper jointly incorporates genomics and Whole Slide Images (WSIs) and addresses the primary challenges in multi-modality prognosis analysis: 1) high-order relevance is difficult to model from dimensionally imbalanced gigapixel WSIs and tens of thousands of genetic sequences, and 2) the lack of medical expertise and clinical knowledge hampers the effectiveness of prognosis-oriented multi-modal fusion. Given the nature of the prognosis task, statistical priors and clinical knowledge are essential for estimating the likelihood of survival over time, yet they remain under-studied. To this end, we propose a prognosis-oriented image-omics fusion framework, dubbed Clinical Stage Prompt induced Multimodal Prognosis (CiMP). Concretely, we leverage an advanced large language model (LLM) to generate descriptions from structured clinical records and use the generated clinical staging prompts to intentionally query critical prognosis-related information from each modality. In addition, we propose a Group Multi-Head Self-Attention module to capture structured group-specific features within cohorts of genomic data. Experimental results on five TCGA datasets show the superiority of our method, which achieves state-of-the-art performance compared to previous multi-modal prognostic models. Furthermore, the clinical interpretability analysis and discussion highlight its immense potential for further medical applications. Our code will be released at https://github.com/DeepMed-Lab-ECNU/CiMP/
Title: Clinical Stage Prompt Induced Multi-Modal Prognosis | IEEE Transactions on Medical Imaging, vol. 44, no. 12, pp. 5065-5076.
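The Group Multi-Head Self-Attention module described above can be illustrated with a minimal sketch: a single-head, group-wise attention in NumPy where gene tokens only attend to members of their own functional group. All names, dimensions, and the grouping itself are hypothetical, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def group_self_attention(tokens, group_ids, dim=32, seed=0):
    """Toy group-wise self-attention: attention is computed independently
    inside each genomic group (e.g. a pathway cohort), so a token never
    attends across group boundaries."""
    rng = np.random.default_rng(seed)
    n, d = tokens.shape
    Wq, Wk, Wv = (rng.standard_normal((d, dim)) for _ in range(3))
    out = np.zeros((n, dim))
    for g in np.unique(group_ids):
        idx = np.where(group_ids == g)[0]
        q, k, v = tokens[idx] @ Wq, tokens[idx] @ Wk, tokens[idx] @ Wv
        attn = softmax(q @ k.T / np.sqrt(dim))   # attention within the group only
        out[idx] = attn @ v
    return out

genes = np.random.default_rng(1).standard_normal((6, 16))  # 6 gene tokens
groups = np.array([0, 0, 1, 1, 1, 2])                      # 3 functional groups
feats = group_self_attention(genes, groups)
print(feats.shape)  # (6, 32)
```

A multi-head variant would run several such maps in parallel and concatenate; the per-group restriction is what distinguishes this from ordinary self-attention over all genes.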
Pub Date: 2025-07-14 | DOI: 10.1109/TMI.2025.3588789
Kexin Deng;Yan Luo;Hongzhi Zuo;Yuwen Chen;Liujie Gu;Mingyuan Liu;Hengrong Lan;Jianwen Luo;Cheng Ma
Photoacoustic computed tomography (PACT) is an emerging hybrid imaging modality with potential applications in biomedicine. A major roadblock to the widespread adoption of PACT is the limited number of detectors, which gives rise to spatial aliasing and manifests as streak artifacts in the reconstructed image. A brute-force solution is to increase the number of detectors, which, however, is often undesirable due to escalated costs. In this study, we present a novel self-supervised learning approach to overcome this long-standing challenge. We found that small blocks of PACT channel data show similarity at various downsampling rates. Based on this observation, a neural network trained on downsampled data can reliably perform accurate interpolation without requiring densely sampled ground-truth data, which is typically unavailable in practice. Our method has been validated through numerical simulations, controlled phantom experiments, and ex vivo and in vivo animal tests across multiple PACT systems. We have demonstrated that our technique provides an effective and cost-efficient solution to the under-sampling issue in PACT, thereby enhancing the capabilities of this imaging technology.
Title: Self-Supervised Upsampling for Reconstructions With Generalized Enhancement in Photoacoustic Computed Tomography | IEEE Transactions on Medical Imaging, vol. 44, no. 12, pp. 5117-5127.
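The self-supervised setup above can be sketched as training-pair construction: the sparse channel data we actually have serves as the target, and a further-downsampled copy serves as the input, so no densely sampled ground truth is needed. The array shapes and sampling rate here are illustrative, not taken from the paper:

```python
import numpy as np

def make_selfsup_pairs(channel_data, rate=2):
    """Build self-supervised training pairs from already-sparse PACT channel
    data: 'input' is a further-downsampled copy, 'target' is the data we have,
    exploiting the similarity of channel-data blocks across sampling rates."""
    target = channel_data          # (n_channels, n_samples), sparse already
    inp = channel_data[::rate]     # drop channels again to mimic heavier undersampling
    return inp, target

sinogram = np.random.default_rng(0).standard_normal((128, 1024))
x, y = make_selfsup_pairs(sinogram, rate=2)
print(x.shape, y.shape)  # (64, 1024) (128, 1024)
```

A network trained to map `x` back to `y` learns 2x channel interpolation; at inference it is applied to the original sparse data to synthesize the missing detectors.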
Pub Date: 2025-07-11 | DOI: 10.1109/TMI.2025.3588167
Yuchen Yuan;Xi Wang;Jinpeng Li;Guangyong Chen;Pheng-Ann Heng
Skin lesion segmentation is vital for the early detection, diagnosis, and treatment of melanoma, yet it remains challenging due to significant variations in lesion attributes (e.g., color, size, shape), ambiguous boundaries, and noise interference. Recent advances have focused on capturing contextual information and incorporating boundary priors to handle challenging lesions. However, there has been limited exploration of the explicit analysis of the inherent patterns of skin lesions, a crucial aspect of the knowledge-driven decision-making process used by clinical experts. In this work, we introduce a novel approach called Probabilistic Attribute Learning (PAL), which leverages knowledge of lesion patterns to achieve enhanced performance on challenging lesions. Recognizing that the lesion patterns exhibited in each image can be properly depicted by disentangled attributes, we begin by explicitly estimating the distribution of each attribute as a distinct Gaussian, whose mean and variance indicate the most likely pattern of that attribute and its variation. Using Monte Carlo sampling, we iteratively draw multiple samples from these distributions to capture various potential patterns for each attribute. These samples are then merged through an effective attribute fusion technique, resulting in diverse representations that comprehensively depict the lesion class. By performing pixel-class proximity matching between each pixel-wise representation and the diverse class-wise representations, we significantly enhance the model's robustness. Extensive experiments on two public skin lesion datasets and one unified polyp lesion dataset demonstrate the effectiveness and strong generalization ability of our method. Codes are available at https://github.com/IsYuchenYuan/PAL
Title: PAL: Boosting Skin Lesion Segmentation via Probabilistic Attribute Learning | IEEE Transactions on Medical Imaging, vol. 44, no. 12, pp. 5183-5196.
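The sampling-and-matching pipeline above can be sketched numerically: draw Monte Carlo samples from per-attribute Gaussians, fuse them into class representations (simple averaging stands in for the paper's fusion technique), then score pixels by cosine proximity. Shapes and the averaging fusion are assumptions for illustration:

```python
import numpy as np

def sample_class_representations(attr_mu, attr_var, n_samples=8, seed=0):
    """Draw Monte Carlo samples from per-attribute Gaussians and fuse them
    (here by averaging over attributes) into diverse class-wise representations."""
    rng = np.random.default_rng(seed)
    # attr_mu / attr_var: (n_attributes, d) per-attribute mean and variance
    samples = rng.normal(loc=attr_mu[None], scale=np.sqrt(attr_var)[None],
                         size=(n_samples,) + attr_mu.shape)  # (S, A, d)
    return samples.mean(axis=1)                              # fuse -> (S, d)

def pixel_class_proximity(pixel_feats, class_reps):
    """Cosine proximity of each pixel feature to every sampled class rep."""
    p = pixel_feats / np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
    c = class_reps / np.linalg.norm(class_reps, axis=-1, keepdims=True)
    return p @ c.T                                           # (n_pixels, S)

mu = np.zeros((3, 16)); var = np.full((3, 16), 0.1)          # 3 attributes, 16-d
reps = sample_class_representations(mu, var)                 # (8, 16)
prox = pixel_class_proximity(
    np.random.default_rng(1).standard_normal((5, 16)), reps)
print(prox.shape)  # (5, 8)
```

Matching each pixel against several sampled class representations, rather than one prototype, is what lends robustness to lesion-pattern variation.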
Pub Date: 2025-07-11 | DOI: 10.1109/TMI.2025.3588157
Xiaolong Deng;Huisi Wu
Accurate segmentation of the left ventricle in echocardiography is critical for diagnosing and treating cardiovascular diseases. However, accurate segmentation remains challenging due to the limitations of ultrasound imaging. Although numerous image and video segmentation methods have been proposed, existing methods still fail to solve this task effectively, constrained as they are by sparse annotations. To address this problem, we propose a novel semi-supervised segmentation framework named NCM-Net for echocardiography. We first propose the neighborhood correlation mining (NCM) module, which mines the correlations between query features and their spatiotemporal neighborhoods to resist the influence of noise. The module also captures cross-scale contextual correlations between pixels spatially to further refine features, thus alleviating the impact of noise on echocardiography segmentation. To further improve segmentation accuracy, we propose unreliable-pixels masked attention (UMA): by masking reliable pixels, it pays extra attention to unreliable pixels to refine the segmentation boundary. Further, we apply cross-frame boundary constraints to the final predictions to optimize their temporal consistency. Through extensive experiments on two publicly available datasets, CAMUS and EchoNet-Dynamic, we demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance and outstanding temporal consistency. Codes are available at https://github.com/dengxl0520/NCMNet
Title: Echocardiography Video Segmentation via Neighborhood Correlation Mining | IEEE Transactions on Medical Imaging, vol. 44, no. 12, pp. 5172-5182.
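The reliable/unreliable split that UMA builds on can be sketched as a confidence threshold on per-pixel class probabilities; the threshold value and array layout here are illustrative, and the actual masked-attention computation in NCM-Net is more involved:

```python
import numpy as np

def unreliable_pixel_mask(probs, tau=0.9):
    """Mark pixels whose top-class probability falls below tau as unreliable;
    masked attention would then focus on exactly these pixels when refining
    the segmentation boundary."""
    confidence = probs.max(axis=-1)   # (H, W) top-class probability
    return confidence < tau           # True = unreliable, receives extra attention

probs = np.array([[[0.95, 0.05], [0.60, 0.40]],
                  [[0.55, 0.45], [0.99, 0.01]]])  # (H, W, n_classes)
mask = unreliable_pixel_mask(probs, tau=0.9)
print(mask)  # [[False  True] [ True False]]
```

Low-confidence pixels cluster along ambiguous ultrasound boundaries, which is why attending specifically to them sharpens the contour.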
Pub Date: 2025-07-10 | DOI: 10.1109/TMI.2025.3587636
Jihun Kang;Eui Cheol Jung;Hyun Jung Koo;Dong Hyun Yang;Hojin Ha
In this study, we present enhanced physics-informed neural networks (PINNs) designed to address flow field errors in four-dimensional flow magnetic resonance imaging (4D Flow MRI). Flow field errors, which typically occur in high-velocity regions, lead to inaccurate velocity fields and underestimated flow rates. We propose incorporating flow rate constraints to ensure physical consistency across cross-sections. The framework includes optimization strategies to improve convergence, stability, and accuracy: artificial viscosity modeling, projecting conflicting gradients (PCGrad), and Euclidean norm scaling are applied to balance the loss functions during training. Performance was validated using 2D computational fluid dynamics (CFD) with synthetic errors, in-vitro 4D flow MRI mimicking the aortic valve, and in-vivo 4D flow MRI from patients with aortic regurgitation and aortic stenosis. The study demonstrates considerable improvements in flow field error correction, denoising, and super-resolution. Notably, the proposed PINNs provide accurate flow rate reconstruction in stenotic and high-velocity regions. This approach extends the applicability of 4D flow MRI by providing reliable hemodynamics in the post-processing stage.
Title: Flow-Rate-Constrained Physics-Informed Neural Networks for Flow Field Error Correction in 4-D Flow Magnetic Resonance Imaging | IEEE Transactions on Medical Imaging, vol. 44, no. 12, pp. 5155-5171.
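The flow-rate constraint above admits a simple numerical sketch: integrate the normal velocity over each cross-section to get a flow rate Q_i, then penalize disagreement between sections (mass conservation says they should match in a rigid vessel). This is one plausible form of such a loss term, not the paper's exact formulation:

```python
import numpy as np

def flow_rate_consistency_loss(velocity, areas):
    """Penalize deviation of per-cross-section flow rates from their mean,
    encouraging physically consistent flow along the vessel.
    velocity: (n_sections, n_points) normal velocity samples per section
    areas:    (n_sections, n_points) area element per sample point"""
    q = (velocity * areas).sum(axis=1)    # Q_i = sum of v * dA per section
    return np.mean((q - q.mean()) ** 2)

v = np.array([[1.0, 1.0], [0.5, 1.5], [1.2, 0.8]])  # three cross-sections
dA = np.ones_like(v)
print(flow_rate_consistency_loss(v, dA))  # 0.0 (every section has Q_i = 2.0)
```

In a PINN this term would be added to the data-fidelity and Navier-Stokes residual losses, with the balancing handled by schemes like PCGrad as the abstract notes.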
Pub Date: 2025-07-08 | DOI: 10.1109/TMI.2025.3587131
Zhouzhuo Zhang;Juncheng Yan;Yuxuan Shi;Zhiming Cui;Jun Xu;Dinggang Shen
Metal dental implants may introduce metal artifacts (MA) during CBCT imaging, causing significant interference in subsequent diagnosis. In recent years, many deep learning methods for metal artifact reduction (MAR) have been proposed. Due to the large gap between synthetic and clinical MA, supervised MAR methods may perform poorly in clinical settings, while many existing unsupervised MAR methods trained on clinical data often produce incorrect dental morphology. To alleviate these problems, we propose a new MAR method, Coupled Diffusion Models (CDM), for clinical dental CBCT images. Specifically, we train two diffusion models, one on clinical MA-degraded images and one on clinical clean images, to obtain their respective priors. During the denoising process, the variances of the noise levels are estimated from the MA images and the diffusion priors. We then develop a noise transformation module between the two diffusion models that converts the noisy MA image into a new initial value for the denoising process. These designs effectively exploit the inherent transformation between misaligned MA-degraded and clean images. Additionally, we introduce an MA-adaptive inference technique to better accommodate the varying MA degradation across different areas of an image. Experiments on our clinical dataset demonstrate that CDM outperforms the comparison methods on both objective metrics and visual quality, especially under severe MA degradation. We will publicly release our code.
Title: Coupled Diffusion Models for Metal Artifact Reduction of Clinical Dental CBCT Images | IEEE Transactions on Medical Imaging, vol. 44, no. 12, pp. 5103-5116.
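One way to read the noise-level estimation step: pick the diffusion timestep whose marginal noise standard deviation matches the level estimated from the MA-degraded image, then inject the image mid-way into the clean-image model's denoising chain. This is an SDEdit-style heuristic offered as an illustration, not necessarily the paper's exact noise transformation module; the schedule and names are assumptions:

```python
import numpy as np

def matching_timestep(sigma_hat, alphas_cumprod):
    """Pick the diffusion step whose marginal noise std, sqrt(1 - alpha_bar_t),
    best matches the noise level estimated from the MA-degraded image."""
    stds = np.sqrt(1.0 - alphas_cumprod)
    return int(np.argmin(np.abs(stds - sigma_hat)))

# Standard linear DDPM beta schedule, 1000 steps (illustrative choice).
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)
t0 = matching_timestep(0.5, alphas_cumprod)   # start denoising from step t0
print(t0)
```

Starting the reverse process from `t0` instead of the final step lets the clean-image prior reshape only as much structure as the artifact noise warrants.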
Pub Date: 2025-07-07 | DOI: 10.1109/TMI.2025.3586622
Tianling Lyu;Xusheng Zhang;Xinyun Zhong;Zhan Wu;Yan Xi;Wei Zhao;Yang Chen;Yuanjing Feng;Wentao Zhu
Cone-beam CT (CBCT) is extensively used in medical diagnosis and treatment. Despite its large longitudinal field of view (FoV), the horizontal FoV of CBCT systems is severely limited by the detector width. Certain commercial CBCT systems enlarge the horizontal FoV by employing an offset detector. However, this method necessitates a 360° full circular scanning trajectory, which increases the scanning time and is incompatible with certain CBCT system models. In this paper, we investigate the feasibility of large-FoV imaging under short-scan trajectories with an additional X-ray source. We propose a dual-source CBCT geometry together with two corresponding image reconstruction algorithms: the first is based on cone-parallel rebinning, and the second employs a modified Parker weighting scheme. Theoretical calculations demonstrate that the proposed geometry achieves a wider horizontal FoV than the 90% detector offset geometry (radius of 214.83 mm vs. 198.99 mm) with a significantly reduced rotation angle (less than 230° vs. 360°). As demonstrated by experiments, the proposed geometry and reconstruction algorithms obtain imaging quality within the FoV comparable to conventional CBCT imaging techniques. Implementing the proposed geometry is straightforward and does not substantially increase development costs. It possesses the capacity to expand CBCT applications even further.
Title: Dual-Source CBCT for Large FoV Imaging Under Short-Scan Trajectories | IEEE Transactions on Medical Imaging, vol. 44, no. 12, pp. 5051-5064.
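The modified Parker weighting mentioned above builds on the classic Parker weights for a fan-beam short scan over [0, pi + 2*gamma_m], which redundancy-weight rays measured twice. The textbook baseline (not the paper's modified scheme) can be written directly:

```python
import numpy as np

def parker_weight(beta, gamma, gamma_m):
    """Classic Parker weight for projection angle beta and fan angle gamma,
    with fan half-angle gamma_m; ramps up at the start of the short scan,
    is flat in the middle, and ramps down at the end."""
    if 0 <= beta < 2 * (gamma_m - gamma):
        return np.sin(np.pi / 4 * beta / (gamma_m - gamma)) ** 2
    if 2 * (gamma_m - gamma) <= beta < np.pi - 2 * gamma:
        return 1.0
    if np.pi - 2 * gamma <= beta <= np.pi + 2 * gamma_m:
        return np.sin(np.pi / 4 * (np.pi + 2 * gamma_m - beta) / (gamma_m + gamma)) ** 2
    return 0.0

gm = np.deg2rad(10)                        # 10 degree fan half-angle
print(parker_weight(np.pi / 2, 0.0, gm))   # 1.0 (flat region mid-scan)
print(parker_weight(0.0, 0.0, gm))         # 0.0 (ramp starts at zero)
```

The weights ensure each ray and its conjugate sum to unity, which is what makes short-scan filtered backprojection artifact-free; a dual-source geometry needs a modified version because two sources cover the angular range jointly.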
Pub Date: 2025-06-25 | DOI: 10.1109/TMI.2025.3579214
Jieru Yao;Guangyu Guo;Zhaohui Zheng;Qiang Xie;Longfei Han;Dingwen Zhang;Junwei Han
Nuclei instance segmentation and classification are fundamental and challenging tasks in whole slide imaging (WSI) analysis. Most dense nuclei prediction studies rely heavily on crowd-labelled data on high-resolution digital images, leading to a time-consuming, expertise-intensive paradigm. Recently, Vision-Language Models (VLMs), which learn rich cross-modal correlations from large-scale image-text pairs without tedious annotations, have been intensively investigated. Inspired by this, we build a novel framework, called PromptNu, that aims to infuse abundant nuclei knowledge into the training of the nuclei instance recognition model through vision-language contrastive learning and prompt engineering. Specifically, our approach starts with the creation of multifaceted prompts that integrate comprehensive nuclei knowledge, including visual insights from the GPT-4V model, statistical analyses, and expert insights from the pathology field. We then propose a novel prompting methodology consisting of two pivotal vision-language contrastive learning components: Prompting Nuclei Representation Learning (PNuRL) and Prompting Nuclei Dense Prediction (PNuDP), which integrate the expertise embedded in pre-trained VLMs and the multifaceted prompts into the feature extraction and prediction processes, respectively. Comprehensive experiments on six datasets covering diverse WSI scenarios demonstrate the effectiveness of our method for both nuclei instance segmentation and classification. The code is available at https://github.com/NucleiDet/PromptNu
Title: Prompting Vision-Language Model for Nuclei Instance Segmentation and Classification | IEEE Transactions on Medical Imaging, vol. 44, no. 11, pp. 4567-4578.
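The vision-language contrastive objective underpinning frameworks like this is the symmetric InfoNCE (CLIP-style) loss over matched image/prompt pairs. A minimal NumPy version, offered as background rather than the paper's PNuRL/PNuDP losses:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy with integer labels, via a stable log-softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: the i-th image should score highest against the
    i-th text prompt, and vice versa."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (N, N), diagonal = matched pairs
    labels = np.arange(len(img_emb))
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

emb = np.eye(4)                               # 4 perfectly aligned pairs
loss = clip_contrastive_loss(emb, emb)
print(loss < 1e-3)  # True: aligned pairs give near-zero loss
```

In PromptNu's setting the text side would be the multifaceted nuclei prompts and the image side nucleus-level features, but the pairing structure of the loss is the same.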
Pub Date : 2025-06-25DOI: 10.1109/TMI.2025.3580659
Zailong Chen;Yingshu Li;Zhanyu Wang;Peng Gao;Johan Barthelemy;Luping Zhou;Lei Wang
Radiology report generation using large language models has recently produced reports with more realistic styles and better language fluency. However, their clinical accuracy remains inadequate. Considering the significant imbalance between clinical phrases and general descriptions in a report, we argue that using an entire report for supervision is problematic, as it fails to emphasize the crucial clinical phrases, which require focused learning. To address this issue, we propose a multi-phased supervision method inspired by the spirit of curriculum learning, where models are trained with gradually increasing task complexity. Our approach organizes the learning process into structured phases at different levels of semantic granularity, each building on the previous one to enhance the model. During the first phase, disease labels are used to supervise the model, equipping it with the ability to identify underlying diseases. The second phase progresses to entity-relation triples, which guide the model to describe the associated clinical findings. Finally, in the third phase, we introduce conventional whole-report supervision to quickly adapt the model for report generation. Throughout the phased training, the model remains the same and consistently operates in generation mode. As experimentally demonstrated, this change in the way of supervision enhances report generation, achieving state-of-the-art performance in both language fluency and clinical accuracy. Our work underscores the importance of training-process design in radiology report generation. Our code is available at https://github.com/zailongchen/MultiP-R2Gen
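The three supervision phases described above can be sketched as a schedule that swaps only the target text while the generator itself is unchanged. This is a minimal illustrative sketch, not the MultiP-R2Gen code; the record keys and function names are assumptions.

```python
def supervision_target(record, phase):
    """Pick the supervision signal for the current curriculum phase."""
    if phase == 1:
        # Coarsest level: disease labels only.
        return " ".join(record["disease_labels"])
    if phase == 2:
        # Finer level: entity-relation triples describing clinical findings.
        return " ; ".join("{} {} {}".format(*t) for t in record["triples"])
    # Final level: conventional whole-report supervision.
    return record["report"]

def phased_training_schedule(records, phases=(1, 2, 3)):
    """Yield (input_id, target_text) pairs phase by phase.

    The model consumed by this schedule stays the same across phases and
    always operates in generation mode; only the targets grow in complexity.
    """
    for phase in phases:
        for rec in records:
            yield rec["image_id"], supervision_target(rec, phase)
```

A training loop would iterate this schedule and fine-tune the same generator on each successive target granularity.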
{"title":"Enhancing Radiology Report Generation via Multi-Phased Supervision","authors":"Zailong Chen;Yingshu Li;Zhanyu Wang;Peng Gao;Johan Barthelemy;Luping Zhou;Lei Wang","doi":"10.1109/TMI.2025.3580659","DOIUrl":"10.1109/TMI.2025.3580659","url":null,"abstract":"Radiology report generation using large language models has recently produced reports with more realistic styles and better language fluency. However, their clinical accuracy remains inadequate. Considering the significant imbalance between clinical phrases and general descriptions in a report, we argue that using an entire report for supervision is problematic as it fails to emphasize the crucial clinical phrases, which require focused learning. To address this issue, we propose a multi-phased supervision method, inspired by the spirit of curriculum learning where models are trained by gradually increasing task complexity. Our approach organizes the learning process into structured phases at different levels of semantical granularity, each building on the previous one to enhance the model. During the first phase, disease labels are used to supervise the model, equipping it with the ability to identify underlying diseases. The second phase progresses to use entity-relation triples to guide the model to describe associated clinical findings. Finally, in the third phase, we introduce conventional whole-report-based supervision to quickly adapt the model for report generation. Throughout the phased training, the model remains the same and consistently operates in the generation mode. As experimentally demonstrated, this proposed change in the way of supervision enhances report generation, achieving state-of-the-art performance in both language fluency and clinical accuracy. Our work underscores the importance of training process design in radiology report generation. 
Our code is available on <uri>https://github.com/zailongchen/MultiP-R2Gen</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4666-4677"},"PeriodicalIF":0.0,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144487975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-06-20DOI: 10.1109/TMI.2025.3581108
Yuyang Du;Kexin Chen;Yue Zhan;Chang Han Low;Mobarakol Islam;Ziyu Guo;Yueming Jin;Guangyong Chen;Pheng Ann Heng
Visual question answering (VQA) plays a vital role in advancing surgical education. However, due to privacy concerns over patient data, training a VQA model with previously used data is restricted, making it necessary to adopt an exemplar-free continual learning (CL) approach. Previous CL studies in the surgical field neglected two critical issues: i) significant domain shifts caused by the wide range of surgical procedures collected from various sources, and ii) the data imbalance problem caused by the unequal occurrence of medical instruments and surgical procedures. This paper addresses these challenges with a multimodal large language model (LLM) and an adaptive weight assignment strategy. First, we developed a novel LLM-assisted multi-teacher CL framework (named LMT++), which harnesses the strength of a multimodal LLM as a supplementary teacher. The LLM’s strong generalization ability, together with its good understanding of the surgical domain, helps to bridge the knowledge gap arising from domain shifts and data imbalances. To incorporate the LLM into our CL framework, we further propose an approach to processing the training data that converts complex LLM embeddings into logit values used within our CL training framework. Moreover, we design an adaptive weight assignment approach that balances the generalization ability of the LLM against the domain expertise of the conventional VQA models obtained in previous training processes within the CL framework. Finally, we created a new surgical VQA dataset for model evaluation. Comprehensive experimental findings on these datasets show that our approach surpasses state-of-the-art CL methods.
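The two key mechanics above, turning LLM embeddings into per-answer logits and adaptively weighting the LLM teacher against the specialized VQA teachers, can be sketched as follows. This is one plausible reading under stated assumptions (cosine similarity to answer-candidate embeddings, a scalar blend weight `alpha`), not the LMT++ implementation.

```python
import math

def embeddings_to_logits(answer_embedding, candidate_embeddings):
    """Convert an LLM answer embedding into one logit per answer candidate,
    here via cosine similarity (an assumed, illustrative choice)."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a)) or 1.0
        nb = math.sqrt(sum(x * x for x in b)) or 1.0
        return dot / (na * nb)
    return [cos(answer_embedding, c) for c in candidate_embeddings]

def blend_teachers(llm_logits, vqa_logits, alpha):
    """Adaptive weight assignment: alpha in [0, 1] trades off the generalist
    LLM teacher against the domain-expert VQA teacher for this sample."""
    return [alpha * l + (1 - alpha) * v for l, v in zip(llm_logits, vqa_logits)]
```

In a distillation setup, the blended logits would serve as the soft target for the student model, with `alpha` chosen per sample or per task rather than fixed.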
{"title":"LMT++: Adaptively Collaborating LLMs With Multi-Specialized Teachers for Continual VQA in Robotic Surgical Videos","authors":"Yuyang Du;Kexin Chen;Yue Zhan;Chang Han Low;Mobarakol Islam;Ziyu Guo;Yueming Jin;Guangyong Chen;Pheng Ann Heng","doi":"10.1109/TMI.2025.3581108","DOIUrl":"10.1109/TMI.2025.3581108","url":null,"abstract":"Visual question answering (VQA) plays a vital role in advancing surgical education. However, due to the privacy concern of patient data, training VQA model with previously used data becomes restricted, making it necessary to use the exemplar-free continual learning (CL) approach. Previous CL studies in the surgical field neglected two critical issues: i) significant domain shifts caused by the wide range of surgical procedures collected from various sources, and ii) the data imbalance problem caused by the unequal occurrence of medical instruments or surgical procedures. This paper addresses these challenges with a multimodal large language model (LLM) and an adaptive weight assignment strategy. First, we developed a novel LLM-assisted multi-teacher CL framework (named LMT++), which could harness the strength of a multimodal LLM as a supplementary teacher. The LLM’s strong generalization ability, as well as its good understanding of the surgical domain, help to address the knowledge gap arising from domain shifts and data imbalances. To incorporate the LLM in our CL framework, we further proposed an innovative approach to process the training data, which involves the conversion of complex LLM embeddings into logits value used within our CL training framework. Moreover, we design an adaptive weight assignment approach that balances the generalization ability of the LLM and the domain expertise of conventional VQA models obtained in previous model training processes within the CL framework. Finally, we created a new surgical VQA dataset for model evaluation. 
Comprehensive experimental findings on these datasets show that our approach surpasses state-of-the-art CL methods.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4678-4689"},"PeriodicalIF":0.0,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144335331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}