Magnetic resonance imaging (MRI) is a powerful tool in medical diagnostics, yet high-field MRI, despite offering superior image quality, incurs significant costs for procurement, installation, maintenance, and operation, restricting its availability and accessibility, especially in low- and middle-income countries. Addressing this, our study proposes an unsupervised learning algorithm based on cycle-consistent generative adversarial networks. This framework transforms 0.3T low-field MRI into higher-quality 3T-like images, bypassing the need for paired low/high-field training data. The proposed architecture integrates two novel modules to enhance reconstruction quality: (1) an attention block that dynamically balances high-field-like features with the original low-field input, and (2) an edge block that refines boundary details, providing more accurate structural reconstruction. The proposed generative model is trained on large-scale, unpaired, public datasets, and further validated on paired low/high-field acquisitions of three major clinical MRI sequences: T1-weighted, T2-weighted, and fluid-attenuated inversion recovery (FLAIR) imaging. It demonstrates notable improvements in tissue contrast and signal-to-noise ratio while preserving anatomical fidelity. This approach utilizes rich information from publicly available MRI resources, providing a data-efficient unsupervised alternative that complements supervised methods to enhance the utility of low-field MRI.
{"title":"An Unsupervised Learning Approach for Reconstructing 3T-Like Images From 0.3T MRI Without Paired Training Data","authors":"Huaishui Yang;Shaojun Liu;Yilong Liu;Lingyan Zhang;Shoujin Huang;Jiayu Zheng;Jingzhe Liu;Hua Guo;Ed X. Wu;Mengye Lyu","doi":"10.1109/TMI.2025.3597401","DOIUrl":"10.1109/TMI.2025.3597401","url":null,"abstract":"Magnetic resonance imaging (MRI) is powerful in medical diagnostics, yet high-field MRI, despite offering superior image quality, incurs significant costs for procurement, installation, maintenance, and operation, restricting its availability and accessibility, especially in low- and middle-income countries. Addressing this, our study proposes an unsupervised learning algorithm based on cycle-consistent generative adversarial networks. This framework transforms 0.3T low-field MRI into higher-quality 3T-like images, bypassing the need for paired low/high-field training data. The proposed architecture integrates two novel modules to enhance reconstruction quality: (1) an attention block that dynamically balances high-field-like features with the original low-field input, and (2) an edge block that refines boundary details, providing more accurate structural reconstruction. The proposed generative model is trained on large-scale, unpaired, public datasets, and further validated on paired low/high-field acquisitions of three major clinical MRI sequences: T1-weighted, T2-weighted, and fluid-attenuated inversion recovery (FLAIR) imaging. It demonstrates notable improvements in tissue contrast and signal-to-noise ratio while preserving anatomical fidelity. This approach utilizes rich information from publicly available MRI resources, providing a data-efficient unsupervised alternative that complements supervised methods to enhance the utility of low-field MRI.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5358-5371"},"PeriodicalIF":0.0,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144819720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unsupervised anomaly detection (UAD) methods typically detect anomalies by learning and reconstructing the normative distribution. However, since anomalies constantly invade and affect their surroundings, sub-healthy areas at the junction present structural deformations that can easily be misidentified as anomalies, posing difficulties for UAD methods that solely learn the normative distribution. Multimodal images can help address these challenges, as they provide complementary information about anomalies. Therefore, this paper proposes a novel method for UAD in preoperative multimodal images, called the Erasure Perception Diffusion model (EPDiff). First, the Local Erasure Progressive Training (LEPT) framework is designed to better rebuild sub-healthy structures around anomalies through the diffusion model with a two-phase process. Initially, healthy images are used to capture deviation features labeled as potential anomalies. Then, these anomalies are locally erased in multimodal images to progressively learn sub-healthy structures, obtaining a more detailed reconstruction around anomalies. Second, the Global Structural Perception (GSP) module is developed in the diffusion model to realize global structural representation and correlation within images and between modalities through interactions of high-level semantic information. In addition, a training-free module, named the Multimodal Attention Fusion (MAF) module, is presented for weighted fusion of anomaly maps between different modalities and obtaining binary anomaly outputs. Experimental results show that EPDiff improves the AUPRC and mDice scores by 2% and 3.9% on BraTS2021, and by 5.2% and 4.5% on Shifts over the state-of-the-art methods, demonstrating the applicability of EPDiff to diverse anomaly diagnosis. The code is available at https://github.com/wjiazheng/EPDiff
{"title":"EPDiff: Erasure Perception Diffusion Model for Unsupervised Anomaly Detection in Preoperative Multimodal Images","authors":"Jiazheng Wang;Min Liu;Wenting Shen;Renjie Ding;Yaonan Wang;Erik Meijering","doi":"10.1109/TMI.2025.3597545","DOIUrl":"10.1109/TMI.2025.3597545","url":null,"abstract":"Unsupervised anomaly detection (UAD) methods typically detect anomalies by learning and reconstructing the normative distribution. However, since anomalies constantly invade and affect their surroundings, sub-healthy areas in the junction present structural deformations that could be easily misidentified as anomalies, posing difficulties for UAD methods that solely learn the normative distribution. The use of multimodal images can facilitate to address the above challenges, as they can provide complementary information of anomalies. Therefore, this paper propose a novel method for UAD in preoperative multimodal images, called Erasure Perception Diffusion model (EPDiff). First, the Local Erasure Progressive Training (LEPT) framework is designed to better rebuild sub-healthy structures around anomalies through the diffusion model with a two-phase process. Initially, healthy images are used to capture deviation features labeled as potential anomalies. Then, these anomalies are locally erased in multimodal images to progressively learn sub-healthy structures, obtaining a more detailed reconstruction around anomalies. Second, the Global Structural Perception (GSP) module is developed in the diffusion model to realize global structural representation and correlation within images and between modalities through interactions of high-level semantic information. In addition, a training-free module, named Multimodal Attention Fusion (MAF) module, is presented for weighted fusion of anomaly maps between different modalities and obtaining binary anomaly outputs. Experimental results show that EPDiff improves the AUPRC and mDice scores by 2% and 3.9% on BraTS2021, and by 5.2% and 4.5% on Shifts over the state-of-the-art methods, which proves the applicability of EPDiff in diverse anomaly diagnosis. The code is available at <uri>https://github.com/wjiazheng/EPDiff</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 1","pages":"379-390"},"PeriodicalIF":0.0,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144819772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-08, DOI: 10.1109/TMI.2025.3597026
Xiaoyu Zhu;Shiyin Li;HongLiang Bi;Lina Guan;Haiyang Liu;Zhaolin Lu
Choroidal thickness variations serve as critical biomarkers for numerous ophthalmic diseases. Accurate segmentation and quantification of the choroid in optical coherence tomography (OCT) images are essential for clinical diagnosis and disease progression monitoring. Because public OCT datasets cover only a small number of disease types involving changes in choroidal thickness and lack publicly available labels, we constructed the Xuzhou Municipal Hospital (XZMH)-Choroid dataset. This dataset contains annotated OCT images of normal cases and eight choroid-related diseases. However, segmentation of the choroid in OCT images remains a formidable challenge due to the confounding factors of blurred boundaries, non-uniform texture, and lesions. To overcome these challenges, we proposed a mixed attention-guided multiscale feature fusion network (MAMFF-Net). This network integrates a Mixed Attention Encoder (MAE) for enhanced fine-grained feature extraction, a deformable multiscale feature fusion path (DMFFP) for adaptive feature integration across lesion deformations, and a multiscale pyramid layer aggregation (MPLA) module for improved contextual representation learning. Through comparative experiments, we found that MAMFF-Net achieves better segmentation performance than other deep learning methods (mDice: 97.44, mIoU: 95.11, mAcc: 97.71). Based on the choroidal segmentation implemented in MAMFF-Net, an algorithm for automated choroidal thickness measurement was developed, and the automated measurement results approached the level of senior specialists.
{"title":"Automatic Choroid Segmentation and Thickness Measurement Based on Mixed Attention-Guided Multiscale Feature Fusion Network","authors":"Xiaoyu Zhu;Shiyin Li;HongLiang Bi;Lina Guan;Haiyang Liu;Zhaolin Lu","doi":"10.1109/TMI.2025.3597026","DOIUrl":"10.1109/TMI.2025.3597026","url":null,"abstract":"Choroidal thickness variations serve as critical biomarkers for numerous ophthalmic diseases. Accurate segmentation and quantification of the choroid in optical coherence tomography (OCT) images is essential for clinical diagnosis and disease progression monitoring. Due to the small number of disease types in the public OCT dataset involving changes in choroidal thickness and the lack of a publicly available labeled dataset, we constructed the Xuzhou Municipal Hospital (XZMH)-Choroid dataset. This dataset contains annotated OCT images of normal and eight choroid-related diseases. However, segmentation of the choroid in OCT images remains a formidable challenge due to the confounding factors of blurred boundaries, non-uniform texture, and lesions. To overcome these challenges, we proposed a mixed attention-guided multiscale feature fusion network (MAMFF-Net). This network integrates a Mixed Attention Encoder (MAE) for enhanced fine-grained feature extraction, a deformable multiscale feature fusion path (DMFFP) for adaptive feature integration across lesion deformations, and a multiscale pyramid layer aggregation (MPLA) module for improved contextual representation learning. Through comparative experiments with other deep learning methods, we found that the MAMFF-Net model has better segmentation performance than other deep learning methods (mDice: 97.44, mIoU: 95.11, mAcc: 97.71). Based on the choroidal segmentation implemented in MAMFF-Net, an algorithm for automated choroidal thickness measurement was developed, and the automated measurement results approached the level of senior specialists.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 1","pages":"350-363"},"PeriodicalIF":0.0,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144802501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unsupervised brain lesion segmentation, which focuses on learning normative distributions from images of healthy subjects, is less dependent on lesion-labeled data and thus exhibits better generalization capabilities. A fundamental challenge in learning normative distributions of images lies in the high dimensionality that arises when image pixels are treated as correlated random variables to capture spatial dependence. In this study, we proposed a subspace-based deep generative model to learn the posterior normal distributions. Specifically, we used probabilistic subspace models to capture spatial-intensity distributions and spatial-structure distributions of brain images from healthy subjects. These models captured prior spatial-intensity and spatial-structure variations effectively by treating the subspace coefficients as random variables, with the basis functions being the eigen-images and eigen-density functions learned from the training data. These prior distributions were then converted to posterior distributions, including both the posterior normal and posterior lesion distributions for a given image, using the subspace-based generative model and subspace-assisted Bayesian analysis, respectively. Finally, an unsupervised fusion classifier was used to combine the posterior and likelihood features for lesion segmentation. The proposed method has been evaluated on simulated and real lesion data, including tumor, multiple sclerosis, and stroke, demonstrating superior segmentation accuracy and robustness over the state-of-the-art methods. Our proposed method holds promise for enhancing unsupervised brain lesion delineation in clinical applications.
{"title":"Unsupervised Brain Lesion Segmentation Using Posterior Distributions Learned by Subspace-Based Generative Model","authors":"Huixiang Zhuang;Yue Guan;Yi Ding;Chang Xu;Zijun Cheng;Yuhao Ma;Ruihao Liu;Ziyu Meng;Li Cao;Yao Li;Zhi-Pei Liang","doi":"10.1109/TMI.2025.3597080","DOIUrl":"10.1109/TMI.2025.3597080","url":null,"abstract":"Unsupervised brain lesion segmentation, focusing on learning normative distributions from images of healthy subjects, are less dependent on lesion-labeled data, thus exhibiting better generalization capabilities. A fundamental challenge in learning normative distributions of images lies in the high dimensionality if image pixels are treated as correlated random variables to capture spatial dependence. In this study, we proposed a subspace-based deep generative model to learn the posterior normal distributions. Specifically, we used probabilistic subspace models to capture spatial-intensity distributions and spatial-structure distributions of brain images from healthy subjects. These models captured prior spatial-intensity and spatial-structure variations effectively by treating the subspace coefficients as random variables with basis functions being the eigen-images and eigen-density functions learned from the training data. These prior distributions were then converted to posterior distributions, including both the posterior normal and posterior lesion distributions for a given image using the subspace-based generative model and subspace-assisted Bayesian analysis, respectively. Finally, an unsupervised fusion classifier was used to combine the posterior and likelihood features for lesion segmentation. The proposed method has been evaluated on simulated and real lesion data, including tumor, multiple sclerosis, and stroke, demonstrating superior segmentation accuracy and robustness over the state-of-the-art methods. Our proposed method holds promise for enhancing unsupervised brain lesion delineation in clinical applications.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 1","pages":"364-378"},"PeriodicalIF":0.0,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144802499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Computed tomography (CT) is one of the most widely used non-invasive imaging modalities for medical diagnosis. In clinical practice, CT images are usually acquired with large slice thicknesses due to the high cost of memory storage and operation time, resulting in an anisotropic CT volume with much lower inter-slice resolution than in-plane resolution. Since such inconsistent resolution may lead to difficulties in disease diagnosis, deep learning-based volumetric super-resolution methods have been developed to improve inter-slice resolution. Most existing methods conduct single-image super-resolution on the through-plane or synthesize intermediate slices from adjacent slices; however, the anisotropic characteristic of 3D CT volumes has not been well explored. In this paper, we propose a novel cross-view texture transfer approach for CT slice interpolation that fully utilizes the anisotropic nature of 3D CT volumes. Specifically, we design a unique framework that takes high-resolution in-plane texture details as a reference and transfers them to low-resolution through-plane images. To this end, we introduce a multi-reference non-local attention module that extracts meaningful features for reconstructing through-plane high-frequency details from multiple in-plane images. Through extensive experiments, we demonstrate that our method performs significantly better in CT slice interpolation than existing competing methods on public CT datasets, including a real-paired benchmark, verifying the effectiveness of the proposed framework. The source code of this work is available at https://github.com/khuhm/ACVTT
{"title":"An Anisotropic Cross-View Texture Transfer With Multi-Reference Non-Local Attention for CT Slice Interpolation","authors":"Kwang-Hyun Uhm;Hyunjun Cho;Sung-Hoo Hong;Seung-Won Jung","doi":"10.1109/TMI.2025.3596957","DOIUrl":"10.1109/TMI.2025.3596957","url":null,"abstract":"Computed tomography (CT) is one of the most widely used non-invasive imaging modalities for medical diagnosis. In clinical practice, CT images are usually acquired with large slice thicknesses due to the high cost of memory storage and operation time, resulting in an anisotropic CT volume with much lower inter-slice resolution than in-plane resolution. Since such inconsistent resolution may lead to difficulties in disease diagnosis, deep learning-based volumetric super-resolution methods have been developed to improve inter-slice resolution. Most existing methods conduct single-image super-resolution on the through-plane or synthesize intermediate slices from adjacent slices; however, the anisotropic characteristic of 3D CT volume has not been well explored. In this paper, we propose a novel cross-view texture transfer approach for CT slice interpolation by fully utilizing the anisotropic nature of 3D CT volume. Specifically, we design a unique framework that takes high-resolution in-plane texture details as a reference and transfers them to low-resolution through-plane images. To this end, we introduce a multi-reference non-local attention module that extracts meaningful features for reconstructing through-plane high-frequency details from multiple in-plane images. Through extensive experiments, we demonstrate that our method performs significantly better in CT slice interpolation than existing competing methods on public CT datasets including a real-paired benchmark, verifying the effectiveness of the proposed framework. The source code of this work is available at <uri>https://github.com/khuhm/ACVTT</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 1","pages":"336-349"},"PeriodicalIF":0.0,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144802503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-07, DOI: 10.1109/TMI.2025.3596874
Yeganeh Madadi;Hina Raja;Koenraad A. Vermeer;Hans G. Lemij;Xiaoqin Huang;Eunjin Kim;Seunghoon Lee;Gitaek Kwon;Hyunwoo Kim;Jaeyoung Kim;Adrian Galdran;Miguel A. González Ballester;Dan Presil;Kristhian Aguilar;Victor Cavalcante;Celso Carvalho;Waldir Sabino;Mateus Oliveira;Hui Lin;Charilaos Apostolidis;Aggelos K. Katsaggelos;Tomasz Kubrak;Á. Casado-García;J. Heras;M. Ortega;L. Ramos;Philippe Zhang;Yihao Li;Jing Zhang;Weili Jiang;Pierre-Henri Conze;Mathieu Lamard;Gwenole Quellec;Mostafa El Habib Daho;Madukuri Shaurya;Anumeha Varma;Monika Agrawal;Siamak Yousefi
Glaucoma is a major contributor to permanent vision loss. Early diagnosis is crucial for preventing vision loss due to glaucoma, making glaucoma screening essential. A more affordable method of glaucoma screening can be achieved by applying artificial intelligence to evaluate color fundus photographs (CFPs). We present the Justified Referral in AI Glaucoma Screening (JustRAIGS) challenge to further develop these AI algorithms for glaucoma screening and to assess their efficacy. To support this challenge, we have generated a distinctive, large-scale dataset containing more than 110,000 meticulously labeled CFPs obtained from approximately 60,000 patients and 500 distinct screening centers in the USA. Our objective is to assess the practicality of creating advanced and dependable AI systems that can take a CFP as input and produce the probability of referable glaucoma, as well as outputs for glaucoma justification, by integrating both binary and multi-label classification tasks. This paper presents the evaluation of solutions provided by nine teams, recognizing the team with the highest level of performance. The highest sensitivity achieved at a specificity level of 95% was 85%, and the best average Hamming loss achieved was 0.13. Additionally, we test the top three participants’ algorithms on an external dataset to validate the performance and generalization of these models. The outcomes of this research can offer valuable insights into the development of intelligent systems for detecting glaucoma. Ultimately, the findings can aid in the early detection and treatment of glaucoma patients, hence decreasing preventable vision impairment and blindness caused by glaucoma.
{"title":"JustRAIGS: Justified Referral in AI Glaucoma Screening Challenge","authors":"Yeganeh Madadi;Hina Raja;Koenraad A. Vermeer;Hans G. Lemij;Xiaoqin Huang;Eunjin Kim;Seunghoon Lee;Gitaek Kwon;Hyunwoo Kim;Jaeyoung Kim;Adrian Galdran;Miguel A. González Ballester;Dan Presil;Kristhian Aguilar;Victor Cavalcante;Celso Carvalho;Waldir Sabino;Mateus Oliveira;Hui Lin;Charilaos Apostolidis;Aggelos K. Katsaggelos;Tomasz Kubrak;Á. Casado-García;J. Heras;M. Ortega;L. Ramos;Philippe Zhang;Yihao Li;Jing Zhang;Weili Jiang;Pierre-Henri Conze;Mathieu Lamard;Gwenole Quellec;Mostafa El Habib Daho;Madukuri Shaurya;Anumeha Varma;Monika Agrawal;Siamak Yousefi","doi":"10.1109/TMI.2025.3596874","DOIUrl":"10.1109/TMI.2025.3596874","url":null,"abstract":"A major contributor to permanent vision loss is glaucoma. Early diagnosis is crucial for preventing vision loss due to glaucoma, making glaucoma screening essential. A more affordable method of glaucoma screening can be achieved by applying artificial intelligence to evaluate color fundus photographs (CFPs). We present the Justified Referral in AI Glaucoma Screening (JustRAIGS) challenge to further develop these AI algorithms for glaucoma screening and to assess their efficacy. To support this challenge, we have generated a distinctive big dataset containing more than 110,000 meticulously labeled CFPs obtained from approximately 60,000 patients and 500 distinct screening centers in the USA. Our objective is to assess the practicality of creating advanced and dependable AI systems that can take a CFP as input and produce the probability of referable glaucoma, as well as outputs for glaucoma justification by integrating both binary and multi-label classification tasks. This paper presents the evaluation of solutions provided by nine teams, recognizing the team with the highest level of performance. The highest achieved score of sensitivity at a specificity level of 95% was 85%, and the highest achieved score of Hamming losses average was 0.13. Additionally, we test the top three participants’ algorithms on an external dataset to validate the performance and generalization of these models. The outcomes of this research can offer valuable insights into the development of intelligent systems for detecting glaucoma. Ultimately, findings can aid in the early detection and treatment of glaucoma patients, hence decreasing preventable vision impairment and blindness caused by glaucoma.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 1","pages":"320-335"},"PeriodicalIF":0.0,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11119643","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144796864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classification of pathological images is the basis for automatic cancer diagnosis. Although deep learning methods have achieved remarkable performance, they rely heavily on labeled data, demanding extensive human annotation efforts. In this study, we present a novel human-annotation-free method by leveraging pre-trained Vision-Language Models (VLMs). Without human annotation, pseudo-labels of the training set are obtained by utilizing the zero-shot inference capabilities of the VLM, which may contain substantial noise due to the domain gap between the pre-training and target datasets. To address this issue, we introduce VLM-CPL, a novel approach that combines two noisy label filtering techniques with a semi-supervised learning strategy. Specifically, we first obtain prompt-based pseudo-labels with uncertainty estimation by zero-shot inference with the VLM using multiple augmented views of an input. Then, by leveraging the feature representation ability of the VLM, we obtain feature-based pseudo-labels via sample clustering in the feature space. Prompt-feature consensus is introduced to select reliable samples based on the consensus between the two types of pseudo-labels. We further propose High-confidence Cross Supervision to learn from samples with reliable pseudo-labels and the remaining unlabeled samples. Additionally, we present an innovative open-set prompting strategy that filters irrelevant patches from whole slides to enhance the quality of selected patches. Experimental results on five public pathological image datasets for patch-level and slide-level classification showed that our method substantially outperformed zero-shot classification by VLMs and was superior to existing noisy label learning methods. The code is publicly available at https://github.com/HiLab-git/VLM-CPL
{"title":"VLM-CPL: Consensus Pseudo-Labels From Vision-Language Models for Annotation-Free Pathological Image Classification","authors":"Lanfeng Zhong;Zongyao Huang;Yang Liu;Wenjun Liao;Shichuan Zhang;Guotai Wang;Shaoting Zhang","doi":"10.1109/TMI.2025.3595111","DOIUrl":"10.1109/TMI.2025.3595111","url":null,"abstract":"Classification of pathological images is the basis for automatic cancer diagnosis. Despite that deep learning methods have achieved remarkable performance, they heavily rely on labeled data, demanding extensive human annotation efforts. In this study, we present a novel human annotation-free method by leveraging pre-trained Vision-Language Models (VLMs). Without human annotation, pseudo-labels of the training set are obtained by utilizing the zero-shot inference capabilities of VLM, which may contain a lot of noise due to the domain gap between the pre-training and target datasets. To address this issue, we introduce VLM-CPL, a novel approach that contains two noisy label filtering techniques with a semi-supervised learning strategy. Specifically, we first obtain prompt-based pseudo-labels with uncertainty estimation by zero-shot inference with the VLM using multiple augmented views of an input. Then, by leveraging the feature representation ability of VLM, we obtain feature-based pseudo-labels via sample clustering in the feature space. Prompt-feature consensus is introduced to select reliable samples based on the consensus between the two types of pseudo-labels. We further propose High-confidence Cross Supervision by to learn from samples with reliable pseudo-labels and the remaining unlabeled samples. Additionally, we present an innovative open-set prompting strategy that filters irrelevant patches from whole slides to enhance the quality of selected patches. Experimental results on five public pathological image datasets for patch-level and slide-level classification showed that our method substantially outperformed zero-shot classification by VLMs, and was superior to existing noisy label learning methods. The code is publicly available at <uri>https://github.com/HiLab-git/VLM-CPL</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 10","pages":"4023-4036"},"PeriodicalIF":0.0,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144778204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generating high-fidelity dental radiographs is essential for training diagnostic models. Despite the development of numerous generative methods for other medical data, generative approaches in dental radiology remain unexplored. Due to the intricate tooth structures and specialized terminology, these methods often yield ambiguous tooth regions and incorrect dental concepts when applied to dentistry. In this paper, we make the first attempt to investigate diffusion-based teeth X-ray image generation and propose ToothMaker, a novel framework specifically designed for the dental domain. First, to synthesize X-ray images that possess accurate tooth structures and realistic radiological styles simultaneously, we design a control-disentangled fine-tuning (CDFT) strategy. Specifically, we present two separate controllers to handle style and layout control, respectively, and introduce a gradient-based decoupling method that optimizes each using its corresponding disentangled gradients. Second, to enhance the model's understanding of dental terminology, we propose a prior-disentangled guidance module (PDGM), enabling precise synthesis of dental concepts. It utilizes a large language model to decompose dental terminology into a series of meta-knowledge elements and performs interactions and refinements through a hypergraph neural network. These elements are then fed into the network to guide the generation of dental concepts. Extensive experiments demonstrate the high fidelity and diversity of the images synthesized by our approach. By incorporating the generated data, we achieve substantial performance improvements on downstream segmentation and visual question answering tasks, indicating that our method can greatly reduce the reliance on manually annotated data. Code will be publicly available at https://github.com/CUHK-AIM-Group/ToothMaker
{"title":"ToothMaker: Realistic Panoramic Dental Radiograph Generation via Disentangled Control","authors":"Weihao Yu;Xiaoqing Guo;Wuyang Li;Xinyu Liu;Hui Chen;Yixuan Yuan","doi":"10.1109/TMI.2025.3588466","DOIUrl":"10.1109/TMI.2025.3588466","url":null,"abstract":"Generating high-fidelity dental radiographs is essential for training diagnostic models. Despite the development of numerous methods for other medical data, generative approaches in dental radiology remain unexplored. Due to the intricate tooth structures and specialized terminology, these methods often yield ambiguous tooth regions and incorrect dental concepts when applied to dentistry. In this paper, we take the first attempt to investigate diffusion-based teeth X-ray image generation and propose ToothMaker, a novel framework specifically designed for the dental domain. Firstly, to synthesize X-ray images that possess accurate tooth structures and realistic radiological styles simultaneously, we design control-disentangled fine-tuning (CDFT) strategy. Specifically, we present two separate controllers to handle style and layout control respectively, and introduce a gradient-based decoupling method that optimizes each using their corresponding disentangled gradients. Secondly, to enhance model’s understanding of dental terminology, we propose prior-disentangled guidance module (PDGM), enabling precise synthesis of dental concepts. It utilizes large language model to decompose dental terminology into a series of meta-knowledge elements and performs interactions and refinements through hypergraph neural network. These elements are then fed into the network to guide the generation of dental concepts. Extensive experiments demonstrate the high fidelity and diversity of the images synthesized by our approach. By incorporating the generated data, we achieve substantial performance improvements on downstream segmentation and visual question answering tasks, indicating that our method can greatly reduce the reliance on manually annotated data. Code will be public available at <uri>https://github.com/CUHK-AIM-Group/ToothMaker</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5233-5244"},"PeriodicalIF":0.0,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144720145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-07-28, DOI: 10.1109/TMI.2025.3589495
Liu Li;Qiang Ma;Cheng Ouyang;Johannes C. Paetzold;Daniel Rueckert;Bernhard Kainz
Deep learning-based medical image segmentation techniques have shown promising results when evaluated based on conventional metrics such as the Dice score or Intersection-over-Union. However, these fully automatic methods often fail to meet clinically acceptable accuracy, especially when topological constraints should be observed, e.g., continuous boundaries or closed surfaces. In medical image segmentation, the correctness of a segmentation in terms of the required topological genus is sometimes even more important than the pixel-wise accuracy. Existing topology-aware approaches commonly estimate and constrain the topological structure via the concept of persistent homology (PH). However, these methods are difficult to implement for high-dimensional data due to their polynomial computational complexity. To overcome this problem, we propose a novel and fast approach for topology-aware segmentation based on the Euler Characteristic ($\chi$). First, we propose a fast formulation for $\chi$ computation in both 2D and 3D. The scalar $\chi$ error between the prediction and ground-truth serves as the topological evaluation metric. Then we estimate the spatial topology correctness of any segmentation network via a so-called topological violation map, i.e., a detailed map that highlights regions with $\chi$ errors. Finally, the segmentation results from the arbitrary network are refined based on the topological violation maps by a topology-aware correction network. Our experiments are conducted on both 2D and 3D datasets and show that our method can significantly improve topological correctness while preserving pixel-wise segmentation accuracy.
{"title":"Topology Optimization in Medical Image Segmentation With Fast χ Euler Characteristic","authors":"Liu Li;Qiang Ma;Cheng Ouyang;Johannes C. Paetzold;Daniel Rueckert;Bernhard Kainz","doi":"10.1109/TMI.2025.3589495","DOIUrl":"10.1109/TMI.2025.3589495","url":null,"abstract":"Deep learning-based medical image segmentation techniques have shown promising results when evaluated based on conventional metrics such as the Dice score or Intersection-over-Union. However, these fully automatic methods often fail to meet clinically acceptable accuracy, especially when topological constraints should be observed, e.g., continuous boundaries or closed surfaces. In medical image segmentation, the correctness of a segmentation in terms of the required topological genus sometimes is even more important than the pixel-wise accuracy. Existing topology-aware approaches commonly estimate and constrain the topological structure via the concept of persistent homology (PH). However, these methods are difficult to implement for high dimensional data due to their polynomial computational complexity. To overcome this problem, we propose a novel and fast approach for topology-aware segmentation based on the Euler Characteristic (<inline-formula> <tex-math>$chi $ </tex-math></inline-formula>). First, we propose a fast formulation for <inline-formula> <tex-math>$chi $ </tex-math></inline-formula> computation in both 2D and 3D. The scalar <inline-formula> <tex-math>$chi $ </tex-math></inline-formula> error between the prediction and ground-truth serves as the topological evaluation metric. Then we estimate the spatial topology correctness of any segmentation network via a so-called topological violation map, i.e., a detailed map that highlights regions with <inline-formula> <tex-math>$chi $ </tex-math></inline-formula> errors. Finally, the segmentation results from the arbitrary network are refined based on the topological violation maps by a topology-aware correction network. Our experiments are conducted on both 2D and 3D datasets and show that our method can significantly improve topological correctness while preserving pixel-wise segmentation accuracy.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5221-5232"},"PeriodicalIF":0.0,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144720144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-07-22, DOI: 10.1109/TMI.2025.3591364
Shanshan Song;Hui Tang;Honglong Yang;Xiaomeng Li
Radiology Report Generation (RRG) automates the creation of radiology reports from medical imaging, enhancing the efficiency of the reporting process. Longitudinal Radiology Report Generation (LRRG) extends RRG by incorporating the ability to compare current and prior exams, facilitating the tracking of temporal changes in clinical findings. Existing LRRG approaches only extract features from prior and current images using a pre-trained visual encoder and then concatenate them to generate the final report. However, these methods struggle to effectively capture both spatial and temporal correlations during feature extraction. Consequently, the extracted features inadequately capture difference information across exams and thus underrepresent the expected progressions, leading to sub-optimal performance in LRRG. To address this, we develop a novel dynamic difference-aware temporal residual network (DDaTR). In DDaTR, we introduce two modules at each stage of the visual encoder to capture multi-level spatial correlations. The Dynamic Feature Alignment Module (DFAM) is designed to align prior features across modalities to preserve the integrity of prior clinical information. Prompted by the enriched prior features, the Dynamic Difference-Aware Module (DDAM) captures favorable difference information by identifying relationships across exams. Furthermore, DDaTR employs a dynamic residual network to unidirectionally transmit longitudinal information, effectively modeling temporal correlations. Extensive experiments demonstrated superior performance over existing methods on three benchmarks, proving its efficacy in both RRG and LRRG tasks. Our code is published at https://github.com/xmed-lab/DDaTR
{"title":"DDaTR: Dynamic Difference-Aware Temporal Residual Network for Longitudinal Radiology Report Generation","authors":"Shanshan Song;Hui Tang;Honglong Yang;Xiaomeng Li","doi":"10.1109/TMI.2025.3591364","DOIUrl":"10.1109/TMI.2025.3591364","url":null,"abstract":"Radiology Report Generation (RRG) automates the creation of radiology reports from medical imaging, enhancing the efficiency of the reporting process. Longitudinal Radiology Report Generation (LRRG) extends RRG by incorporating the ability to compare current and prior exams, facilitating the tracking of temporal changes in clinical findings. Existing LRRG approaches only extract features from prior and current images using a visual pre-trained encoder, which are then concatenated to generate the final report. However, these methods struggle to effectively capture both spatial and temporal correlations during the feature extraction process. Consequently, the extracted features inadequately capture the information of difference across exams and thus underrepresent the expected progressions, leading to sub-optimal performance in LRRG. To address this, we develop a novel dynamic difference-aware temporal residual network (DDaTR). In DDaTR, we introduce two modules at each stage of the visual encoder to capture multi-level spatial correlations. The Dynamic Feature Alignment Module (DFAM) is designed to align prior features across modalities for the integrity of prior clinical information. Prompted by the enriched prior features, the dynamic difference-aware module (DDAM) captures favorable difference information by identifying relationships across exams. Furthermore, our DDaTR employs the dynamic residual network to unidirectionally transmit longitudinal information, effectively modeling temporal correlations. Extensive experiments demonstrated superior performance over existing methods on three benchmarks, proving its efficacy in both RRG and LRRG tasks. Our code is published at <uri>https://github.com/xmed-lab/DDaTR</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5345-5357"},"PeriodicalIF":0.0,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144685063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}