Pub Date : 2026-04-08
DOI: 10.1109/tmi.2026.3682009
StableMIL: Entropy-Stabilized Attention-based Multiple Instance Learning for Morphologically Variable Whole Slide Images
Yinuo Lu, Mingxin Qi, Yao Fu, Zhuoran Xiao, Wei Shao, Jie Tian, Wei Mu
Aggregating the features of tens of thousands of patches into Whole Slide Image (WSI) representations is a crucial step in computational pathology. However, existing aggregation strategies overlook the morphological variability of tissue regions in WSIs stemming from differences in clinical procedures and tumor characteristics, leading to two critical limitations: 1) attention collapse in long sequences caused by significant variation in patch numbers across WSIs (ranging from thousands to tens of thousands per WSI); 2) attention misallocation due to under-trained positional embeddings resulting from the non-uniform spatial coordinates introduced by irregular patch distributions. Consequently, current attention-based methods struggle to generalize across this morphological variability, resulting in inconsistent aggregation performance and compromised model reliability in clinical settings. To address these issues, we propose an Entropy-Stabilized Attention-based Multiple Instance Learning (StableMIL) framework, which incorporates an entropy-stabilized attention mechanism to ensure consistent aggregation across WSIs with varying patch numbers and a Randomly Projected 2D rotary position embedding to enhance the robustness of spatial representations across irregular patch distributions. Extensive theoretical and experimental analyses on nine WSI datasets spanning diverse cancer types, across both classification and survival prediction tasks, demonstrate that StableMIL effectively overcomes the challenges of handling long instance sequences and out-of-distribution spatial coordinates. Our framework consistently outperforms representative baselines, particularly in survival prediction, with stable improvements observed across all evaluated cancer types and morphological scenarios, highlighting its potential for real-world clinical applications. Our source code is available at https://github.com/theeeqi/stableMIL.
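The abstract does not detail how the attention entropy is stabilized, so the following is only a minimal sketch of one common recipe: gated attention-MIL pooling whose logits are scaled by a log-length temperature so the softmax entropy stays comparable as the bag grows from thousands to tens of thousands of patches. The module names, reference bag size, and scaling rule are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the StableMIL code) of length-aware, entropy-stabilized
# attention pooling for MIL: attention logits are rescaled by a log-length factor
# so the softmax entropy stays comparable as the number of patches n varies.
import math
import torch
import torch.nn as nn


class EntropyStabilizedAttentionPool(nn.Module):
    def __init__(self, dim: int, hidden: int = 256, ref_len: int = 4096):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.ref_len = ref_len  # reference bag size used to normalize the temperature (assumption)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (n_patches, dim) features of one WSI bag
        n = feats.shape[0]
        logits = self.score(feats).squeeze(-1)           # (n,) per-patch attention logits
        # Log-length temperature: keeps the softmax entropy roughly comparable
        # across bags of very different lengths.
        tau = math.log(max(n, 2)) / math.log(self.ref_len)
        attn = torch.softmax(logits * tau, dim=0)        # (n,) normalized attention weights
        return attn @ feats                              # (dim,) slide-level embedding


if __name__ == "__main__":
    pool = EntropyStabilizedAttentionPool(dim=512)
    for n in (1_000, 30_000):                            # short vs. very long bag
        print(n, pool(torch.randn(n, 512)).shape)
```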
{"title":"StableMIL: Entropy-Stabilized Attention-based Multiple Instance Learning for Morphologically Variable Whole Slide Images.","authors":"Yinuo Lu,Mingxin Qi,Yao Fu,Zhuoran Xiao,Wei Shao,Jie Tian,Wei Mu","doi":"10.1109/tmi.2026.3682009","DOIUrl":"https://doi.org/10.1109/tmi.2026.3682009","url":null,"abstract":"Aggregating features of tens of thousands of patches into Whole Slide Images (WSIs) representations via aggregators is a crucial step in computational pathology. However, existing aggregation strategies overlook the morphological variability of tissue regions in WSIs stemming from differences in clinical procedures and tumor characteristics, leading to two critical limitations: 1) attention collapse in long sequences caused by significant variation in patch numbers across WSIs (ranging from thousands to tens of thousands per WSI); 2) attention misallocation due to under-trained positional embeddings resulting from the non-uniform spatial coordinates introduced by irregular patch distributions. Consequently, current attention-based methods struggle to generalize across this morphological variability, resulting in inconsistent aggregation performance and compromised model reliability in clinical settings. To address these issues, we propose a Entropy-Stabilized Attention-based Multiple Instance Learning (StableMIL) framework, which incorporates an entropy-stabilized attention mechanism to ensure consistent aggregation across WSIs with varying patch numbers and a Randomly Projected 2D rotary position embedding to enhance spatial representation robustness across irregular patch distributions. Extensive theoretical and experimental analyses on nine WSI datasets spanning diverse cancer types, across both classification and survival prediction tasks, demonstrate that StableMIL effectively overcomes the challenges of handling long instance sequences and out-of-distribution spatial coordinates. Our framework consistently outperforms representative baselines, particularly in survival prediction, with stable improvements observed across all evaluated cancer types and morphological scenarios, highlighting its potential for real-world clinical applications. Our source code is available at https://github.com/theeeqi/stableMIL.","PeriodicalId":13418,"journal":{"name":"IEEE Transactions on Medical Imaging","volume":"20 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147636175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-04-06
DOI: 10.1109/tmi.2026.3681075
Positional Prompts-Enhanced Brain-Heart-Gut Interactions for Mild Cognitive Impairment Diagnosis
Fan Li, Shilun Zhao, Shuwei Bai, Dengqiang Jia, Fang Xie, Jiangtao Liang, Han Zhang, Ya Zhang, Zhongxiang Ding, Yin Xu, Kaicong Sun, Dinggang Shen
Mild cognitive impairment (MCI) is the prodromal stage of dementia, involving complex interactions between the brain and peripheral organs. Emerging evidence indicates that heart dysfunction and gut microbiota dysbiosis can contribute to MCI pathogenesis, yet these discoveries of cross-organ interactions have not been applied to assist MCI diagnosis. In this work, we propose a novel diagnostic framework that exploits the interactions among the brain, heart, and gut using whole-body PET images to guide MCI diagnosis in scenarios where only brain MRI, PET, or PET&MRI are available. Specifically, we collected a multi-cohort, multi-modal dataset comprising 1,545 whole-body PET images, 6,010 brain MR images, and 2,446 brain PET images from eight data centers. Organ-specific image encoders are first pretrained for the brain, heart, and gut individually. Then, to effectively align and integrate brain, heart, and gut features, we introduce positional prompts that act as anatomical-level attention to highlight disease-relevant spatial regions, and further develop hierarchical Transformers to model brain-heart, brain-gut, and brain-heart-gut interactions. Finally, to achieve MCI diagnosis using only brain images, we transfer the above brain-heart-gut model to a brain-only model via a multi-level knowledge distillation scheme, including sample-level contrastive distillation, group-level distribution alignment, and response-level supervision. Extensive experiments on multi-center data demonstrate the superiority of our method over state-of-the-art methods through effective integration of heart and gut interactions for MCI diagnosis.
Pub Date : 2026-04-06
DOI: 10.1109/tmi.2026.3681138
Cross-dimensional Spatial-temporal Feature Integration Framework for Lung Ultrasound Video Analysis in Pneumonia
Yiwen Liu, Chao He, Dongni Hou, Dean Ta, Mingbo Zhao, Wenyu Xing
Pneumonia is an acute respiratory infection that poses a serious threat to health and life. Lung ultrasound (LUS), as a non-invasive and rapid imaging technique, can monitor real-time changes in the lungs, providing valuable assistance in clinical diagnosis. However, most LUS studies are limited to frame-level analysis and ignore respiratory-cycle changes, leading to diagnostic errors. To address these problems, we propose a cross-dimensional spatial-temporal feature integration model for LUS video analysis. Specifically, a sliding window and feature-difference analysis are first used to preprocess the original LUS videos, eliminating invalid and highly similar frames to produce an abstracted video. Subsequently, a cross-dimensional feature fusion backbone integrates an improved temporal-C3D network and a self-designed recursive inception-meet-transformer (IMT) network to extract and fuse features from different dimensions, yielding comprehensive features that characterize LUS videos. Finally, the Longformer is employed to analyze the temporal dependencies of the cross-dimensional features, supplemented by a classification head for scoring LUS videos. In total, 3,018 LUS video clips were collected from 119 patients in three hospitals to evaluate the proposed LUS video scoring model. With patient-level division, the training and testing sets consist of 2,652 clips from 104 patients and 366 clips from 15 patients, respectively. Experimental results of 5-fold cross-validation demonstrate that the proposed model achieves outstanding scoring performance, with an accuracy, precision, recall, specificity, F1-score, and AUC of 91.78 ± 0.52%, 92.19 ± 0.76%, 91.81 ± 0.62%, 97.17 ± 0.20%, 91.94 ± 0.44%, and 97.83 ± 0.19%, respectively. The independent testing set also shows superior generalization capability, with a scoring accuracy of 87.65 ± 1.12%. Moreover, ablation studies confirm that each designed module contributes significantly to the model's performance, and comparative experiments further confirm the superiority of the proposed model over previous models. These robust findings highlight the proposed LUS video scoring model's strong potential for clinical deployment.
Pub Date : 2026-04-06
DOI: 10.1109/tmi.2026.3681175
PitVQA++: Vector Matrix-Low-Rank Adaptation for Open-Ended Visual Question Answering in Pituitary Surgery
Runlong He, Danyal Z Khan, Evangelos B Mazomenos, Hani J Marcus, Danail Stoyanov, Matthew J Clarkson, Mobarak I Hoque
Vision-Language Models (VLMs) for visual question answering (VQA) offer a unique opportunity to enhance intra-operative decision-making, promote intuitive interactions, and significantly advance surgical education. However, developing VLMs for surgical VQA is challenging due to limited datasets and the risk of overfitting and catastrophic forgetting during full fine-tuning of pretrained weights. While parameter-efficient techniques such as Low-Rank Adaptation (LoRA) and Matrix of Rank Adaptation (MoRA) address adaptation challenges, their uniform parameter distribution overlooks the feature hierarchy in deep networks, where earlier layers, which learn general features, require more parameters than later ones. This work introduces PitVQA++ with an Open-Ended PitVQA dataset and vector matrix-low-rank adaptation (Vector-MoLoRA), an innovative VLM fine-tuning approach for adapting GPT-2 to pituitary surgery. Open-Ended PitVQA comprises 109,173 frames from 25 procedural videos with 795,270 question-answer sentence pairs, covering key surgical elements such as phase and step recognition, context understanding, tool detection, localization, and interaction recognition. Vector-MoLoRA incorporates the principles of LoRA and MoRA into a matrix-low-rank adaptation strategy that employs rank vectors to allocate more parameters to earlier layers, gradually reducing them in later layers. Our approach, validated on the Open-Ended PitVQA and EndoVis18-VQA datasets, effectively mitigates catastrophic forgetting while significantly enhancing performance over recent baselines. Performance-rejection analysis further highlights Vector-MoLoRA's enhanced reliability and trustworthiness in handling uncertain predictions. Our source code and dataset are available at https://github.com/HRL-Mike/PitVQA-Plus.
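A minimal sketch of the rank-vector idea, allocating larger adapter ranks to earlier GPT-2 blocks and smaller ranks to later ones, is given below; the linear schedule, rank bounds, and plain LoRA-style adapter are assumptions for illustration and not the released PitVQA-Plus code, which combines LoRA and MoRA principles.

```python
# Illustrative depth-decreasing rank allocation with a plain LoRA-style adapter
# (assumptions, not Vector-MoLoRA itself): earlier layers receive larger ranks.
import torch
import torch.nn as nn


def rank_vector(n_layers: int, r_max: int = 32, r_min: int = 4):
    """Linearly decreasing per-layer adapter ranks, from r_max at layer 0 to r_min."""
    return [round(r_max - (r_max - r_min) * l / max(n_layers - 1, 1)) for l in range(n_layers)]


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                        # pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # frozen base projection + low-rank update B @ A, scaled by alpha / rank
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale


if __name__ == "__main__":
    ranks = rank_vector(n_layers=12)                       # e.g. GPT-2 small has 12 blocks
    print(ranks)                                           # more capacity in earlier layers
    layer0 = LoRALinear(nn.Linear(768, 768), rank=ranks[0])
    print(layer0(torch.randn(2, 768)).shape)
```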
{"title":"PitVQA++: Vector Matrix-Low-Rank Adaptation for Open-Ended Visual Question Answering in Pituitary Surgery.","authors":"Runlong He,Danyal Z Khan,Evangelos B Mazomenos,Hani J Marcus,Danail Stoyanov,Matthew J Clarkson,Mobarak I Hoque","doi":"10.1109/tmi.2026.3681175","DOIUrl":"https://doi.org/10.1109/tmi.2026.3681175","url":null,"abstract":"Vision-Language Models (VLMs) in visual question answering (VQA) offer a unique opportunity to enhance intra-operative decision-making, promote intuitive interactions, and significantly advance surgical education. However, the development of VLMs for surgical VQA is challenging due to limited datasets and the risk of overfitting and catastrophic forgetting during full fine-tuning of pretrained weights. While parameter-efficient techniques like Low-Rank Adaptation (LoRA) and Matrix of Rank Adaptation (MoRA) address adaptation challenges, their uniform parameter distribution overlooks the feature hierarchy in deep networks, where earlier layers, that learn general features, require more parameters than later ones. This work introduces PitVQA++ with an Open-ended PitVQA dataset and vector matrix-low-rank adaptation (Vector-MoLoRA), an innovative VLM fine-tuning approach for adapting GPT-2 to pituitary surgery. Open-Ended PitVQA comprises 109,173 frames from 25 procedural videos with 795,270 question-answer sentence pairs, covering key surgical elements such as phase and step recognition, context understanding, tool detection, localization, and interactions recognition. Vector-MoLoRA incorporates the principles of LoRA and MoRA to develop a matrix-low-rank adaptation strategy that employs rank vectors to allocate more parameters to earlier layers, gradually reducing them in the later layers. Our approach, validated on the Open-Ended PitVQA and EndoVis18-VQA datasets, effectively mitigates catastrophic forgetting while significantly enhancing performance over recent baselines. Performance-rejection analysis further highlights Vector-MoLoRA's enhanced reliability and trust-worthiness in handling uncertain predictions. Our source code and dataset is available at https://github.com/ HRL-Mike/PitVQA-Plus.","PeriodicalId":13418,"journal":{"name":"IEEE Transactions on Medical Imaging","volume":"11 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147625747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-04-06
DOI: 10.1109/tmi.2026.3680920
Anatomy-Guided Self-Supervised Distillation Learning for Medical Image Analysis
Huihui Yu, Qun Dai
3D medical imaging modalities, including CT and MRI, provide high-resolution views essential for precision medicine. However, the increasing volume and complexity of 3D medical images challenge manual analysis, particularly in classification and segmentation tasks. Although deep learning has shown considerable promise, it struggles to characterize small-scale, low-contrast anatomical structures, generalize across imaging domains, and mitigate annotation scarcity. Self-supervised learning (SSL) has emerged as an effective and annotation-efficient solution, yet existing methods, largely adapted from natural images, often fail to capture the anatomical heterogeneity and complex semantic dependencies inherent in 3D medical data. To address these limitations, we propose AG-SSD (Anatomy-Guided Self-Supervised Distillation), a framework that explicitly incorporates anatomical priors into SSL. AG-SSD comprises three complementary modules: (i) Cross-View Anatomical Consistency (CVAC), which generates multi-scale, anatomically consistent positive pairs via overlap-aware cropping; (ii) Edge-Aware Adaptive Masking (EAAM), which prioritizes anatomy-sensitive, high-edge regions to enhance local feature learning and robust global representations; and (iii) Cross-View Attention Alignment (CVAA), which leverages attention-based fusion to achieve semantic compensation and alignment across views, mitigating semantic drift to stabilize distillation. These modules are optimized using a unified objective that combines intra-view patch distillation, inter-view [CLS] token distillation, and masked patch reconstruction. Extensive experiments on CT and MRI datasets demonstrate that AG-SSD consistently outperforms state-of-the-art SSL methods in both classification and segmentation under annotation-scarce scenarios, highlighting its potential as a scalable, label-efficient paradigm for 3D medical image analysis and clinical applications.
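To make the Edge-Aware Adaptive Masking idea concrete, here is a hedged 2D sketch in which patches with higher edge density are masked with higher probability; the Sobel edge estimate, patch size, and sampling rule are illustrative assumptions, and AG-SSD operates on 3D volumes rather than single slices.

```python
# Illustrative edge-aware adaptive masking on a 2D slice (assumptions, not the
# AG-SSD code): per-patch edge strength biases which patches get masked.
import torch
import torch.nn.functional as F


def edge_aware_mask(img: torch.Tensor, patch: int = 16, mask_ratio: float = 0.6):
    """img: (1, H, W) grayscale slice; returns a boolean (H//patch, W//patch) patch mask."""
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    sobel_y = sobel_x.transpose(2, 3)
    gx = F.conv2d(img.unsqueeze(0), sobel_x, padding=1)
    gy = F.conv2d(img.unsqueeze(0), sobel_y, padding=1)
    edge = (gx ** 2 + gy ** 2).sqrt()                          # (1, 1, H, W) edge magnitude
    patch_edge = F.avg_pool2d(edge, patch).flatten()           # mean edge strength per patch
    probs = patch_edge / patch_edge.sum().clamp_min(1e-8)      # sampling weights
    n_mask = int(mask_ratio * probs.numel())
    idx = torch.multinomial(probs, n_mask, replacement=False)  # bias toward high-edge patches
    mask = torch.zeros(probs.numel(), dtype=torch.bool)
    mask[idx] = True
    return mask.view(img.shape[1] // patch, -1)


if __name__ == "__main__":
    slice_ = torch.rand(1, 224, 224)
    print(edge_aware_mask(slice_).float().mean())              # ~0.6 of patches masked
```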
{"title":"Anatomy-Guided Self-Supervised Distillation Learning for Medical Image Analysis.","authors":"Huihui Yu,Qun Dai","doi":"10.1109/tmi.2026.3680920","DOIUrl":"https://doi.org/10.1109/tmi.2026.3680920","url":null,"abstract":"3D medical imaging modalities, including CT and MRI, provide high-resolution views essential for precision medicine. However, the increasing volume and complexity of 3D medical images challenge manual analysis, particularly in classification and segmentation tasks. Although deep learning has shown considerable promise, it struggles to characterize small-scale, low-contrast anatomical structures, generalize across imaging domains, and mitigate annotation scarcity. Self-supervised learning (SSL) has emerged as an effective and annotation-efficient solution, yet existing methods, largely adapted from natural images, often fail to capture the anatomical heterogeneity and complex semantic dependencies inherent in 3D medical data. To address these limitations, we propose AG-SSD (Anatomy-Guided Self-Supervised Distillation), a framework that explicitly incorporates anatomical priors into SSL. AG-SSD comprises three complementary modules: (i) Cross-View Anatomical Consistency (CVAC), which generates multi-scale, anatomically consistent positive pairs via overlap-aware cropping; (ii) Edge-Aware Adaptive Masking (EAAM), which prioritizes anatomy-sensitive, high-edge regions to enhance local feature learning and robust global representation; and (iii) Cross-View Attention Alignment (CVAA), which leverages attention-based fusion to achieve semantic compensation and alignment across views, mitigating semantic drift to stabilize distillation. These modules are optimized using a unified objective that combines intraview patch distillation, inter-view [CLS] token distillation, and masked patch reconstruction. Extensive experiments on CT and MRI datasets demonstrate that AG-SSD consistently outperforms state-of-the-art SSL methods in both classification and segmentation under annotation-scarce scenarios, highlighting its potential as a scalable, label-efficient paradigm for 3D medical image analysis and clinical applications.","PeriodicalId":13418,"journal":{"name":"IEEE Transactions on Medical Imaging","volume":"7 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147625745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-Driven Band Optimization and Frequency-Aware Modeling in Medical Hyperspectral Image Segmentation","authors":"Wei Li, Geng Qin, Huan Liu, Xueyu Zhang, Yunfei Zhou, Haihao Zhang, Xiang-Gen Xia","doi":"10.1109/tmi.2026.3680239","DOIUrl":"https://doi.org/10.1109/tmi.2026.3680239","url":null,"abstract":"","PeriodicalId":13418,"journal":{"name":"IEEE Transactions on Medical Imaging","volume":"64 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147599179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-03-31
DOI: 10.1109/tmi.2026.3679618
ConfIC-RCA: Statistically Grounded Efficient Estimation of Segmentation Quality
Matias Cosarinsky, Ramiro Billot, Lucas Mansilla, Gabriel Jimenez, Nicolas Gaggion, Guanghui Fu, Tom Tirer, Enzo Ferrante
Assessing the quality of automatic image segmentation is crucial in clinical practice, but often very challenging due to the limited availability of ground-truth annotations. Reverse Classification Accuracy (RCA) is an approach that estimates the quality of new predictions on unseen samples by training a segmenter on those predictions and then evaluating it against existing annotated images. In this work, we introduce ConfIC-RCA (Conformal In-Context RCA), a novel method for automatically estimating segmentation quality with statistical guarantees in the absence of ground-truth annotations, which consists of two main innovations. First, In-Context RCA leverages recent in-context learning models for image segmentation and incorporates retrieval-augmentation techniques to select the most relevant reference images. This approach enables efficient quality estimation with minimal reference data while avoiding the need to train additional models. Second, Conformal RCA extends both the original RCA framework and In-Context RCA to go beyond point estimation. Using tools from split conformal prediction, Conformal RCA produces prediction intervals for segmentation quality, providing statistical guarantees that the true score lies within the estimated interval with a user-specified probability. Validated across 10 different medical imaging tasks in various organs and modalities, our methods demonstrate robust performance and computational efficiency, offering a promising solution for automated quality control in clinical workflows, where fast and reliable segmentation assessment is essential. The code is available at https://github.com/mcosarinsky/Conformal-In-Context-RCA.
Pub Date : 2026-03-31
DOI: 10.1109/tmi.2026.3679527
Interpretable Similarity of Synthetic Image Utility
Panagiota Gatoula, George Dimas, Dimitris K. Iakovidis
{"title":"Interpretable Similarity of Synthetic Image Utility","authors":"Panagiota Gatoula, George Dimas, Dimitris K. lakovidis","doi":"10.1109/tmi.2026.3679527","DOIUrl":"https://doi.org/10.1109/tmi.2026.3679527","url":null,"abstract":"","PeriodicalId":13418,"journal":{"name":"IEEE Transactions on Medical Imaging","volume":"23 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147586624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-03-30
DOI: 10.1109/tmi.2026.3678953
Insights from particle simulations for diffusion MRI microstructure studies
Viktor Vegh, Qianqian Yang, Megan Farquhar, Thomas R. Barrick
{"title":"Insights from particle simulations for diffusion MRI microstructure studies","authors":"Viktor Vegh, Qianqian Yang, Megan Farquhar, Thomas R. Barrick","doi":"10.1109/tmi.2026.3678953","DOIUrl":"https://doi.org/10.1109/tmi.2026.3678953","url":null,"abstract":"","PeriodicalId":13418,"journal":{"name":"IEEE Transactions on Medical Imaging","volume":"31 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147577891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}