Pub Date : 2026-04-08
DOI: 10.1109/tmi.2026.3682009
StableMIL: Entropy-Stabilized Attention-based Multiple Instance Learning for Morphologically Variable Whole Slide Images
Yinuo Lu, Mingxin Qi, Yao Fu, Zhuoran Xiao, Wei Shao, Jie Tian, Wei Mu
Aggregating the features of tens of thousands of patches into Whole Slide Image (WSI) representations is a crucial step in computational pathology. However, existing aggregation strategies overlook the morphological variability of tissue regions in WSIs stemming from differences in clinical procedures and tumor characteristics, leading to two critical limitations: 1) attention collapse in long sequences caused by significant variation in patch numbers across WSIs (ranging from thousands to tens of thousands per WSI); 2) attention misallocation due to under-trained positional embeddings resulting from the non-uniform spatial coordinates introduced by irregular patch distributions. Consequently, current attention-based methods struggle to generalize across this morphological variability, resulting in inconsistent aggregation performance and compromised model reliability in clinical settings. To address these issues, we propose an Entropy-Stabilized Attention-based Multiple Instance Learning (StableMIL) framework, which incorporates an entropy-stabilized attention mechanism to ensure consistent aggregation across WSIs with varying patch numbers and a Randomly Projected 2D rotary position embedding to enhance the robustness of spatial representations across irregular patch distributions. Extensive theoretical and experimental analyses on nine WSI datasets spanning diverse cancer types, across both classification and survival prediction tasks, demonstrate that StableMIL effectively overcomes the challenges of handling long instance sequences and out-of-distribution spatial coordinates. Our framework consistently outperforms representative baselines, particularly in survival prediction, with stable improvements observed across all evaluated cancer types and morphological scenarios, highlighting its potential for real-world clinical applications. Our source code is available at https://github.com/theeeqi/stableMIL.
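The abstract does not detail how the attention entropy is stabilized, so the following is only a minimal sketch of one common recipe: gated attention-MIL pooling whose logits are scaled by a log-length temperature so the softmax entropy stays comparable as the bag grows from thousands to tens of thousands of patches. The module names, reference bag size, and scaling rule are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the StableMIL code) of length-aware, entropy-stabilized
# attention pooling for MIL: attention logits are rescaled by a log-length factor
# so the softmax entropy stays comparable as the number of patches n varies.
import math
import torch
import torch.nn as nn


class EntropyStabilizedAttentionPool(nn.Module):
    def __init__(self, dim: int, hidden: int = 256, ref_len: int = 4096):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.ref_len = ref_len  # reference bag size used to normalize the temperature (assumption)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (n_patches, dim) features of one WSI bag
        n = feats.shape[0]
        logits = self.score(feats).squeeze(-1)           # (n,) per-patch attention logits
        # Log-length temperature: keeps the softmax entropy roughly comparable
        # across bags of very different lengths.
        tau = math.log(max(n, 2)) / math.log(self.ref_len)
        attn = torch.softmax(logits * tau, dim=0)        # (n,) normalized attention weights
        return attn @ feats                              # (dim,) slide-level embedding


if __name__ == "__main__":
    pool = EntropyStabilizedAttentionPool(dim=512)
    for n in (1_000, 30_000):                            # short vs. very long bag
        print(n, pool(torch.randn(n, 512)).shape)
```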
{"title":"StableMIL: Entropy-Stabilized Attention-based Multiple Instance Learning for Morphologically Variable Whole Slide Images.","authors":"Yinuo Lu,Mingxin Qi,Yao Fu,Zhuoran Xiao,Wei Shao,Jie Tian,Wei Mu","doi":"10.1109/tmi.2026.3682009","DOIUrl":"https://doi.org/10.1109/tmi.2026.3682009","url":null,"abstract":"Aggregating features of tens of thousands of patches into Whole Slide Images (WSIs) representations via aggregators is a crucial step in computational pathology. However, existing aggregation strategies overlook the morphological variability of tissue regions in WSIs stemming from differences in clinical procedures and tumor characteristics, leading to two critical limitations: 1) attention collapse in long sequences caused by significant variation in patch numbers across WSIs (ranging from thousands to tens of thousands per WSI); 2) attention misallocation due to under-trained positional embeddings resulting from the non-uniform spatial coordinates introduced by irregular patch distributions. Consequently, current attention-based methods struggle to generalize across this morphological variability, resulting in inconsistent aggregation performance and compromised model reliability in clinical settings. To address these issues, we propose a Entropy-Stabilized Attention-based Multiple Instance Learning (StableMIL) framework, which incorporates an entropy-stabilized attention mechanism to ensure consistent aggregation across WSIs with varying patch numbers and a Randomly Projected 2D rotary position embedding to enhance spatial representation robustness across irregular patch distributions. Extensive theoretical and experimental analyses on nine WSI datasets spanning diverse cancer types, across both classification and survival prediction tasks, demonstrate that StableMIL effectively overcomes the challenges of handling long instance sequences and out-of-distribution spatial coordinates. Our framework consistently outperforms representative baselines, particularly in survival prediction, with stable improvements observed across all evaluated cancer types and morphological scenarios, highlighting its potential for real-world clinical applications. Our source code is available at https://github.com/theeeqi/stableMIL.","PeriodicalId":13418,"journal":{"name":"IEEE Transactions on Medical Imaging","volume":"20 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147636175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-04-06
DOI: 10.1109/tmi.2026.3681075
Positional Prompts-Enhanced Brain-Heart-Gut Interactions for Mild Cognitive Impairment Diagnosis
Fan Li, Shilun Zhao, Shuwei Bai, Dengqiang Jia, Fang Xie, Jiangtao Liang, Han Zhang, Ya Zhang, Zhongxiang Ding, Yin Xu, Kaicong Sun, Dinggang Shen
Mild cognitive impairment (MCI) is the prodromal stage of dementia, involving complex interactions between the brain and peripheral organs. Emerging evidence indicates that heart dysfunction and gut microbiota dysbiosis can contribute to MCI pathogenesis, yet these discoveries of cross-organ interactions have not been applied to assist MCI diagnosis. In this work, we propose a novel diagnostic framework that exploits the interactions among the brain, heart, and gut using whole-body PET images to guide MCI diagnosis in scenarios where only brain MRI, PET, or PET&MRI are available. Specifically, we collected a multi-cohort, multi-modal dataset comprising 1,545 whole-body PET images, 6,010 brain MR images, and 2,446 brain PET images from eight data centers. Organ-specific image encoders are first pretrained for the brain, heart, and gut individually. Then, to effectively align and integrate brain, heart, and gut features, we introduce positional prompts that act as anatomical-level attention to highlight disease-relevant spatial regions, and further develop hierarchical Transformers to model brain-heart, brain-gut, and brain-heart-gut interactions. Finally, to achieve MCI diagnosis using only brain images, we transfer the above brain-heart-gut model to a brain-only model via a multi-level knowledge distillation scheme, including sample-level contrastive distillation, group-level distribution alignment, and response-level supervision. Extensive experiments on multi-center data demonstrate the superiority of our method over state-of-the-art methods through effective integration of heart and gut interactions for MCI diagnosis.
Pub Date : 2026-04-06
DOI: 10.1109/tmi.2026.3681138
Cross-dimensional Spatial-temporal Feature Integration Framework for Lung Ultrasound Video Analysis in Pneumonia
Yiwen Liu, Chao He, Dongni Hou, Dean Ta, Mingbo Zhao, Wenyu Xing
Pneumonia is an acute respiratory infection that poses a serious threat to health and life. Lung ultrasound (LUS), as a non-invasive and rapid imaging technique, can monitor real-time changes in the lungs, providing valuable assistance in clinical diagnosis. However, most LUS studies are limited to frame-level analysis and ignore respiratory-cycle changes, leading to diagnostic errors. To address these problems, we propose a cross-dimensional spatial-temporal feature integration model for LUS video analysis. Specifically, a sliding window and feature-difference analysis are first used to preprocess the original LUS videos, eliminating invalid and highly similar frames to produce an abstracted video. Subsequently, a cross-dimensional feature fusion backbone integrates an improved temporal-C3D network and a self-designed recursive inception-meet-transformer (IMT) network to extract and fuse features from different dimensions, yielding comprehensive features that characterize LUS videos. Finally, the Longformer is employed to analyze the temporal dependencies of the cross-dimensional features, supplemented by a classification head for scoring LUS videos. In total, 3,018 LUS video clips were collected from 119 patients in three hospitals to evaluate the proposed LUS video scoring model. With patient-level division, the training and testing sets consist of 2,652 clips from 104 patients and 366 clips from 15 patients, respectively. Experimental results of 5-fold cross-validation demonstrate that the proposed model achieves outstanding scoring performance, with an accuracy, precision, recall, specificity, F1-score, and AUC of 91.78 ± 0.52%, 92.19 ± 0.76%, 91.81 ± 0.62%, 97.17 ± 0.20%, 91.94 ± 0.44%, and 97.83 ± 0.19%, respectively. The independent testing set also shows superior generalization capability, with a scoring accuracy of 87.65 ± 1.12%. Moreover, ablation studies confirm that each designed module contributes significantly to the model's performance, and comparative experiments further confirm the superiority of the proposed model over previous models. These robust findings highlight the proposed LUS video scoring model's strong potential for clinical deployment.
Pub Date : 2026-04-06
DOI: 10.1109/tmi.2026.3681175
PitVQA++: Vector Matrix-Low-Rank Adaptation for Open-Ended Visual Question Answering in Pituitary Surgery
Runlong He, Danyal Z Khan, Evangelos B Mazomenos, Hani J Marcus, Danail Stoyanov, Matthew J Clarkson, Mobarak I Hoque
Vision-Language Models (VLMs) for visual question answering (VQA) offer a unique opportunity to enhance intra-operative decision-making, promote intuitive interactions, and significantly advance surgical education. However, developing VLMs for surgical VQA is challenging due to limited datasets and the risk of overfitting and catastrophic forgetting during full fine-tuning of pretrained weights. While parameter-efficient techniques such as Low-Rank Adaptation (LoRA) and Matrix of Rank Adaptation (MoRA) address adaptation challenges, their uniform parameter distribution overlooks the feature hierarchy in deep networks, where earlier layers, which learn general features, require more parameters than later ones. This work introduces PitVQA++ with an Open-Ended PitVQA dataset and vector matrix-low-rank adaptation (Vector-MoLoRA), an innovative VLM fine-tuning approach for adapting GPT-2 to pituitary surgery. Open-Ended PitVQA comprises 109,173 frames from 25 procedural videos with 795,270 question-answer sentence pairs, covering key surgical elements such as phase and step recognition, context understanding, tool detection, localization, and interaction recognition. Vector-MoLoRA incorporates the principles of LoRA and MoRA into a matrix-low-rank adaptation strategy that employs rank vectors to allocate more parameters to earlier layers, gradually reducing them in later layers. Our approach, validated on the Open-Ended PitVQA and EndoVis18-VQA datasets, effectively mitigates catastrophic forgetting while significantly enhancing performance over recent baselines. Performance-rejection analysis further highlights Vector-MoLoRA's enhanced reliability and trustworthiness in handling uncertain predictions. Our source code and dataset are available at https://github.com/HRL-Mike/PitVQA-Plus.
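A minimal sketch of the rank-vector idea, allocating larger adapter ranks to earlier GPT-2 blocks and smaller ranks to later ones, is given below; the linear schedule, rank bounds, and plain LoRA-style adapter are assumptions for illustration and not the released PitVQA-Plus code, which combines LoRA and MoRA principles.

```python
# Illustrative depth-decreasing rank allocation with a plain LoRA-style adapter
# (assumptions, not Vector-MoLoRA itself): earlier layers receive larger ranks.
import torch
import torch.nn as nn


def rank_vector(n_layers: int, r_max: int = 32, r_min: int = 4):
    """Linearly decreasing per-layer adapter ranks, from r_max at layer 0 to r_min."""
    return [round(r_max - (r_max - r_min) * l / max(n_layers - 1, 1)) for l in range(n_layers)]


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                        # pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # frozen base projection + low-rank update B @ A, scaled by alpha / rank
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale


if __name__ == "__main__":
    ranks = rank_vector(n_layers=12)                       # e.g. GPT-2 small has 12 blocks
    print(ranks)                                           # more capacity in earlier layers
    layer0 = LoRALinear(nn.Linear(768, 768), rank=ranks[0])
    print(layer0(torch.randn(2, 768)).shape)
```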
{"title":"PitVQA++: Vector Matrix-Low-Rank Adaptation for Open-Ended Visual Question Answering in Pituitary Surgery.","authors":"Runlong He,Danyal Z Khan,Evangelos B Mazomenos,Hani J Marcus,Danail Stoyanov,Matthew J Clarkson,Mobarak I Hoque","doi":"10.1109/tmi.2026.3681175","DOIUrl":"https://doi.org/10.1109/tmi.2026.3681175","url":null,"abstract":"Vision-Language Models (VLMs) in visual question answering (VQA) offer a unique opportunity to enhance intra-operative decision-making, promote intuitive interactions, and significantly advance surgical education. However, the development of VLMs for surgical VQA is challenging due to limited datasets and the risk of overfitting and catastrophic forgetting during full fine-tuning of pretrained weights. While parameter-efficient techniques like Low-Rank Adaptation (LoRA) and Matrix of Rank Adaptation (MoRA) address adaptation challenges, their uniform parameter distribution overlooks the feature hierarchy in deep networks, where earlier layers, that learn general features, require more parameters than later ones. This work introduces PitVQA++ with an Open-ended PitVQA dataset and vector matrix-low-rank adaptation (Vector-MoLoRA), an innovative VLM fine-tuning approach for adapting GPT-2 to pituitary surgery. Open-Ended PitVQA comprises 109,173 frames from 25 procedural videos with 795,270 question-answer sentence pairs, covering key surgical elements such as phase and step recognition, context understanding, tool detection, localization, and interactions recognition. Vector-MoLoRA incorporates the principles of LoRA and MoRA to develop a matrix-low-rank adaptation strategy that employs rank vectors to allocate more parameters to earlier layers, gradually reducing them in the later layers. Our approach, validated on the Open-Ended PitVQA and EndoVis18-VQA datasets, effectively mitigates catastrophic forgetting while significantly enhancing performance over recent baselines. Performance-rejection analysis further highlights Vector-MoLoRA's enhanced reliability and trust-worthiness in handling uncertain predictions. Our source code and dataset is available at https://github.com/ HRL-Mike/PitVQA-Plus.","PeriodicalId":13418,"journal":{"name":"IEEE Transactions on Medical Imaging","volume":"11 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147625747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-04-06
DOI: 10.1109/tmi.2026.3680920
Anatomy-Guided Self-Supervised Distillation Learning for Medical Image Analysis
Huihui Yu, Qun Dai
3D medical imaging modalities, including CT and MRI, provide high-resolution views essential for precision medicine. However, the increasing volume and complexity of 3D medical images challenge manual analysis, particularly in classification and segmentation tasks. Although deep learning has shown considerable promise, it struggles to characterize small-scale, low-contrast anatomical structures, generalize across imaging domains, and mitigate annotation scarcity. Self-supervised learning (SSL) has emerged as an effective and annotation-efficient solution, yet existing methods, largely adapted from natural images, often fail to capture the anatomical heterogeneity and complex semantic dependencies inherent in 3D medical data. To address these limitations, we propose AG-SSD (Anatomy-Guided Self-Supervised Distillation), a framework that explicitly incorporates anatomical priors into SSL. AG-SSD comprises three complementary modules: (i) Cross-View Anatomical Consistency (CVAC), which generates multi-scale, anatomically consistent positive pairs via overlap-aware cropping; (ii) Edge-Aware Adaptive Masking (EAAM), which prioritizes anatomy-sensitive, high-edge regions to enhance local feature learning and robust global representations; and (iii) Cross-View Attention Alignment (CVAA), which leverages attention-based fusion to achieve semantic compensation and alignment across views, mitigating semantic drift to stabilize distillation. These modules are optimized using a unified objective that combines intra-view patch distillation, inter-view [CLS] token distillation, and masked patch reconstruction. Extensive experiments on CT and MRI datasets demonstrate that AG-SSD consistently outperforms state-of-the-art SSL methods in both classification and segmentation under annotation-scarce scenarios, highlighting its potential as a scalable, label-efficient paradigm for 3D medical image analysis and clinical applications.
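To make the Edge-Aware Adaptive Masking idea concrete, here is a hedged 2D sketch in which patches with higher edge density are masked with higher probability; the Sobel edge estimate, patch size, and sampling rule are illustrative assumptions, and AG-SSD operates on 3D volumes rather than single slices.

```python
# Illustrative edge-aware adaptive masking on a 2D slice (assumptions, not the
# AG-SSD code): per-patch edge strength biases which patches get masked.
import torch
import torch.nn.functional as F


def edge_aware_mask(img: torch.Tensor, patch: int = 16, mask_ratio: float = 0.6):
    """img: (1, H, W) grayscale slice; returns a boolean (H//patch, W//patch) patch mask."""
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    sobel_y = sobel_x.transpose(2, 3)
    gx = F.conv2d(img.unsqueeze(0), sobel_x, padding=1)
    gy = F.conv2d(img.unsqueeze(0), sobel_y, padding=1)
    edge = (gx ** 2 + gy ** 2).sqrt()                          # (1, 1, H, W) edge magnitude
    patch_edge = F.avg_pool2d(edge, patch).flatten()           # mean edge strength per patch
    probs = patch_edge / patch_edge.sum().clamp_min(1e-8)      # sampling weights
    n_mask = int(mask_ratio * probs.numel())
    idx = torch.multinomial(probs, n_mask, replacement=False)  # bias toward high-edge patches
    mask = torch.zeros(probs.numel(), dtype=torch.bool)
    mask[idx] = True
    return mask.view(img.shape[1] // patch, -1)


if __name__ == "__main__":
    slice_ = torch.rand(1, 224, 224)
    print(edge_aware_mask(slice_).float().mean())              # ~0.6 of patches masked
```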
{"title":"Anatomy-Guided Self-Supervised Distillation Learning for Medical Image Analysis.","authors":"Huihui Yu,Qun Dai","doi":"10.1109/tmi.2026.3680920","DOIUrl":"https://doi.org/10.1109/tmi.2026.3680920","url":null,"abstract":"3D medical imaging modalities, including CT and MRI, provide high-resolution views essential for precision medicine. However, the increasing volume and complexity of 3D medical images challenge manual analysis, particularly in classification and segmentation tasks. Although deep learning has shown considerable promise, it struggles to characterize small-scale, low-contrast anatomical structures, generalize across imaging domains, and mitigate annotation scarcity. Self-supervised learning (SSL) has emerged as an effective and annotation-efficient solution, yet existing methods, largely adapted from natural images, often fail to capture the anatomical heterogeneity and complex semantic dependencies inherent in 3D medical data. To address these limitations, we propose AG-SSD (Anatomy-Guided Self-Supervised Distillation), a framework that explicitly incorporates anatomical priors into SSL. AG-SSD comprises three complementary modules: (i) Cross-View Anatomical Consistency (CVAC), which generates multi-scale, anatomically consistent positive pairs via overlap-aware cropping; (ii) Edge-Aware Adaptive Masking (EAAM), which prioritizes anatomy-sensitive, high-edge regions to enhance local feature learning and robust global representation; and (iii) Cross-View Attention Alignment (CVAA), which leverages attention-based fusion to achieve semantic compensation and alignment across views, mitigating semantic drift to stabilize distillation. These modules are optimized using a unified objective that combines intraview patch distillation, inter-view [CLS] token distillation, and masked patch reconstruction. Extensive experiments on CT and MRI datasets demonstrate that AG-SSD consistently outperforms state-of-the-art SSL methods in both classification and segmentation under annotation-scarce scenarios, highlighting its potential as a scalable, label-efficient paradigm for 3D medical image analysis and clinical applications.","PeriodicalId":13418,"journal":{"name":"IEEE Transactions on Medical Imaging","volume":"7 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147625745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-Driven Band Optimization and Frequency-Aware Modeling in Medical Hyperspectral Image Segmentation","authors":"Wei Li, Geng Qin, Huan Liu, Xueyu Zhang, Yunfei Zhou, Haihao Zhang, Xiang-Gen Xia","doi":"10.1109/tmi.2026.3680239","DOIUrl":"https://doi.org/10.1109/tmi.2026.3680239","url":null,"abstract":"","PeriodicalId":13418,"journal":{"name":"IEEE Transactions on Medical Imaging","volume":"64 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147599179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-03-31
DOI: 10.1109/tmi.2026.3679618
ConfIC-RCA: Statistically Grounded Efficient Estimation of Segmentation Quality
Matias Cosarinsky, Ramiro Billot, Lucas Mansilla, Gabriel Jimenez, Nicolas Gaggion, Guanghui Fu, Tom Tirer, Enzo Ferrante
Assessing the quality of automatic image segmentation is crucial in clinical practice, but often very challenging due to the limited availability of ground-truth annotations. Reverse Classification Accuracy (RCA) is an approach that estimates the quality of new predictions on unseen samples by training a segmenter on those predictions and then evaluating it against existing annotated images. In this work, we introduce ConfIC-RCA (Conformal In-Context RCA), a novel method for automatically estimating segmentation quality with statistical guarantees in the absence of ground-truth annotations, which consists of two main innovations. First, In-Context RCA leverages recent in-context learning models for image segmentation and incorporates retrieval-augmentation techniques to select the most relevant reference images. This approach enables efficient quality estimation with minimal reference data while avoiding the need to train additional models. Second, Conformal RCA extends both the original RCA framework and In-Context RCA to go beyond point estimation. Using tools from split conformal prediction, Conformal RCA produces prediction intervals for segmentation quality, providing statistical guarantees that the true score lies within the estimated interval with a user-specified probability. Validated across 10 different medical imaging tasks in various organs and modalities, our methods demonstrate robust performance and computational efficiency, offering a promising solution for automated quality control in clinical workflows, where fast and reliable segmentation assessment is essential. The code is available at https://github.com/mcosarinsky/Conformal-In-Context-RCA.
Pub Date : 2026-03-31
DOI: 10.1109/tmi.2026.3679527
Interpretable Similarity of Synthetic Image Utility
Panagiota Gatoula, George Dimas, Dimitris K. Iakovidis
{"title":"Interpretable Similarity of Synthetic Image Utility","authors":"Panagiota Gatoula, George Dimas, Dimitris K. lakovidis","doi":"10.1109/tmi.2026.3679527","DOIUrl":"https://doi.org/10.1109/tmi.2026.3679527","url":null,"abstract":"","PeriodicalId":13418,"journal":{"name":"IEEE Transactions on Medical Imaging","volume":"23 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147586624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-03-30
DOI: 10.1109/tmi.2026.3678953
Insights from particle simulations for diffusion MRI microstructure studies
Viktor Vegh, Qianqian Yang, Megan Farquhar, Thomas R. Barrick
{"title":"Insights from particle simulations for diffusion MRI microstructure studies","authors":"Viktor Vegh, Qianqian Yang, Megan Farquhar, Thomas R. Barrick","doi":"10.1109/tmi.2026.3678953","DOIUrl":"https://doi.org/10.1109/tmi.2026.3678953","url":null,"abstract":"","PeriodicalId":13418,"journal":{"name":"IEEE Transactions on Medical Imaging","volume":"31 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2026-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147577891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}