
Medical image analysis: Latest Articles

Asymmetric fiber orientation distribution estimation via unsupervised deep learning
IF 11.8 CAS Tier 1 (Medicine) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-05-01 Epub Date: 2026-01-29 DOI: 10.1016/j.media.2026.103968
Di Zhang, Ziyu Li, Xiaofeng Deng, Zekun Han, Alan Wang, Yong Liu, Fangrong Zong
Diffusion magnetic resonance imaging (dMRI) tractography is a key technique for reconstructing brain structural connectivity. A widely recognized limitation in tractography is the enforced symmetry of fiber orientation distribution functions (fODFs) in opposite directions, which may impair performance in regions with asymmetric microstructural organization. Previous studies have proposed leveraging anatomical priors or labeled training data to address this limitation; however, such data requirements constrain generalizability. In this study, we propose Recursive-a-fODF, a recursive estimator that uses an unsupervised deep learning framework to directly estimate asymmetric fODFs (a-fODFs) from dMRI data. The model incorporates a recursive calibration process that directly and dynamically estimates the white matter response function from the data itself, eliminating the need for external anatomical priors. We validate the framework using ex vivo marmoset brain data and in vivo human datasets, demonstrating superior performance in resolving complex fiber configurations. When applied to clinical cohorts with neurodegenerative and psychiatric conditions, Recursive-a-fODF reveals disease-specific alterations in fiber orientation asymmetry. These findings demonstrate that a-fODFs, estimated in a purely data-driven manner, can capture microstructural signatures relevant to disease pathology. Collectively, this work establishes a-fODF-based modeling as a powerful, anatomically unbiased approach that provides a complementary dimension to conventional diffusion metrics. These technical advances form a foundation for more accurate tractography and offer a new avenue for developing sensitive neuroimaging biomarkers.
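The recursive calibration idea can be made concrete with a toy model. The Python sketch below (numpy/scipy, an illustrative assumption rather than the authors' spherical-harmonic deep-learning estimator) alternates between deconvolving fODFs with the current response kernel and re-estimating that kernel from the voxels whose fODFs look most single-fiber; the circular-convolution forward model and the peak-ratio heuristic are stand-ins.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_dirs, n_vox = 16, 200

# Toy forward model: each voxel's signal is the circular convolution of its
# fODF with a shared single-fiber response kernel, plus noise.
true_kernel = np.exp(-0.5 * ((np.arange(n_dirs) - n_dirs // 2) / 1.5) ** 2)
conv = lambda k: np.stack([np.roll(k, s - n_dirs // 2) for s in range(n_dirs)], axis=1)
fodfs_true = rng.dirichlet(np.full(n_dirs, 0.1), size=n_vox)
signals = fodfs_true @ conv(true_kernel).T + 0.01 * rng.standard_normal((n_vox, n_dirs))

kernel = np.ones(n_dirs)                      # flat start: no anatomical prior
for it in range(5):
    R = conv(kernel)
    fodfs = np.stack([nnls(R, s)[0] for s in signals])        # deconvolve every voxel
    peak_ratio = fodfs.max(axis=1) / (fodfs.sum(axis=1) + 1e-8)
    single = np.flatnonzero(peak_ratio > np.quantile(peak_ratio, 0.9))
    # Recursive calibration: refresh the response from the most unimodal voxels,
    # after rotating each signal so its fODF peak sits at the kernel center.
    kernel = np.mean([np.roll(signals[i], n_dirs // 2 - fodfs[i].argmax())
                      for i in single], axis=0)
print("kernel vs. ground truth correlation:",
      round(float(np.corrcoef(kernel, true_kernel)[0, 1]), 3))
```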
Citations: 0
MADAT: Missing-aware dynamic adaptive transformer model for medical prognosis prediction with incomplete multimodal data
IF 11.8 CAS Tier 1 (Medicine) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-05-01 Epub Date: 2026-01-24 DOI: 10.1016/j.media.2026.103958
Jianbin He, Guoheng Huang, Xiaochen Yuan, Chi-Man Pun, Guo Zhong, Qi Yang, Ling Guo, Siyu Zhu, Baiying Lei, Haojiang Li
Multimodal medical prognosis prediction has shown great potential in improving diagnostic accuracy by integrating various data types. However, incomplete multimodality, where certain modalities are missing, poses significant challenges to model performance. Current methods, including dynamic adaptation and modality completion, have limitations in handling incomplete multimodality comprehensively. Dynamic adaptation methods fail to fully utilize modality interactions as they only process available modalities. Modality completion methods address inter-modal relationships but risk generating unreliable data, especially when key modalities are missing, since existing modalities cannot replicate unique features of absent ones. This compromises fusion quality and degrades model performance. To address these challenges, we propose the Missing-aware Dynamic Adaptive Transformer (MADAT) model, which integrates two phases: the Decoupling Generalization Completion Phase (DGCP) and the Adaptive Cross-Fusion Phase (ACFP). The DGCP reconstructs missing modalities by generating inter-modal and intra-modal shared information using Progressive Transformation Recursive Gated Convolutions (PTRGC) and Wavelet Alignment Domain Generalization (WADG). The ACFP, which incorporates Cross-Agent Attention (CAA) and Generation Quality Feedback Regulation (GQFR), adaptively fuses the original and generated modality features. CAA ensures thorough integration and alignment of the features, while GQFR dynamically adjusts the model’s reliance on the generated features based on their quality, preventing over-dependence on low-quality data. Experiments on three private nasopharyngeal carcinoma datasets demonstrate that MADAT outperforms existing methods, achieving superior robustness in medical multimodal prediction under conditions of incomplete multimodality.
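As a loose illustration of the quality-feedback idea behind GQFR, the PyTorch sketch below gates imputed (generated) modality features by a learned per-sample quality score before fusing them with the observed ones. The module, its layer sizes, and the sigmoid quality head are illustrative assumptions for exposition, not the authors' architecture.

```python
import torch
import torch.nn as nn

class QualityGatedFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Predicts a quality score in [0, 1] for each generated feature vector.
        self.quality_head = nn.Sequential(nn.Linear(dim, dim // 2), nn.ReLU(),
                                          nn.Linear(dim // 2, 1), nn.Sigmoid())
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, real_feat, gen_feat):
        q = self.quality_head(gen_feat)           # (B, 1) per-sample quality
        gated = q * gen_feat                      # suppress low-quality imputations
        return self.fuse(torch.cat([real_feat, gated], dim=-1)), q

fusion = QualityGatedFusion(dim=64)
real, gen = torch.randn(8, 64), torch.randn(8, 64)
fused, quality = fusion(real, gen)
print(fused.shape, quality.squeeze(-1))
```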
Citations: 0
Vision foundation model for 3D magnetic resonance imaging segmentation, classification, and registration
IF 11.8 CAS Tier 1 (Medicine) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-05-01 Epub Date: 2026-02-12 DOI: 10.1016/j.media.2026.103992
Shansong Wang, Mojtaba Safari, Qiang Li, Chih-Wei Chang, Richard LJ Qiu, Justin Roper, David S. Yu, Xiaofeng Yang
Vision foundation models (VFMs) are pre-trained on extensive image datasets to learn general representations. These models can subsequently be fine-tuned for specific downstream tasks, markedly boosting performance across a broad range of applications. However, existing vision foundation models that claim to be applicable to various downstream tasks are mostly pre-trained on imaging modalities whose characteristics differ from those of magnetic resonance imaging (MRI); these differences in imaging principles, signal characteristics, and data distribution may hinder their practical performance and versatility in MRI-specific applications. Here, we propose Triad, a vision foundation model for 3D MRI segmentation, classification, and registration. Triad learns robust representations from 129K 3D MRI volumes based on the SimMIM framework and uses textual descriptions related to modality, device parameters, and imaging parameters to constrain the semantic distribution of the visual modality. The above pre-training dataset is called Triad-129K, which is currently the largest 3D MRI pre-training dataset. We evaluate Triad across three tasks, namely, organ/tumor segmentation, organ/cancer classification, and medical image registration, in both within-domain and out-of-domain settings using 25 downstream datasets. By initializing models with Triad’s pre-trained weights, nnUNet-Triad-SimMIM improves segmentation performance by 2.13% compared to nnUNet-Scratch across 17 datasets. Swin-B-Triad-SimMIM achieves a 4.38% improvement over Swin-B-Scratch in classification tasks across five datasets. SwinUNETR-Triad-SimMIM improves by 3.84% compared to SwinUNETR-Scratch in registration tasks across two datasets. Our study demonstrates that pre-training can improve performance when the data modalities and organs of upstream and downstream tasks are consistent. This work highlights the value of large-scale pre-training techniques for downstream tasks in 3D MRI.
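The SimMIM objective that Triad builds on is masked volume modeling: hide a fraction of patch tokens and reconstruct the raw voxels beneath them, with the loss computed only on masked positions. The PyTorch sketch below shows that loss on a toy 3D volume; the small convolutional tokenizer, the 60% mask ratio, and the shapes are placeholder assumptions, and Triad's text conditioning is omitted.

```python
import torch
import torch.nn as nn

B, C, P = 2, 1, 8                      # batch, channels, patch size
vol = torch.randn(B, C, 32, 32, 32)    # toy 3D MRI volume

patchify = nn.Conv3d(C, 128, kernel_size=P, stride=P)   # tokens: (B, 128, 4, 4, 4)
decoder = nn.Conv3d(128, C * P**3, kernel_size=1)       # per-token voxel prediction
mask_token = nn.Parameter(torch.zeros(1, 128, 1, 1, 1))

tokens = patchify(vol)
mask = (torch.rand(B, 1, *tokens.shape[2:]) < 0.6).float()  # hide ~60% of tokens
pred = decoder(tokens * (1 - mask) + mask_token * mask)

# Ground-truth voxels rearranged to one column per patch token.
d = h = w = 32 // P
target = (vol.reshape(B, C, d, P, h, P, w, P)
             .permute(0, 1, 3, 5, 7, 2, 4, 6)
             .reshape(B, C * P**3, d, h, w))
# L1 reconstruction loss, averaged over masked voxels only.
loss = ((pred - target).abs() * mask).sum() / (mask.sum() * C * P**3)
loss.backward()
print(f"masked-voxel L1 loss: {loss.item():.3f}")
```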
Citations: 0
Unsupervised domain adaptation for medical image segmentation using adaptogen-perturbation
IF 11.8 CAS Tier 1 (Medicine) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-05-01 Epub Date: 2026-02-19 DOI: 10.1016/j.media.2026.104002
Hong Joo Lee, Yuan Bi, Sangmin Lee, Gyeong-Moon Park, Jung Uk Kim, Seong Tae Kim, Zhongliang Jiang, Nassir Navab
Domain shift, which originates from differences in devices or patients in the medical field, poses a significant challenge when applying pre-trained models in clinical settings. To tackle this challenge, domain adaptation methods have been explored. However, most existing methods are designed for adaptation to a single target domain or require sharing all target domain data for adaptation, which is infeasible in the medical field due to privacy issues. In this paper, we propose a novel unsupervised multi-target domain adaptation method that requires no data sharing. To this end, we introduce an additional signal, termed Adaptogen-Perturbation (AP), optimized to bridge the gap between the source and target domains. The optimized AP is injected into the latent feature and facilitates the adaptation of the pre-trained model to the target domain. Moreover, we propose a Spectral/Geometric Consistency learning framework to optimize the AP in an unsupervised manner. This promotes consistent predictions across two types of transformations, geometric and frequency-space spectral, enhancing robustness to both kinds of variation. Extensive experiments with multiple medical segmentation datasets demonstrate the effectiveness of APs.
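A minimal sketch of the central mechanism, freezing a pre-trained segmenter and optimizing only an additive latent perturbation against a consistency objective, is shown below in PyTorch. The toy encoder/decoder, the single flip-consistency term, and all hyperparameters are illustrative assumptions; the paper's spectral branch and multi-target setup are not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Conv2d(1, 16, 3, padding=1)
decoder = nn.Conv2d(16, 2, 3, padding=1)         # 2-class segmentation head
for p in list(encoder.parameters()) + list(decoder.parameters()):
    p.requires_grad_(False)                      # pre-trained weights stay fixed

ap = torch.zeros(1, 16, 64, 64, requires_grad=True)   # the only trainable tensor
opt = torch.optim.Adam([ap], lr=1e-2)

target_batch = torch.randn(4, 1, 64, 64)         # unlabeled target-domain images
for step in range(50):
    logits = decoder(encoder(target_batch) + ap)
    flipped = decoder(encoder(torch.flip(target_batch, dims=[-1])) + ap)
    # Geometric consistency: a flipped input should yield a flipped prediction.
    loss = F.mse_loss(torch.flip(flipped, dims=[-1]).softmax(1), logits.softmax(1))
    opt.zero_grad(); loss.backward(); opt.step()
print(f"consistency loss after adaptation: {loss.item():.4f}")
```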
Citations: 0
Context-enriched contrastive auto-encoder with topology learning for medical hyperspectral image classification to diagnose tumors
IF 11.8 CAS Tier 1 (Medicine) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-05-01 Epub Date: 2026-02-16 DOI: 10.1016/j.media.2026.103998
Meiling Wang, Changda Xing, Yifang Wu, Cheng Wang
Deep learning has emerged as a highly effective approach for the automatic classification of medical hyperspectral images (MedHSIs), facilitating the accurate diagnosis of diverse tumors. Most current methods, however, struggle to ensure the separability of the latent representation and to preserve topological structure. To remedy these deficiencies, we propose a novel context-enriched contrastive auto-encoder with topology learning (CCAET) for MedHSI classification in tumor diagnosis. Concretely, we initially employ an auto-encoder (AE) network as the foundational framework to achieve latent representation and data reconstruction. Subsequently, to enhance the separability of the latent representation, we integrate a context-enriched contrastive loss into the AE framework, where both individual pixels and their contextual information are used to redefine contrastive learning, so that intra-class features are drawn close together while inter-class features are pushed apart. Further, we construct a topology-preserving loss by incorporating a graph-based approach into the foundational framework, which aims to maintain the topological structure during latent representation learning. Furthermore, we devise a gradient descent-based optimization scheme to minimize the overall loss of the proposed CCAET method, so as to obtain the final latent representation of MedHSI. Finally, we implement softmax-based label prediction to obtain the classification results. Different from most existing methods, the CCAET approach considers both the separability of latent representations and the preservation of topological structure to enhance MedHSI classification performance. Extensive experiments substantiate that the proposed CCAET method outperforms several state-of-the-art techniques in MedHSI classification-based tumor diagnosis.
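The topology-preserving term can be illustrated with a Laplacian-style penalty: build a k-nearest-neighbor graph on the raw spectra and penalize latent codes of connected pixels for drifting apart. The sketch below (numpy/scikit-learn/PyTorch) is one reading of such a graph-based loss; the affinity construction, the uniform weights, and the random stand-in features are assumptions, not the paper's exact formulation.

```python
import torch
from sklearn.neighbors import kneighbors_graph

x = torch.randn(256, 120)                        # 256 hyperspectral pixels, 120 bands
z = torch.randn(256, 16, requires_grad=True)     # latent codes from the auto-encoder

# Symmetric binary k-NN adjacency on the raw spectra (computed once, no grad).
A = kneighbors_graph(x.numpy(), n_neighbors=8, mode="connectivity")
A = torch.tensor((A + A.T).toarray() > 0, dtype=torch.float32)

dists = torch.cdist(z, z) ** 2                   # pairwise squared latent distances
topo_loss = (A * dists).sum() / A.sum()          # pull graph neighbors together
topo_loss.backward()
print(f"topology-preserving loss: {topo_loss.item():.3f}")
```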
Citations: 0
Enhancing uncertainty assessment in dynamic PET imaging with residual permutation and clustering
IF 11.8 CAS Tier 1 (Medicine) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-05-01 Epub Date: 2026-02-16 DOI: 10.1016/j.media.2026.103997
Kun Ma, Fangxiao Cheng, Wei Liu, Wenrui Shao, Yalei Yang, Nan Li, Xiangxi Meng, Zhaoheng Xie
Quantitative positron emission tomography (PET) is widely used for disease diagnosis and therapy monitoring, yet the reliability of kinetic parameters depends on robust uncertainty quantification. Existing Bayesian methods are computationally demanding, bootstrap approaches are noise-sensitive, and recent deep learning models often require large training datasets and lack physical scale sensitivity. To address these limitations, we propose a clustering-based residual permutation (RP) framework for uncertainty estimation in dynamic PET. The method generates pseudo time-activity curves (TACs) by permuting fitting residuals within kinetically homogeneous clusters, preserving spatiotemporal noise characteristics while avoiding noise misallocation across heterogeneous regions. To ensure meaningful residual construction, we introduce a regularized model with Huber loss and elastic-net regularization, improving numerical stability and preventing overfitting. Extensive validation on simulated data (TACs and XCAT-OSEM reconstructions) and clinical total-body PET demonstrates that RP yields uncertainty estimates that scale consistently with noise level and preserve expected physical disparities between kinetic parameters. Compared with reference baselines, the proposed framework provides a distribution-free, training-independent, and computationally efficient solution for voxel-wise uncertainty quantification. Overall, RP fills a practical gap between expensive Bayesian inference and data-hungry deep learning, offering a robust and clinically deployable approach to uncertainty-aware dynamic PET analysis. The code of the proposed RP method and reference methods is available at: https://github.com/ANMMILab-PKU.
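A compact sketch of the RP recipe, as we read it, appears below: fit each time-activity curve, cluster voxels by their fitted kinetics, permute whole residual curves within each cluster to synthesize pseudo-TACs, refit, and take the spread of the refitted parameters as the voxel-wise uncertainty. A mono-exponential model and ordinary least squares stand in for the paper's kinetic model and Huber/elastic-net fit, and the cluster and replicate counts are arbitrary.

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
t = np.linspace(0.1, 60.0, 24)                       # frame mid-times (minutes)
model = lambda tt, a, k: a * np.exp(-k * tt)         # toy kinetic model

true_p = np.column_stack([rng.uniform(50, 100, 200), rng.uniform(0.02, 0.2, 200)])
tacs = np.stack([model(t, *p) for p in true_p]) + rng.standard_normal((200, 24))

def fit_all(curves):
    return np.stack([curve_fit(model, t, c, p0=(70.0, 0.1), maxfev=5000)[0]
                     for c in curves])

params = fit_all(tacs)
fitted = np.stack([model(t, *p) for p in params])
resid = tacs - fitted
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(params)

replicas = []
for _ in range(20):                                   # RP replicates
    shuffled = resid.copy()
    for c in range(5):                                # permute residual curves only
        idx = np.flatnonzero(labels == c)             # within each kinetic cluster
        shuffled[idx] = resid[rng.permutation(idx)]
    replicas.append(fit_all(fitted + shuffled))
sigma_k = np.stack(replicas)[:, :, 1].std(axis=0)     # per-voxel uncertainty of k
print(f"median sigma(k): {np.median(sigma_k):.4f}")
```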
Citations: 0
Artifact-suppressed 3D retinal microvascular segmentation via multi-scale topology regulation
IF 11.8 CAS Tier 1 (Medicine) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-05-01 Epub Date: 2026-02-10 DOI: 10.1016/j.media.2026.103988
Ting Luo, Jinxian Zhang, Tao Chen, Zhouyan He, Yanda Meng, Mengting Liu, Jiong Zhang, Dan Zhang
Optical coherence tomography angiography (OCTA) enables non-invasive visualization of retinal microvasculature, and accurate 3D vessel segmentation is essential for quantifying biomarkers critical for early diagnosis and monitoring of diabetic retinopathy. However, reliable 3D OCTA segmentation is hindered by capillary invisibility, complex vascular topology, and motion artifacts, which compromise biomarker accuracy. Furthermore, the scarcity of manually annotated 3D OCTA microvascular data constrains methodological development. To address this challenge, we introduce our publicly accessible 3D microvascular dataset and propose MT-Net, a multi-view, topology-aware 3D retinal microvascular segmentation network. First, a novel dimension transformation strategy is employed to enhance topological accuracy by effectively encoding spatial dependencies across multiple planes. Second, to mitigate the impact of motion artifacts, we introduce a unidirectional Artifact Suppression Module (ASM) that selectively suppresses noise along the B-scan direction. Third, a Twin-Cross Attention Module (TCAM), guided by vessel centerlines, is designed to enhance the continuity and completeness of segmented vessels by reinforcing cross-view contextual information. Experiments on two 3D OCTA datasets show that MT-Net achieves state-of-the-art accuracy and topological consistency, with strong generalizability validated by cross-dataset analysis. We plan to release our manual annotations to facilitate future research in retinal OCTA segmentation.
Citations: 0
SurgLaVi: Large-scale hierarchical dataset for surgical vision-language representation learning
IF 11.8 CAS Tier 1 (Medicine) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-05-01 Epub Date: 2026-02-11 DOI: 10.1016/j.media.2026.103982
Alejandra Perez, Chinedu Nwoye, Ramtin Raji Kermani, Omid Mohareri, Muhammad Abdullah Jamal
Vision-language pre-training (VLP) offers unique advantages for surgery by aligning language with surgical videos, enabling workflow understanding and transfer across tasks without relying on expert-labeled datasets. However, progress in surgical VLP remains constrained by the limited scale, procedural diversity, semantic quality, and hierarchical structure of existing datasets. In this work, we present SurgLaVi, the largest and most diverse surgical vision-language dataset to date, comprising nearly 240k clip-caption pairs from more than 200 procedures and organized hierarchically into coarse, mid, and fine levels. At the core of SurgLaVi lies a fully automated pipeline that systematically generates fine-grained transcriptions of surgical videos and segments them into coherent procedural units. To ensure high-quality annotations, it applies dual-modality filtering to remove irrelevant and noisy samples. Within this framework, the resulting captions are enriched with contextual detail, producing annotations that are both semantically rich and easy to interpret. To ensure accessibility, we release SurgLaVi-β, an open-source derivative of 113k clip-caption pairs constructed entirely from public data, which is over four times larger than existing surgical VLP datasets. To demonstrate the value of the SurgLaVi datasets, we introduce SurgCLIP, a CLIP-style video-text contrastive framework with dual encoders, as a representative base model. SurgCLIP achieves consistent improvements across phase, step, action, and tool recognition, surpassing prior state-of-the-art methods, often by large margins. These results validate that large-scale, semantically rich, and hierarchically structured datasets directly translate into stronger and more generalizable representations, establishing SurgLaVi as a key resource for developing surgical foundation models.
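SurgCLIP's training signal is the standard CLIP objective: embed clips and captions with separate towers, L2-normalize, and apply a symmetric cross-entropy over the in-batch similarity matrix. The PyTorch sketch below shows that loss; the linear stand-in encoders, feature sizes, and temperature are placeholders rather than the paper's actual towers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

video_enc = nn.Linear(512, 256)     # stand-in for the video tower
text_enc = nn.Linear(384, 256)      # stand-in for the text tower
tau = 0.07                          # temperature (placeholder value)

clip_feats = torch.randn(32, 512)   # one pooled feature per surgical clip
cap_feats = torch.randn(32, 384)    # one pooled feature per paired caption

v = F.normalize(video_enc(clip_feats), dim=-1)
c = F.normalize(text_enc(cap_feats), dim=-1)
logits = v @ c.T / tau                           # (32, 32) similarity matrix
labels = torch.arange(32)                        # matched pairs on the diagonal
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2
loss.backward()
print(f"contrastive loss: {loss.item():.3f}")
```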
Citations: 0
Quality-label-free fetal brain MRI quality control based on image orientation recognition uncertainty
IF 11.8 CAS Tier 1 (Medicine) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-05-01 Epub Date: 2026-02-10 DOI: 10.1016/j.media.2026.103994
Mingxuan Liu, Yi Liao, Haoxiang Li, Juncheng Zhu, Hongjia Yang, Yingqi Hao, Haibo Qu, Qiyuan Tian
Quality control (QC) in fetal MRI is essential for efficient, high-quality data acquisition and analysis aimed at assessing fetal brain development and detecting abnormalities. Supervised deep learning methods require numerous image-quality labels and have limited generalization to cross-domain out-of-distribution data. To address these problems, an Orientation Recognition Kolmogorov-Arnold Network (OR-KAN), composed of stacked Bottleneck KAN Convolution layers, was trained on turbo spin echo (TSE) T2-weighted data augmented from seven fetal brain atlases to predict stack orientation (i.e., axial, sagittal, and coronal). Image quality was quantified as the entropy of the prediction uncertainty. Experiments showed that OR-KAN achieved an area under the receiver operating characteristic curve (AUROC) of 0.840 and an area under the precision recall curve (AUPR) of 0.954 on the TSE data. Its performance on balanced turbo field echo (BTFE) data, which exhibit a distinct T2-weighted contrast, did not degrade (AUROC: 0.840 to 0.881; AUPR: 0.954 to 0.857). Moreover, bagging OR-KAN with models pre-trained on quality labels delivered the best results on both datasets, outperforming the state-of-the-art (SOTA) supervised method FetMRQC by 14.1 % on TSE (AUROC: 0.828 vs. 0.945) and by 21.9 % on BTFE (AUROC: 0.763 vs. 0.930). By selecting the highest-quality stack in each orientation, OR-KAN improved brain volume reconstruction success rates (with NiftyMIC) by 29.2 % (from 64.4 % to 83.2 %) and by 42.5 % (from 60.0 % to 85.5 %) for normal and abnormal fetuses, respectively. We anticipate its immediate utility in improving the diagnosis of fetal brain abnormalities and advancing the study of human brain development in utero across a wide range of clinical and neuroscientific applications. Code is available at: https://github.com/birthlab/OR-KAN.
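The quality score itself is straightforward to reproduce: classify each stack's orientation and take the entropy of the softmax as an inverse quality measure, keeping the lowest-entropy stack. The PyTorch sketch below uses an untrained linear classifier on toy 2D inputs as a stand-in for OR-KAN; shapes and names are illustrative.

```python
import torch
import torch.nn as nn

classifier = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 3))  # axial/sagittal/coronal
stacks = torch.randn(10, 1, 64, 64)              # 10 candidate stacks (toy size)

with torch.no_grad():
    probs = classifier(stacks).softmax(dim=-1)   # (10, 3) orientation posteriors
entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
best = int(entropy.argmin())                     # low entropy = confident = high quality
print(f"stack entropies: {[round(float(e), 3) for e in entropy]}; keep stack {best}")
```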
Citations: 0
WSISum: WSI summarization via dual-level semantic reconstruction
IF 11.8 CAS Tier 1 (Medicine) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-05-01 Epub Date: 2026-02-02 DOI: 10.1016/j.media.2026.103970
Baizhi Wang, Kun Zhang, Yuhao Wang, Yunjie Gu, Haijing Luan, Ying Zhou, Taiyuan Hu, Rundong Wang, Zhidong Yang, Zihang Jiang, Rui Yan, S. Kevin Zhou
Each gigapixel whole slide image (WSI) contains tens of thousands of patches, many of which are redundant, leading to significant computational, storage, and transmission overhead. This motivates the need for automatic WSI summarization, which aims to extract a compact subset of patches that can effectively approximate the original WSI. In this paper, we propose WSISum, a unified framework that performs WSI Summarization through dual-level semantic reconstruction. Specifically, WSISum integrates two complementary reconstruction strategies: low-level patch semantic reconstruction via clustering-based sparse sampling; and high-level slide semantic reconstruction through knowledge distillation from multiple WSI-level foundation models. Experimental results show that WSISum achieves satisfactory performance in a variety of downstream tasks, including cancer subtyping, biomarker prediction, and metastasis subtyping, while significantly reducing computational cost. Code and models are available at https://github.com/Badgewho/WSISum.
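The low-level branch's clustering-based sparse sampling reduces, in essence, to k-means over patch embeddings followed by keeping the patch nearest each centroid. The numpy/scikit-learn sketch below shows that selection under a fixed summary budget; the random stand-in features, the budget, and the names are assumptions, and the slide-level distillation branch is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
patch_feats = rng.standard_normal((5000, 768))   # one embedding per WSI patch

budget = 128                                     # patches kept in the summary
km = KMeans(n_clusters=budget, n_init=4, random_state=0).fit(patch_feats)
dist = np.linalg.norm(patch_feats - km.cluster_centers_[km.labels_], axis=1)
keep = np.array([np.flatnonzero(km.labels_ == c)[dist[km.labels_ == c].argmin()]
                 for c in range(budget)])        # medoid-like patch per cluster
print(f"kept {keep.size} of {patch_feats.shape[0]} patches")
```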
Citations: 0