
Latest Articles in Computerized Medical Imaging and Graphics

The 4D Human Embryonic Brain Atlas: Spatiotemporal atlas generation for rapid anatomical changes
IF 4.9 | CAS Zone 2 (Medicine) | Q1 ENGINEERING, BIOMEDICAL | Pub Date: 2026-01-13 | DOI: 10.1016/j.compmedimag.2026.102702
Wietske A.P. Bastiaansen, Melek Rousian, Anton H.J. Koning, Wiro J. Niessen, Bernadette S. de Bakker, Régine P.M. Steegers-Theunissen, Stefan Klein
Early brain development is crucial for lifelong neurodevelopmental health. However, current clinical practice offers limited knowledge of normal embryonic brain anatomy on ultrasound, despite the brain undergoing rapid changes within the time-span of days. To provide detailed insights into normal brain development and identify deviations, we created the 4D Human Embryonic Brain Atlas using a deep learning-based approach for groupwise registration and spatiotemporal atlas generation. Our method introduced a time-dependent initial atlas and penalized deviations from it, ensuring age-specific anatomy was maintained throughout rapid development. The atlas was generated and validated using 831 3D ultrasound images from 402 subjects in the Rotterdam Periconceptional Cohort, acquired between gestational weeks 8 and 12. We evaluated the effectiveness of our approach with an ablation study, which demonstrated that incorporating a time-dependent initial atlas and penalization produced anatomically accurate results. In contrast, omitting these adaptations led to an anatomically incorrect atlas. Visual comparisons with an existing ex-vivo embryo atlas further confirmed the anatomical accuracy of our atlas. In conclusion, the proposed method successfully captures the rapid anatomical development of the embryonic brain. The resulting 4D Human Embryonic Brain Atlas provides unique insights into this crucial early life period and holds the potential for improving the detection, prevention, and treatment of prenatal neurodevelopmental disorders.
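The penalized groupwise objective described in the abstract can be pictured with the minimal sketch below: an image-to-atlas similarity term plus a term that penalizes deviations of the current atlas from a time-dependent initial atlas. The function names, the use of MSE for both terms, and the weight `lam` are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a penalized groupwise atlas loss (assumed form, not the
# paper's code): similarity of warped subjects to the atlas, plus a penalty
# keeping the atlas close to an age-specific initial atlas.
import torch.nn.functional as F

def atlas_loss(warped_images, atlas_t, initial_atlas_t, lam=1.0):
    """warped_images: (N, 1, D, H, W) subject scans warped into atlas space.
    atlas_t:          (1, 1, D, H, W) current atlas estimate at gestational age t.
    initial_atlas_t:  (1, 1, D, H, W) time-dependent initial atlas at age t."""
    similarity = F.mse_loss(warped_images, atlas_t.expand_as(warped_images))
    penalty = F.mse_loss(atlas_t, initial_atlas_t)  # deviation from the age-specific prior
    return similarity + lam * penalty
```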
Citations: 0
TGIAlign: Text-guided dual-branch bidirectional framework for cross-modal semantic alignment in medical vision-language
IF 4.9 | CAS Zone 2 (Medicine) | Q1 ENGINEERING, BIOMEDICAL | Pub Date: 2026-01-13 | DOI: 10.1016/j.compmedimag.2025.102694
Wenhua Li, Lifang Wang, Min Zhao, Xingzhang Lü, Linwen Yi
Medical image–text alignment remains challenging due to subtle lesion patterns, heterogeneous vision–language semantics, and the lack of lesion-aware guidance during visual encoding. Existing methods typically introduce textual information only after visual features have been computed, leaving early and mid-level representations insufficiently conditioned on diagnostic semantics. This limits the model’s ability to capture fine-grained abnormalities and maintain stable alignment across heterogeneous chest X-ray datasets. To address these limitations, we propose TGIAlign, a text-guided dual-branch bidirectional alignment framework that applies structured, lesion-centric cues to intermediate visual representations obtained from the frozen encoder. A large language model (LLM) is used to extract normalized, attribute-based lesion descriptions, providing consistent semantic guidance across samples. These cues are incorporated through the Text-Guided Image Feature Weighting (TGIF) module, which reweights intermediate feature outputs using similarity-derived weights, enabling multi-scale semantic conditioning without modifying the frozen backbone. To capture complementary visual cues, TGIAlign integrates multi-scale text-guided features with high-level visual representations through a Dual-Branch Bidirectional Alignment (DBBA) mechanism. Experiments on six public chest X-ray datasets demonstrate that TGIAlign achieves stable top-K retrieval and reliable text-guided lesion localization, highlighting the effectiveness of early semantic conditioning combined with dual-branch alignment for improving medical vision–language correspondence within chest X-ray settings.
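As a rough, hedged sketch of the text-guided reweighting idea behind TGIF, the snippet below scores each visual token by cosine similarity to a pooled lesion-description embedding and uses the resulting weights to rescale the frozen encoder's intermediate features. The dimensions, projection layer, and residual-style rescaling are assumptions for illustration, not the authors' module.

```python
# Sketch of text-guided feature weighting: similarity-derived weights rescale
# intermediate visual tokens from a frozen encoder (assumed formulation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextGuidedWeighting(nn.Module):
    def __init__(self, vis_dim=768, txt_dim=768):
        super().__init__()
        self.txt_proj = nn.Linear(txt_dim, vis_dim)  # align text to the visual space

    def forward(self, vis_feat, txt_feat):
        """vis_feat: (B, N, C) intermediate visual tokens; txt_feat: (B, C_t) pooled text embedding."""
        t = self.txt_proj(txt_feat)                                   # (B, C)
        sim = F.cosine_similarity(vis_feat, t.unsqueeze(1), dim=-1)   # (B, N) token-text similarity
        weights = sim.softmax(dim=-1).unsqueeze(-1)                   # (B, N, 1)
        return vis_feat * (1.0 + weights)                             # reweighted tokens

# usage sketch
# out = TextGuidedWeighting()(torch.randn(2, 196, 768), torch.randn(2, 768))
```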
Citations: 0
TA-MedSAM: Text-augmented improved MedSAM for pulmonary lesion segmentation
IF 4.9 | CAS Zone 2 (Medicine) | Q1 ENGINEERING, BIOMEDICAL | Pub Date: 2026-01-12 | DOI: 10.1016/j.compmedimag.2026.102698
Siyuan Tang, Siriguleng Wang, Gang Xiang, Jinliang Zhao, Yuxin Wang
Accurate segmentation of lung lesions is critical for clinical diagnosis. Traditional methods rely solely on unimodal visual data, which limits the performance of existing medical image segmentation models. This paper introduces a novel approach, the Text-Augmented Medical Segment Anything Module (TA-MedSAM), which enhances cross-modal representation capabilities through a vision-language fusion paradigm. This method significantly improves segmentation accuracy for pulmonary lesions with challenging characteristics including low contrast, blurred boundaries, complex morphology, and small size. Firstly, we introduce a lightweight Medical Segment Anything Model (MedSAM) image encoder and a pre-trained ClinicalBERT text encoder to extract visual and textual features. This design preserves segmentation performance while reducing model parameters and computational costs, thereby enhancing inference speed. Secondly, a Reconstruction Text Module is proposed to focus the model on lesion-centric textual cues, strengthening semantic guidance for segmentation. Thirdly, we develop an effective Multimodal Feature Fusion Module that integrates visual and textual features using attention mechanisms and introduce a feature alignment coordination mechanism to mutually enhance heterogeneous information across modalities; a Dynamic Perception Learning Mechanism then quantitatively evaluates fusion effectiveness, enabling optimal fused feature selection for improved segmentation accuracy. Finally, a Multi-scale Feature Fusion Module combined with a Multi-task Loss Function enhances segmentation performance for complex regions. Comparative experiments demonstrate that TA-MedSAM outperforms state-of-the-art unimodal and multimodal methods on QaTa-COV19, MosMedData+, and a private dataset. Extensive ablation studies validate the efficacy of our proposed components and optimal hyperparameter combinations.
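The abstract's fusion module combines several mechanisms; the sketch below shows only its simplest ingredient, a cross-attention step in which visual tokens attend to ClinicalBERT text tokens, under assumed layer sizes. It is illustrative, not the authors' module.

```python
# Minimal sketch of attention-based vision-text fusion: image tokens query
# text tokens via cross-attention, with a residual connection (assumed form).
import torch
import torch.nn as nn

class VisionTextFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis_tokens, txt_tokens):
        """vis_tokens: (B, Nv, C) image-encoder tokens; txt_tokens: (B, Nt, C) text tokens."""
        fused, _ = self.cross_attn(query=vis_tokens, key=txt_tokens, value=txt_tokens)
        return self.norm(vis_tokens + fused)  # residual fusion of text cues

# usage sketch
# out = VisionTextFusion()(torch.randn(2, 1024, 256), torch.randn(2, 32, 256))
```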
Citations: 0
ThyFusionNet: A CNN–transformer framework with spatial aware sparse attention for multi modal thyroid disease diagnosis
IF 4.9 | CAS Zone 2 (Medicine) | Q1 ENGINEERING, BIOMEDICAL | Pub Date: 2026-01-11 | DOI: 10.1016/j.compmedimag.2026.102706
Bing Yang, Jun Li, Junyang Chen, Yutong Huang, Nanbo Xu, Qiurui Liu, Jiaxin Liu, Yuheng Zhou
In medical image analysis, accurately diagnosing complex lesions remains a formidable challenge, especially for thyroid disorders, which exhibit high incidence and intricate pathology. To enhance diagnostic precision and robustness, we assembled ThyM3, a large-scale multimodal dataset comprising thyroid computed tomography and ultrasound images. Building on this resource, we introduce ThyFusionNet, a novel deep-learning architecture that combines convolutional backbones with transformer modules and performs feature-level fusion to exploit complementary cues across modalities. To improve semantic alignment and spatial modeling, we incorporate head-wise positional encodings and an adaptive sparse attention scheme that suppresses redundant activations while highlighting key features. Skip connections are used to retain low-level details, and a gated-attention fusion block further enriches cross-modal interaction. We also propose an adaptive contrastive-entropy loss that preserves feature consistency and simultaneously enhances prediction discriminability and stability. Extensive experiments demonstrate that ThyFusionNet surpasses current leading methods in accuracy, robustness, and generalization, underscoring its strong potential for clinical deployment.
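A minimal sketch of a gated-attention fusion block for two pooled modality features (e.g., CT and ultrasound), assuming a sigmoid gate over the concatenated features; the exact gating used by ThyFusionNet is not specified here, and this is not the authors' implementation.

```python
# Sketch of gated fusion of two modality feature vectors: a learned gate
# decides, per channel, how much each modality contributes (assumed form).
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, ct_feat, us_feat):
        """ct_feat, us_feat: (B, C) pooled per-modality features."""
        g = self.gate(torch.cat([ct_feat, us_feat], dim=-1))  # (B, C), values in [0, 1]
        return g * ct_feat + (1.0 - g) * us_feat              # convex combination per channel
```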
Citations: 0
A multi-expert deep learning framework with LLM-guided arbitration for multimodal histopathology prediction
IF 4.9 | CAS Zone 2 (Medicine) | Q1 ENGINEERING, BIOMEDICAL | Pub Date: 2026-01-08 | DOI: 10.1016/j.compmedimag.2026.102704
Shyam Sundar Debsarkar, V.B. Surya Prasath
Recent advances in deep learning have significantly improved the accuracy of computational pathology; however, conventional model ensembling strategies often lack adaptability and interpretability, hindering clinical applicability. While multiple artificial intelligence (AI) expert models can provide complementary perspectives, simply aggregating their outputs is often insufficient for handling inter-model disagreement and delivering interpretable decisions. To address these challenges, we propose a novel multi-expert framework that integrates diverse vision-based predictors and a clinical feature-based model, with a large language model (LLM) acting as an intelligent arbitrator. By leveraging the contextual reasoning and explanation capabilities of LLMs, our architecture dynamically synthesizes insights from both imaging and clinical data, resolving model conflicts, and providing transparent, rational decisions. We validate our approach on two cancer histopathology datasets: HMU-GC-HE-30K, a gastric cancer dataset containing pathology images only, and BCNB, a multimodal breast cancer biopsy dataset containing pathology imaging and clinical information. Our proposed multi-expert, LLM-arbitrated framework (MELLMA) outperforms convolutional neural networks (CNNs) and transformers, currently the de facto, state-of-the-art classification ensemble models, with better overall results. We test different LLMs as arbitrators, namely LLaMA, GPT variants, and Mistral. Further, our proposed framework outperforms strong single-agent CNN/ViT baselines on the datasets, and ablations show that learned per-agent trust materially improves the arbitrator’s decisions without altering prompts or data. These experimental results demonstrate that LLM-guided arbitration consistently provides more robust and explainable performance than individual models, conventional ensembling with majority vote, uniform average, and meta-learners. The results obtained highlight the promise of LLM-driven arbitration for building transparent and extensible AI systems in digital pathology.
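The arbitration idea can be sketched as follows: when the expert labels disagree, their predictions, confidences, and learned per-expert trust scores are formatted into a prompt for an LLM arbitrator. `ask_llm` stands in for whatever chat-completion client is used; the prompt wording and trust handling are assumptions, not the paper's MELLMA pipeline.

```python
# Sketch of LLM-guided arbitration over disagreeing experts (assumed scheme).
def arbitrate(expert_outputs, trust, ask_llm):
    """expert_outputs: list of (name, label, confidence); trust: dict name -> weight;
    ask_llm: callable taking a prompt string and returning the arbitrator's answer."""
    labels = [label for _, label, _ in expert_outputs]
    if len(set(labels)) == 1:          # unanimous: no arbitration needed
        return labels[0]
    lines = [f"{name}: {label} (confidence {conf:.2f}, trust {trust[name]:.2f})"
             for name, label, conf in expert_outputs]
    prompt = ("Experts disagree on this histopathology case.\n"
              + "\n".join(lines)
              + "\nReturn the most plausible label and a one-sentence rationale.")
    return ask_llm(prompt)

# usage sketch with a stub arbitrator
# print(arbitrate([("cnn", "malignant", 0.81), ("vit", "benign", 0.64)],
#                 {"cnn": 0.7, "vit": 0.6}, ask_llm=lambda p: "malignant"))
```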
Citations: 0
A hybrid Transformer-CNN framework for uncertainty-guided semi-supervised multiclass eye disease classification with enhanced interpretability
IF 4.9 | CAS Zone 2 (Medicine) | Q1 ENGINEERING, BIOMEDICAL | Pub Date: 2026-01-08 | DOI: 10.1016/j.compmedimag.2026.102701
Muhammad Hammad Malik, Zishuo Wan, Yingying Ren, Da-Wei Ding
Accurately classifying fundus images into cataract, diabetic retinopathy (DR), glaucoma, and healthy categories remains a critical challenge in ophthalmology, where early diagnosis and treatment are required to prevent vision loss. Existing deep learning methods rely on large labeled datasets, make inefficient use of unlabeled data, and offer limited interpretability, restricting clinical applicability. To address these limitations, we propose a novel CNN-Transformer hybrid architecture coupled with innovative semi-supervised learning (SSL) and explainability techniques to enhance multiclass eye disease classification. Our methodology integrates a ConvNeXt backbone with Transformer modules, leveraging multi-head attention to effectively capture both spatial features and long-range dependencies. We introduce Uncertainty-Guided MixMatch (UG-MixMatch), a semi-supervised framework that leverages Monte Carlo (MC) dropout for uncertainty quantification and pseudo-label refinement, effectively utilizing both labeled and unlabeled data. For interpretability, we propose a novel Gradient-based Integrated Attention Map (GIAM), which aggregates attention maps across multiple layers. It incorporates adaptive channel-wise weighting, offering more detailed insights into model predictions, surpassing traditional Grad-CAM methods. Evaluated on the Ocular Imaging Health (OIH) dataset of 4215 fundus images across four classes, our approach achieved a 95.27 % classification accuracy using UG-MixMatch and 95.51 % when incorporating MC dropout for direct model evaluation. Cohen’s kappa score reached 93.70, indicating near-perfect agreement with the ground truth. Class-wise performance was exceptional, with 100 % sensitivity and specificity for DR and over 95 % specificity for cataract and glaucoma. Robust AUC values were observed, including 1.00 for DR and cataract and 0.99 for glaucoma and healthy cases. GIAM visualizations effectively highlighted disease-relevant regions, offering enhanced clinical interpretability and validation potential. Our framework addresses data scarcity, enhances interpretability, and delivers clinically relevant performance, a promising step towards scalable, explainable, and accurate diagnostic tools for Clinical Decision Support Systems (CDSS) and ophthalmic screening.
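A minimal sketch of the Monte Carlo dropout step used for uncertainty-guided pseudo-labeling: several stochastic forward passes yield a mean prediction and a predictive entropy, and only low-uncertainty unlabeled samples keep their pseudo-labels. The number of passes and the entropy threshold are illustrative assumptions, not the paper's settings.

```python
# Sketch of MC-dropout-based pseudo-label filtering (assumed thresholds).
import torch

@torch.no_grad()
def mc_dropout_pseudo_labels(model, x, passes=20, entropy_thresh=0.5):
    """model returns class logits for input batch x of shape (B, ...)."""
    model.train()                                     # keep dropout layers active at inference
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(passes)])  # (T, B, K)
    mean_p = probs.mean(dim=0)                        # (B, K) averaged prediction
    entropy = -(mean_p * mean_p.clamp_min(1e-8).log()).sum(dim=-1)          # (B,) predictive entropy
    keep = entropy < entropy_thresh                   # retain only confident pseudo-labels
    return mean_p.argmax(dim=-1), keep
```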
Citations: 0
SPARSE data, rich results: Few-shot semi-supervised learning via class-conditioned image translation
IF 4.9 | CAS Zone 2 (Medicine) | Q1 ENGINEERING, BIOMEDICAL | Pub Date: 2026-01-08 | DOI: 10.1016/j.compmedimag.2026.102705
Guido Manni, Clemente Lauretti, Loredana Zollo, Paolo Soda
Deep learning has revolutionized medical imaging, but its effectiveness is severely limited by insufficient labeled training data. This paper introduces a novel GAN-based semi-supervised learning framework specifically designed for low labeled-data regimes, evaluated across settings with 5 to 50 labeled samples per class. Our approach integrates three specialized neural networks: a generator for class-conditioned image translation, a discriminator for authenticity assessment and classification, and a dedicated classifier, within a three-phase training framework. The method alternates between supervised training on limited labeled data and unsupervised learning that leverages abundant unlabeled images through image-to-image translation rather than generation from noise. We employ ensemble-based pseudo-labeling that combines confidence-weighted predictions from the discriminator and classifier with temporal consistency through exponential moving averaging, enabling reliable label estimation for unlabeled data. Comprehensive evaluation across eleven MedMNIST datasets demonstrates that our approach achieves statistically significant improvements over six state-of-the-art GAN-based semi-supervised methods, with particularly strong performance in the extreme 5-shot setting where the scarcity of labeled data is most challenging. The framework maintains its superiority across all evaluated settings (5, 10, 20, and 50 shots per class). Our approach offers a practical solution for medical imaging applications where annotation costs are prohibitive, enabling robust classification performance even with minimal labeled data. Code is available at https://github.com/GuidoManni/SPARSE.
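The ensemble pseudo-labeling described above can be sketched as a confidence-weighted blend of classifier and discriminator probabilities, smoothed over training with an exponential moving average. The blending rule and momentum value are assumptions, not the paper's exact scheme.

```python
# Sketch of confidence-weighted ensemble pseudo-labeling with EMA smoothing
# (assumed formulation).
import torch

def update_pseudo_labels(p_cls, p_disc, ema_probs, momentum=0.9):
    """p_cls, p_disc: (B, K) class probabilities from classifier and discriminator heads.
    ema_probs: (B, K) running per-sample label estimate, or None on the first call."""
    w_cls = p_cls.max(dim=-1, keepdim=True).values     # per-sample confidence of each head
    w_disc = p_disc.max(dim=-1, keepdim=True).values
    blended = (w_cls * p_cls + w_disc * p_disc) / (w_cls + w_disc + 1e-8)
    if ema_probs is None:
        ema_probs = blended
    else:
        ema_probs = momentum * ema_probs + (1.0 - momentum) * blended  # temporal consistency
    return ema_probs, ema_probs.argmax(dim=-1)
```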
Citations: 0
SGAFNet: Robust brain tumor segmentation via learnable sequence-guided adaptive fusion in available MRI acquisitions
IF 4.9 | CAS Zone 2 (Medicine) | Q1 ENGINEERING, BIOMEDICAL | Pub Date: 2026-01-08 | DOI: 10.1016/j.compmedimag.2026.102703
Zhuoneng Zhang, Luyi Han, Dengqiang Jia, Tianyu Zhang, Zehui Lin, Kahou Chan, Jiaju Huang, Shaobin Chen, Xiangyu Xiong, Sio-Kei Im, Tao Tan, Yue Sun
Automatic and accurate segmentation of brain tumors from Magnetic Resonance Imaging (MRI) data holds significant promise for advancing clinical applicability. However, substantial challenges persist in algorithm development, particularly in scenarios where MRI sequences are incomplete or missing. Although recent automatic segmentation methods have demonstrated notable progress in addressing incomplete sequence scenarios, they often overlook the varying contributions of different MRI sequences to the final segmentation. To address this limitation, we propose a Learnable Sequence-Guided Adaptive Fusion Network (SGAFNet) for robust brain tumor segmentation under incomplete sequence scenarios. Our architecture features parallel encoder–decoders for sequence-specific feature extraction, enhanced by two novel components: (1) a Learned Sequence-Guided Weighted Average (SGWA) module, which adaptively fuses different sequence features by learning sequence-specific contribution factors based on embedded priors, and (2) a Sequence-Specific Attention (SSA) module, which establishes cross-sequence dependencies between available sequence features and the fused features generated by the SGWA. Comprehensive experiments on the BraTS2018 and BraTS2020 datasets demonstrate that our framework achieves state-of-the-art performance in handling incomplete sequence scenarios compared to existing approaches, with ablation studies confirming the critical role of the proposed SGWA and SSA modules. Increased robustness to incomplete MRI acquisitions enhances clinical applicability, facilitating more consistent diagnostic workflows.
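A minimal sketch of the sequence-guided weighted averaging idea behind SGWA: learnable per-sequence logits are softmax-normalized over the sequences actually acquired, so missing MRI sequences receive zero weight. Shapes and the masking trick are illustrative assumptions (at least one sequence is assumed available per case), not the authors' implementation.

```python
# Sketch of masked, learnable weighted averaging over available MRI sequences.
import torch
import torch.nn as nn

class SequenceWeightedAverage(nn.Module):
    def __init__(self, num_sequences=4):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_sequences))  # learned contribution factors

    def forward(self, feats, available):
        """feats: (S, B, C, D, H, W) per-sequence features; available: (B, S) bool mask.
        Assumes every case has at least one available sequence."""
        mask = available.t().float()                              # (S, B)
        logits = self.logits.unsqueeze(1).expand_as(mask)         # (S, B)
        logits = logits.masked_fill(mask == 0, float("-inf"))     # hide missing sequences
        w = logits.softmax(dim=0)                                 # weights over available sequences
        w = w.view(*w.shape, 1, 1, 1, 1)                          # broadcast to feature dims
        return (w * feats).sum(dim=0)                             # (B, C, D, H, W) fused feature
```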
Citations: 0
Automated 3D cephalometry: A lightweight V-net for landmark localization on CBCT
IF 4.9 | CAS Zone 2 (Medicine) | Q1 ENGINEERING, BIOMEDICAL | Pub Date: 2026-01-06 | DOI: 10.1016/j.compmedimag.2026.102700
Benedetta Baldini, Giulia Rubiu, Marco Serafin, Marco Bologna, Giuseppe Maurizio Facchi, Giuseppe Baselli, Gianluca Martino Tartaglia
Cephalometric analysis is a widely adopted procedure for clinical decision support in orthodontics. It involves manual identification of predefined anatomical landmarks on three-dimensional cone beam CT scans, followed by the computation of linear and angular measurements. To reduce processing time and operator dependency, this study aimed to develop a light-weight deep learning (DL) model capable of automatically localizing 16 anatomically defined landmarks. To ensure model robustness and generalizability, the model was trained on a dataset of 350 manually annotated CBCT scans acquired from various imaging systems, covering a wide range of patient ages and skeletal classifications. The trained model is a V-net, optimized for practical use in clinical workflows. The model achieved a mean localization error of 1.95 ± 1.06 mm, which falls within the clinically acceptable threshold of 2 mm. Moreover, the predicted landmarks were used to calculate cephalometric measurements, which were compared with manually derived values. The resulting errors were −0.15 ± 0.95° for angular measurements and 0.20 ± 0.28 mm for linear ones, with Bland–Altman analysis demonstrating strong agreement and acceptable variability. These results suggest that automated measurements can reliably replace manual ones. Given the clinical relevance of cephalometric parameters - particularly the ANB angle, which is critical for skeletal classification and orthodontic treatment planning - this model represents a promising clinical decision support tool. Additionally, its low computational complexity enables fast prediction, with mean inference time lower than 32 s per scan, promoting its integration into routine clinical settings due to both technical feasibility and robustness across heterogeneous datasets.
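The reported 1.95 ± 1.06 mm figure is a mean Euclidean landmark error; a minimal sketch of that computation, converting voxel coordinates to millimetres with the CBCT voxel spacing, is shown below. It is illustrative only, not the study's evaluation code.

```python
# Sketch of mean landmark localization error in millimetres.
import numpy as np

def mean_localization_error(pred_vox, gt_vox, spacing_mm):
    """pred_vox, gt_vox: (N_landmarks, 3) voxel coordinates; spacing_mm: (3,) voxel size in mm."""
    diff_mm = (pred_vox - gt_vox) * np.asarray(spacing_mm)   # convert voxel offsets to mm
    errors = np.linalg.norm(diff_mm, axis=1)                 # per-landmark Euclidean error
    return errors.mean(), errors.std()

# usage sketch: 16 landmarks, 0.3 mm isotropic voxels (hypothetical values)
# err_mean, err_std = mean_localization_error(np.random.rand(16, 3) * 100,
#                                             np.random.rand(16, 3) * 100, (0.3, 0.3, 0.3))
```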
Citations: 0
GenoPath-MCA: Multimodal masked cross-attention between genomics and pathology for survival prediction
IF 4.9 | CAS Zone 2 (Medicine) | Q1 ENGINEERING, BIOMEDICAL | Pub Date: 2026-01-05 | DOI: 10.1016/j.compmedimag.2026.102699
Kaixuan Zhang, Shuqi Dong, Peifeng Shi, Dingcan Hu, Geng Gao, Jinlin Yang, Tao Gan, Nini Rao
Survival prediction using whole slide images (WSIs) and bulk genes is a key task in computational pathology, essential for automated risk assessment and personalized treatment planning. However, integrating WSIs with genomic features remains challenging due to inconsistent modality granularity, semantic disparity, and the lack of personalized fusion. We propose GenoPath-MCA, a novel multimodal framework that models dense cross-modal interactions between histopathology and gene expression data. A masked co-attention mechanism aligns features across modalities, and the Multimodal Masked Cross-Attention Module (M2CAM) jointly captures high-order image–gene and gene–gene relationships for enhanced semantic fusion. To address patient-level heterogeneity, we develop a Dynamic Modality Weight Adjustment Strategy (DMWAS) that adaptively modulates fusion weights based on the discriminative relevance of each modality. Additionally, an importance-guided patch selection strategy effectively filters redundant visual inputs, reducing computational cost while preserving critical context. Experiments on public multimodal cancer survival datasets demonstrate that GenoPath-MCA significantly outperforms existing methods in terms of concordance index and robustness. Visualizations of multimodal attention maps validate the biological interpretability and clinical potential of our approach.
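A hedged sketch of one building block, masked cross-attention in which gene tokens query WSI patch tokens while a padding mask hides irrelevant patches; token dimensions and head counts are assumptions, and this is not the full M2CAM module.

```python
# Sketch of masked gene-to-patch cross-attention (assumed shapes and sizes).
import torch
import torch.nn as nn

class GenePatchCrossAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, gene_tokens, patch_tokens, patch_pad_mask=None):
        """gene_tokens: (B, Ng, C); patch_tokens: (B, Np, C);
        patch_pad_mask: (B, Np) bool, True where a patch should be ignored."""
        out, attn_w = self.attn(query=gene_tokens, key=patch_tokens,
                                value=patch_tokens, key_padding_mask=patch_pad_mask)
        return out, attn_w  # gene tokens enriched with image context, plus the attention map
```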
Citations: 0