
Latest publications in Medical image analysis

Transfer learning from 2D natural images to 4D fMRI brain images via geometric mapping
IF 11.8 | CAS Medicine Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-17 | DOI: 10.1016/j.media.2026.103949
Kai Gao , Lubin Wang , Liang Li , Xiao Chen , Bin Lu , Yu-Wei Wang , Xue-Ying Li , Zi-Han Wang , Hui-Xian Li , Yi-Fan Liao , Li-Ping Cao , Guan-Mao Chen , Jian-Shan Chen , Tao Chen , Tao-Lin Chen , Yan-Rong Chen , Yu-Qi Cheng , Zhao-Song Chu , Shi-Xian Cui , Xi-Long Cui , Dewen Hu
Functional magnetic resonance imaging (fMRI) allows real-time observation of brain activity through blood oxygen level-dependent (BOLD) signals and is extensively used in studies related to sex classification, age estimation, behavioral measurement prediction, and mental disorder diagnosis. However, the application of deep learning techniques to brain fMRI analysis is hindered by the small sample size of fMRI datasets. Transfer learning offers a solution to this problem, but most existing approaches are designed for large-scale 2D natural images. The heterogeneity between 4D fMRI data and 2D natural images makes direct model transfer infeasible. This study proposes a novel geometric mapping-based fMRI transfer learning method that enables transfer learning from 2D natural images to 4D fMRI brain images, bridging the transfer learning gap between fMRI data and natural images. The proposed Multi-scale Multi-domain Feature Aggregation (MMFA) module extracts effective aggregated features and reduces the dimensionality of fMRI data to 3D space. By treating the cerebral cortex as a folded Riemannian manifold in 3D space and mapping it into 2D space using surface geometric mapping, we make transfer learning from 2D natural images to 4D brain images possible. Moreover, the topological relationships of the cerebral cortex are maintained with our method, and calculations are performed along the Riemannian manifold of the brain, effectively addressing signal interference problems. The experimental results based on the Human Connectome Project (HCP) dataset demonstrate the effectiveness of the proposed method. Our method achieved state-of-the-art performance in sex classification, age estimation, and behavioral measurement prediction tasks. In addition, we propose a cascaded transfer learning approach for depression diagnosis and demonstrate its effectiveness on 23 depression datasets. In summary, the proposed fMRI transfer learning method, which accounts for the structural characteristics of the brain, is promising for applying transfer learning from natural images to brain fMRI images, significantly enhancing performance across various fMRI analysis tasks.
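To make the core idea concrete, here is a minimal sketch, under stated assumptions, of the two steps the abstract describes: aggregating the 4D BOLD series into per-vertex features and rasterizing a spherical parameterization of the cortical surface onto a 2D grid so that 2D-pretrained backbones can be reused. The helper names, the (theta, phi) spherical coordinates, and the vertex-to-voxel lookup are illustrative assumptions, not the authors' MMFA implementation.

```python
# Sketch: (1) aggregate the 4D BOLD series into per-vertex features, (2) rasterize those
# features onto a 2D grid via a precomputed spherical parameterization of the cortex.
import numpy as np

def aggregate_4d_to_vertex_features(bold, vertex_voxels):
    """bold: (X, Y, Z, T) fMRI volume; vertex_voxels: (V, 3) voxel indices of surface vertices.
    Returns (V, 2) features: temporal mean and standard deviation at each cortical vertex."""
    ts = bold[vertex_voxels[:, 0], vertex_voxels[:, 1], vertex_voxels[:, 2], :]  # (V, T)
    return np.stack([ts.mean(axis=1), ts.std(axis=1)], axis=1)

def flatten_to_2d(features, sphere_coords, height=128, width=256):
    """sphere_coords: (V, 2) [theta in 0..pi, phi in -pi..pi] from a spherical mapping of
    the cortex. Rasterizes vertex features into an (H, W, C) image by bin averaging."""
    img = np.zeros((height, width, features.shape[1]), dtype=np.float32)
    count = np.zeros((height, width, 1), dtype=np.float32)
    rows = np.clip((sphere_coords[:, 0] / np.pi * (height - 1)).astype(int), 0, height - 1)
    cols = np.clip(((sphere_coords[:, 1] + np.pi) / (2 * np.pi) * (width - 1)).astype(int), 0, width - 1)
    for r, c, f in zip(rows, cols, features):
        img[r, c] += f
        count[r, c] += 1.0
    return img / np.maximum(count, 1.0)
```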
Citations: 0
CATERPillar: a flexible framework for generating white matter numerical substrates with incorporated glial cells
IF 11.8 | CAS Medicine Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-17 | DOI: 10.1016/j.media.2026.103946
Jasmine Nguyen-Duc , Malte Brammerloh , Melina Cherchali , Inès De Riedmatten , Jean-Baptiste Pérot , Jonathan Rafael-Patiño , Ileana O. Jelescu
Monte Carlo diffusion simulations in numerical substrates are valuable for exploring the sensitivity and specificity of the diffusion MRI (dMRI) signal to realistic cell microstructure features. A crucial component of such simulations is the use of numerical phantoms that accurately represent the target tissue, in this case cerebral white matter (WM). This study introduces CATERPillar (Computational Axonal Threading Engine for Realistic Proliferation), a novel method that simulates the mechanics of axonal growth using overlapping spheres as elementary units. CATERPillar facilitates parallel axon development while preventing collisions, offering user control over key structural parameters such as cellular density, undulation, beading and myelination. Its uniqueness lies in its ability to generate not only realistic axonal structures but also realistic glial cells, enhancing the biological fidelity of simulations. We showed that our grown substrates feature distributions of key morphological parameters that agree with those from histological studies. The structural realism of the astrocytic components was quantitatively validated using Sholl analysis. Furthermore, the time-dependent diffusion in the extra- and intra-axonal compartments accurately reflected expected characteristics of short-range disorder, as predicted by theoretical models. CATERPillar is open source and can be used to (a) develop new acquisition schemes that sensitise the MRI signal to unique tissue microstructure features, (b) test the accuracy of a broad range of analytical models, and (c) build a set of substrates to train machine learning models on.
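As a rough illustration of the growth mechanic described above, overlapping spheres with collision avoidance, the following sketch grows a single axon as a chain of spheres and stops when a candidate sphere would intersect a sphere belonging to another cell. It is a toy under assumptions, not CATERPillar's algorithm; the jitter-based undulation and step size are illustrative choices.

```python
# Toy growth of one axon as a chain of overlapping spheres with collision checks.
# `occupied` holds (center, radius) spheres already placed by other axons or glial cells.
import numpy as np

def grow_axon(start, direction, n_steps, radius, occupied, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    step = radius                       # step < 2 * radius keeps consecutive spheres overlapping
    pos = np.asarray(start, dtype=float)
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    spheres = []
    for _ in range(n_steps):
        d = d + rng.normal(scale=0.2, size=3)          # mild undulation of the trajectory
        d = d / np.linalg.norm(d)
        cand = pos + step * d
        if any(np.linalg.norm(cand - c) < radius + r for c, r in occupied):
            break                                      # would collide with another cell: stop here
        spheres.append((cand.copy(), radius))
        pos = cand
    return spheres                                     # list of (center, radius) spheres
```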
Citations: 0
U2AD: Uncertainty-based unsupervised anomaly detection framework for detecting T2 hyperintensity in MRI spinal cord
IF 11.8 | CAS Medicine Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-17 | DOI: 10.1016/j.media.2026.103939
Qi Zhang , Xiuyuan Chen , Ziyi He , Kun Wang , Lianming Wu , Hongxing Shen , Jianqi Sun
T2 hyperintensities in spinal cord MR images are crucial biomarkers for conditions such as degenerative cervical myelopathy (DCM). However, current clinical diagnoses primarily rely on manual evaluation. Deep learning methods have shown promise in lesion detection, but most supervised approaches are heavily dependent on large, annotated datasets. Unsupervised anomaly detection (UAD) offers a compelling alternative by eliminating the need for abnormal data annotations. However, existing UAD methods face challenges of domain shifts and task conflict. We propose an Uncertainty-based Unsupervised Anomaly Detection framework, termed U2AD, to address these limitations. Unlike traditional methods, U2AD is designed to be trained and tested within the same clinical dataset, following a “mask-and-reconstruction” paradigm built on a Vision Transformer-based architecture. We introduce an uncertainty-guided masking strategy to resolve task conflicts between normal reconstruction and anomaly detection to achieve an optimal balance. Specifically, we employ a Monte-Carlo inference technique to estimate reconstruction uncertainty mappings during training. By iteratively optimizing reconstruction training under the guidance of both epistemic and aleatoric uncertainty, U2AD improves the normal representation learning while maintaining the sensitivity to anomalies. Experimental results demonstrate that U2AD outperforms existing UAD methods in patient-level identification and segment-level localization of spinal cord T2 hyperintensities. This framework establishes a new benchmark for incorporating uncertainty guidance into UAD. Our code is available at: https://github.com/zhibaishouheilab/U2AD
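A minimal PyTorch sketch of the two ingredients named in the abstract, Monte-Carlo reconstruction uncertainty and uncertainty-guided patch masking, is given below. The function names, the use of dropout as the stochastic source, and the choice to mask lower-uncertainty patches more often are assumptions for illustration; the released U2AD code may differ.

```python
# Sketch of Monte-Carlo reconstruction uncertainty and uncertainty-guided patch masking.
import torch

@torch.no_grad()
def mc_reconstruction_uncertainty(model, image, n_samples=8):
    """Keep dropout active and run several stochastic passes; the variance across
    reconstructions serves as a per-pixel (epistemic) uncertainty estimate."""
    model.train()                                    # enable dropout layers at inference
    recons = torch.stack([model(image) for _ in range(n_samples)], dim=0)
    return recons.var(dim=0), recons.mean(dim=0)

def uncertainty_guided_mask(patch_uncertainty, mask_ratio=0.5):
    """patch_uncertainty: (B, N) per-patch scores. Returns a (B, N) boolean mask; here
    lower-uncertainty patches are masked more often (one possible policy)."""
    B, N = patch_uncertainty.shape
    k = int(mask_ratio * N)
    weights = 1.0 / (patch_uncertainty + 1e-6)       # favor well-reconstructed patches
    idx = torch.multinomial(weights, k, replacement=False)       # (B, k) patch indices
    mask = torch.zeros(B, N, dtype=torch.bool, device=patch_uncertainty.device)
    mask[torch.arange(B, device=patch_uncertainty.device).unsqueeze(1), idx] = True
    return mask
```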
Citations: 0
Advances in automated fetal brain MRI segmentation and biometry: Insights from the FeTA 2024 challenge
IF 11.8 | CAS Medicine Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-16 | DOI: 10.1016/j.media.2026.103941
Vladyslav Zalevskyi , Thomas Sanchez , Misha Kaandorp , Margaux Roulet , Diego Fajardo-Rojas , Liu Li , Jana Hutter , Hongwei Bran Li , Matthew J. Barkovich , Hui Ji , Luca Wilhelmi , Aline Dändliker , Céline Steger , Mériam Koob , Yvan Gomez , Anton Jakovčić , Melita Klaić , Ana Adžić , Pavel Marković , Gracia Grabarić , Meritxell Bach Cuadra
Accurate fetal brain tissue segmentation and biometric measurement are essential for monitoring neurodevelopment and detecting abnormalities in utero. The Fetal Tissue Annotation (FeTA) Challenges have established robust multi-center benchmarks for evaluating state-of-the-art segmentation methods. This paper presents the results of the 2024 challenge edition, which introduced three key innovations.
First, we introduced a topology-aware metric based on the Euler characteristic difference (ED) to overcome the performance plateau observed with traditional metrics like Dice or Hausdorff distance (HD), as the performance of the best models in segmentation surpassed the inter-rater variability. While the best teams reached similar scores in Dice (0.81–0.82) and HD95 (2.1–2.3 mm), ED provided greater discriminative power: the winning method achieved an ED of 20.9, representing roughly a 50% improvement over the second- and third-ranked teams despite comparable Dice scores.
Second, we introduced a new 0.55T low-field MRI test set, which, when paired with high-quality super-resolution reconstruction, achieved the highest segmentation performance across all test cohorts (Dice=0.86, HD95=1.69, ED=6.26). This provides the first quantitative evidence that low-cost, low-field MRI can match or surpass high-field systems in automated fetal brain segmentation.
Third, the new biometry estimation task exposed a clear performance gap: although the best model reached a mean absolute percentage error (MAPE) of 7.72%, most submissions failed to outperform a simple gestational-age-based linear regression model (MAPE=9.56%), and all remained above the inter-rater variability (MAPE of 5.38%).
Finally, by analyzing the top-performing models from FeTA 2024 alongside those from previous challenge editions, we identify ensembles of 3D nnU-Net trained on both real and synthetic data with both image- and anatomy-level augmentations as the most effective approaches for fetal brain segmentation. Our quantitative analysis reveals that acquisition site, super-resolution strategy, and image quality are the primary sources of domain shift, informing recommendations to enhance the robustness and generalizability of automated fetal brain analysis methods.
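For readers unfamiliar with the topology-aware metric introduced in this abstract, the following sketch computes an Euler-characteristic-difference-style score for a multi-class segmentation using scikit-image. The per-class summation and connectivity choice are assumptions for illustration, not the official FeTA evaluation code.

```python
# Topology-aware score in the spirit of the Euler characteristic difference (ED):
# compare Euler numbers of predicted vs. reference masks, summed over tissue classes.
import numpy as np
from skimage.measure import euler_number

def euler_characteristic_difference(pred_labels, gt_labels, classes):
    """pred_labels, gt_labels: integer label volumes of identical shape."""
    total = 0
    for c in classes:
        chi_pred = euler_number(np.asarray(pred_labels) == c, connectivity=3)
        chi_gt = euler_number(np.asarray(gt_labels) == c, connectivity=3)
        total += abs(chi_pred - chi_gt)
    return total
```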
Citations: 0
AEM: An interpretable multi-task multi-modal framework for cardiac disease prediction
IF 11.8 | CAS Medicine Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-16 | DOI: 10.1016/j.media.2026.103951
Jiachuan Peng , Marcel Beetz , Abhirup Banerjee , Min Chen , Vicente Grau
Cardiovascular disease (CVD) is one of the leading causes of death and illness across the world. In particular, early prediction of heart failure (HF) is complicated by the heterogeneity of its clinical presentations and symptoms. These challenges underscore the need for a multidisciplinary approach to the comprehensive evaluation of cardiac state. To this end, we specifically select electrocardiogram (ECG) and 3D cardiac anatomy for their complementary coverage of cardiac electrical activities and fine-grained structural modeling. Building upon this, we present a novel pre-training framework, named Anatomy-Electrocardiogram Model (AEM), to explore their complex interactions. AEM adopts a multi-task self-supervised scheme that combines a masked reconstruction objective with a cardiac measurement (CM) regression branch to embed cardiac functional priors and structural details. Unlike image-domain models that typically localize the whole heart within the image, our 3D anatomy is background-free and continuous in 3D space. Hence, the model can naturally concentrate on finer structures at the patch level. The further integration with ECG captures functional dynamics through electrical conduction, encapsulating holistic cardiac representations. Extensive experiments are conducted on the multi-modal datasets collected from the UK Biobank, which contain paired biventricular point cloud anatomy and 12-lead ECG data. Our proposed AEM achieves an area under the receiver operating characteristic curve of 0.8192 for incident HF prediction and a concordance index of 0.6976 for survival prediction under linear evaluation, outperforming the state-of-the-art multi-modal methods. Additionally, we study the interpretability of the disease prediction by observing that our model effectively recognizes clinically plausible patterns and exhibits a high association with clinical features.
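The multi-task self-supervised scheme described above can be pictured as a weighted sum of a masked-token reconstruction loss and a cardiac-measurement regression loss. The sketch below is an illustrative assumption; the function name, tensor shapes, and lambda weighting are not taken from the AEM paper.

```python
# Sketch of a multi-task pre-training objective: masked-token reconstruction plus
# cardiac measurement (CM) regression, combined with a scalar weight.
import torch
import torch.nn.functional as F

def aem_style_loss(recon, target, mask, cm_pred, cm_target, lambda_cm=0.1):
    """recon/target: (B, N, D) token features; mask: (B, N) True where tokens were masked;
    cm_pred/cm_target: (B, M) cardiac measurements (e.g. volumes, ejection fraction)."""
    rec_loss = F.mse_loss(recon[mask], target[mask])   # reconstruct only the masked tokens
    cm_loss = F.mse_loss(cm_pred, cm_target)           # inject functional/structural priors
    return rec_loss + lambda_cm * cm_loss
```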
Citations: 0
Rethinking fairness in medical imaging: Maximizing group-specific performance with application to skin disease diagnosis
IF 11.8 | CAS Medicine Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-16 | DOI: 10.1016/j.media.2026.103950
Gelei Xu , Yuying Duan , Jun Xia , Ching-Hao Chiu , Michael Lemmon , Wei Jin , Yiyu Shi
Recent efforts in medical image computing have focused on improving fairness by balancing it with accuracy within a single, unified model. However, this often creates a trade-off: gains for underrepresented groups can come at the expense of reduced accuracy for groups that were previously well-served. In high-stakes clinical contexts, even minor drops in accuracy can lead to serious consequences, making such trade-offs highly contentious. Rather than accepting this compromise, we reframe the fairness objective in this paper as maximizing diagnostic accuracy for each patient group by leveraging additional computational resources to train group-specific models. To achieve this goal, we introduce SPARE, a novel data reweighting algorithm designed to optimize performance for a given group. SPARE evaluates the value of each training sample using two key factors: utility, which reflects the sample’s contribution to refining the model’s decision boundary, and group similarity, which captures its relevance to the target group. By assigning greater weight to samples that score highly on both metrics, SPARE rebalances the training process, particularly by leveraging the value of out-of-group data, to improve group-specific accuracy while avoiding the traditional fairness-accuracy trade-off. Experiments on two skin disease datasets demonstrate that SPARE significantly improves group-specific performance while maintaining comparable fairness metrics, highlighting its promise as a more practical fairness paradigm for improving clinical reliability.
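The two-factor reweighting idea, utility times group similarity, can be sketched as follows. Treating utility as one minus the predicted probability of the true class and group similarity as cosine similarity to a target-group prototype are illustrative assumptions, not the authors' exact definitions.

```python
# Sketch of two-factor sample reweighting: utility (distance to the decision boundary,
# proxied by 1 - p_true) times similarity to a target-group prototype in feature space.
import torch
import torch.nn.functional as F

def spare_style_weights(logits, labels, features, group_prototype, temperature=1.0):
    """logits: (B, C); labels: (B,); features: (B, D); group_prototype: (D,)."""
    probs = F.softmax(logits, dim=1)
    utility = 1.0 - probs.gather(1, labels.unsqueeze(1)).squeeze(1)    # uncertain samples sharpen the boundary
    similarity = F.cosine_similarity(features, group_prototype.unsqueeze(0), dim=1)
    similarity = (similarity + 1.0) / 2.0                              # rescale to [0, 1]
    scores = utility * similarity
    return F.softmax(scores / temperature, dim=0) * scores.numel()     # weights with mean ~1
```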
Citations: 0
Quasi-multimodal-based pathophysiological feature learning for retinal disease diagnosis
IF 11.8 | CAS Medicine Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-15 | DOI: 10.1016/j.media.2025.103886
Lu Zhang , Huizhen Yu , Zuowei Wang , Fu Gui , Yatu Guo , Wei Zhang , Mengyu Jia
Retinal diseases spanning a broad spectrum can be effectively identified and diagnosed using complementary signals from multimodal data. However, multimodal diagnosis in ophthalmic practice is typically challenged by data heterogeneity, potential invasiveness, and registration complexity. As such, a unified framework that integrates multimodal data synthesis and fusion is proposed for retinal disease classification and grading. Specifically, the synthesized multimodal data incorporates fundus fluorescein angiography (FFA), multispectral imaging (MSI), and saliency maps that emphasize latent lesions as well as optic disc/cup regions. Parallel models are independently trained to learn modality-specific representations that capture cross-pathophysiological signatures. These features are then adaptively calibrated within and across modalities to perform information pruning and flexible integration according to downstream tasks. The proposed learning system is thoroughly interpreted through visualizations in both image and feature spaces. Extensive experiments on two public datasets demonstrated the superiority of our approach over state-of-the-art ones in the tasks of multi-label classification (F1-score: 0.683, AUC: 0.953) and diabetic retinopathy grading (Accuracy: 0.842, Kappa: 0.861). This work not only enhances the accuracy and efficiency of retinal disease screening but also offers a scalable framework for data augmentation across various medical imaging modalities.
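A minimal PyTorch sketch of adaptive calibration within and across modalities is shown below: per-modality channel gates prune information within each synthesized modality, and a learned softmax mixes modalities for the downstream head. The module structure is an assumption for illustration, not the paper's architecture.

```python
# Sketch of adaptive within- and across-modality calibration: a sigmoid gate prunes each
# modality's channels, and a learned softmax mixes the synthesized modalities.
import torch
import torch.nn as nn

class ModalityCalibration(nn.Module):
    def __init__(self, channels, n_modalities):
        super().__init__()
        self.gates = nn.ModuleList(
            [nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid()) for _ in range(n_modalities)]
        )
        self.mix = nn.Parameter(torch.zeros(n_modalities))     # learned cross-modality weights

    def forward(self, feats):
        """feats: list of (B, C) pooled features, one per modality (e.g. FFA, MSI, saliency map)."""
        gated = [f * g(f) for f, g in zip(feats, self.gates)]   # within-modality pruning
        w = torch.softmax(self.mix, dim=0)                      # across-modality integration
        return sum(wi * fi for wi, fi in zip(w, gated))
```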
Citations: 0
Predicting diabetic macular edema treatment responses using OCT: Dataset and methods of APTOS competition
IF 11.8 | CAS Medicine Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1016/j.media.2026.103942
Weiyi Zhang , Peranut Chotcomwongse , Yinwen Li , Pusheng Xu , Ruijie Yao , Lianhao Zhou , Yuxuan Zhou , Hui Feng , Qiping Zhou , Xinyue Wang , Shoujin Huang , Zihao Jin , Florence H T Chung , Shujun Wang , Yalin Zheng , Mingguang He , Danli Shi , Paisan Ruamviboonsuk
Diabetic macular edema (DME) significantly contributes to visual impairment in diabetic patients. Treatment responses to intravitreal therapies vary, highlighting the need for patient stratification to predict therapeutic benefits and enable personalized strategies. To our knowledge, this study is the first to explore pre-treatment stratification for predicting DME treatment responses. To advance this research, we organized the 2nd Asia-Pacific Tele-Ophthalmology Society (APTOS) Big Data Competition in 2021. The competition focused on improving predictive accuracy for anti-VEGF therapy responses using ophthalmic OCT images. We provided a dataset containing tens of thousands of OCT images from 2000 patients with labels across four sub-tasks. This paper details the competition’s structure, dataset, leading methods, and evaluation metrics. The competition attracted strong scientific community participation, with 170 teams initially registering and 41 reaching the final round. The top-performing team achieved an AUC of 80.06%, highlighting the potential of AI in personalized DME treatment and clinical decision-making.
Citations: 0
Depth-induced prompt learning for laparoscopic liver landmark detection
IF 11.8 | CAS Medicine Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1016/j.media.2026.103940
Ruize Cui , Weixin Si , Zhixi Li , Kai Wang , Jialun Pei , Pheng-Ann Heng , Jing Qin
Laparoscopic liver surgery presents a highly intricate intraoperative environment with significant liver deformation, posing challenges for surgeons in locating critical liver structures. Anatomical liver landmarks can greatly assist surgeons in spatial perception in laparoscopic scenarios and facilitate preoperative-to-intraoperative registration. To advance research in liver landmark detection, we develop a new dataset called L3D-2K, comprising 2000 keyframes with expert landmark annotations from surgical videos of 47 patients. Accordingly, we propose a baseline, D2GPLand+, which effectively leverages depth modality to boost landmark detection performance. Concretely, we introduce a Depth-aware Prompt Embedding (DPE) scheme, which dynamically extracts class-related global geometric cues with the guidance of self-supervised prompts from the SAM encoder. Further, a Cross-dimension Unified Mamba (CUMamba) block is designed to comprehensively incorporate RGB and depth features through a concurrent spatial and channel scanning mechanism. In addition, we introduce an Anatomical Feature Augmentation (AFA) module that captures anatomical cues and emphasizes key structures by optimizing feature granularity. For benchmarking purposes, we evaluate our method and 17 mainstream detection models on L3D, L3D-2K, and P2ILF datasets. Experimental results demonstrate that D2GPLand+ obtains superior performance on all three datasets. Our approach provides surgeons with guiding clues that facilitate surgical operations and decision-making in complex laparoscopic surgery. Our code and dataset are available at https://github.com/cuiruize/D2GPLand-Plus.
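Landmark detectors of this kind typically decode keypoint coordinates from per-landmark heatmaps; the soft-argmax sketch below illustrates that generic step and is an assumption for illustration, not D2GPLand+'s detection head.

```python
# Generic soft-argmax decoding of landmark coordinates from per-landmark heatmaps.
import torch

def soft_argmax_2d(heatmaps):
    """heatmaps: (B, K, H, W), one channel per landmark. Returns (B, K, 2) (x, y) coords."""
    B, K, H, W = heatmaps.shape
    probs = torch.softmax(heatmaps.view(B, K, -1), dim=-1).view(B, K, H, W)
    ys = torch.arange(H, dtype=probs.dtype, device=probs.device)
    xs = torch.arange(W, dtype=probs.dtype, device=probs.device)
    y = (probs.sum(dim=3) * ys).sum(dim=2)   # expected row index
    x = (probs.sum(dim=2) * xs).sum(dim=2)   # expected column index
    return torch.stack([x, y], dim=-1)
```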
Citations: 0
Comparative validation of surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation in endoscopy: Results of the PhaKIR 2024 challenge
IF 11.8 | CAS Medicine Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1016/j.media.2026.103945
Tobias Rueckert , David Rauber , Raphaela Maerkl , Leonard Klausmann , Suemeyye R. Yildiran , Max Gutbrod , Danilo Weber Nunes , Alvaro Fernandez Moreno , Imanol Luengo , Danail Stoyanov , Nicolas Toussaint , Enki Cho , Hyeon Bae Kim , Oh Sung Choo , Ka Young Kim , Seong Tae Kim , Gonçalo Arantes , Kehan Song , Jianjun Zhu , Junchen Xiong , Christoph Palm
Reliable recognition and localization of surgical instruments in endoscopic video recordings are foundational for a wide range of applications in computer- and robot-assisted minimally invasive surgery (RAMIS), including surgical training, skill assessment, and autonomous assistance. However, robust performance under real-world conditions remains a significant challenge. Incorporating surgical context – such as the current procedural phase – has emerged as a promising strategy to improve robustness and interpretability.
To address these challenges, we organized the Surgical Procedure Phase, Keypoint, and Instrument Recognition (PhaKIR) sub-challenge as part of the Endoscopic Vision (EndoVis) challenge at MICCAI 2024. We introduced a novel, multi-center dataset comprising thirteen full-length laparoscopic cholecystectomy videos collected from three distinct medical institutions, with unified annotations for three interrelated tasks: surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation. Unlike existing datasets, ours enables joint investigation of instrument localization and procedural context within the same data while supporting the integration of temporal information across entire procedures.
We report results and findings in accordance with the BIAS guidelines for biomedical image analysis challenges. The PhaKIR sub-challenge advances the field by providing a unique benchmark for developing temporally aware, context-driven methods in RAMIS and offers a high-quality resource to support future research in surgical scene understanding.
Citations: 0