Pub Date: 2026-01-27 | DOI: 10.1016/j.media.2026.103953
Jef Jonkers , Frank Coopman , Luc Duchateau , Glenn Van Wallendael , Sofie Van Hoecke
Automatic anatomical landmark localization in medical imaging requires not just accurate predictions but reliable uncertainty quantification for effective clinical decision support. Current uncertainty quantification approaches often fall short, particularly when combined with normality assumptions, systematically underestimating total predictive uncertainty. This paper introduces conformal prediction as a framework for reliable uncertainty quantification in anatomical landmark localization, addressing a critical gap in automatic landmark localization. We present two novel approaches guaranteeing finite-sample validity for multi-output prediction: multi-output regression-as-classification conformal prediction (M-R2CCP) and its variant multi-output regression to classification conformal prediction set to region (M-R2C2R). Unlike conventional methods that produce axis-aligned hyperrectangular or ellipsoidal regions, our approaches generate flexible, non-convex prediction regions that better capture the underlying uncertainty structure of landmark predictions. Through extensive empirical evaluation across multiple 2D and 3D datasets, we demonstrate that our methods consistently outperform existing multi-output conformal prediction approaches in both validity and efficiency. This work represents a significant advancement in reliable uncertainty estimation for anatomical landmark localization, providing clinicians with trustworthy confidence measures for their diagnoses. While developed for medical imaging, these methods show promise for broader applications in multi-output regression problems.
{"title":"Reliable uncertainty quantification for 2D/3D anatomical landmark localization using multi-output conformal prediction","authors":"Jef Jonkers , Frank Coopman , Luc Duchateau , Glenn Van Wallendael , Sofie Van Hoecke","doi":"10.1016/j.media.2026.103953","DOIUrl":"10.1016/j.media.2026.103953","url":null,"abstract":"<div><div>Automatic anatomical landmark localization in medical imaging requires not just accurate predictions but reliable uncertainty quantification for effective clinical decision support. Current uncertainty quantification approaches often fall short, particularly when combined with normality assumptions, systematically underestimating total predictive uncertainty. This paper introduces conformal prediction as a framework for reliable uncertainty quantification in anatomical landmark localization, addressing a critical gap in automatic landmark localization. We present two novel approaches guaranteeing finite-sample validity for multi-output prediction: multi-output regression-as-classification conformal prediction (M-R2CCP) and its variant multi-output regression to classification conformal prediction set to region (M-R2C2R). Unlike conventional methods that produce axis-aligned hyperrectangular or ellipsoidal regions, our approaches generate flexible, non-convex prediction regions that better capture the underlying uncertainty structure of landmark predictions. Through extensive empirical evaluation across multiple 2D and 3D datasets, we demonstrate that our methods consistently outperform existing multi-output conformal prediction approaches in both validity and efficiency. This work represents a significant advancement in reliable uncertainty estimation for anatomical landmark localization, providing clinicians with trustworthy confidence measures for their diagnoses. While developed for medical imaging, these methods show promise for broader applications in multi-output regression problems.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 103953"},"PeriodicalIF":11.8,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146071492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-26 | DOI: 10.1016/j.media.2026.103964
Pablo Meseguer , Rocío del Amor , Valery Naranjo
Contrastive language-image pretraining has greatly enhanced visual representation learning and enabled zero-shot classification. Vision-language models (VLMs) have succeeded in few-shot learning by leveraging adaptation modules fine-tuned for specific downstream tasks. In computational pathology (CPath), accurate whole-slide image (WSI) prediction is crucial for aiding in cancer diagnosis, and multiple instance learning (MIL) remains essential for managing the gigapixel scale of WSIs. At the intersection of CPath and VLMs, the literature still lacks specific adapters that handle the particular complexity of the slides. To close this gap, we introduce MIL-Adapter, a novel approach designed to obtain consistent slide-level classification under few-shot learning scenarios. In particular, our framework is the first to combine trainable MIL aggregation functions and lightweight visual-language adapters to improve the performance of histopathological VLMs. MIL-Adapter relies on textual ensemble learning to construct discriminative zero-shot prototypes. It serves as a solid starting point, surpassing MIL models with randomly initialized classifiers in data-constrained settings. Our experiments demonstrate the value of textual ensemble learning and the robust predictive performance of MIL-Adapter across diverse datasets and few-shot configurations, while providing crucial insights on model interpretability. The code is publicly accessible at https://github.com/cvblab/MIL-Adapter.
{"title":"MIL-Adapter: Coupling multiple instance learning and vision-language adapters for few-shot slide-level classification","authors":"Pablo Meseguer , Rocío del Amor , Valery Naranjo","doi":"10.1016/j.media.2026.103964","DOIUrl":"10.1016/j.media.2026.103964","url":null,"abstract":"<div><div>Contrastive language-image pretraining has greatly enhanced visual representation learning and enabled zero-shot classification. Vision-language language models (VLM) have succeeded in few-shot learning by leveraging adaptation modules fine-tuned for specific downstream tasks. In computational pathology (CPath), accurate whole-slide image (WSI) prediction is crucial for aiding in cancer diagnosis, and multiple instance learning (MIL) remains essential for managing the gigapixel scale of WSIs. In the intersection between CPath and VLMs, the literature still lacks specific adapters that handle the particular complexity of the slides. To solve this gap, we introduce MIL-Adapter, a novel approach designed to obtain consistent slide-level classification under few-shot learning scenarios. In particular, our framework is the first to combine trainable MIL aggregation functions and lightweight visual-language adapters to improve the performance of histopathological VLMs. MIL-Adapter relies on textual ensemble learning to construct discriminative zero-shot prototypes. It is serves as a solid starting point, surpassing MIL models with randomly initialized classifiers in data-constrained settings. With our experimentation, we demonstrate the value of textual ensemble learning and the robust predictive performance of MIL-Adapter through diverse datasets and configurations of few-shot scenarios, while providing crucial insights on model interpretability. The code is publicly accessible in <span><span>https://github.com/cvblab/MIL-Adapter</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 103964"},"PeriodicalIF":11.8,"publicationDate":"2026-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146048254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-25 | DOI: 10.1016/j.media.2026.103961
Xin You, Ming Ding, Minghui Zhang, Hanxiao Zhang, Junyang Wu, Yi Yu, Jie Yang, Yun Gu
Accurate boundary segmentation of volumetric images is a critical task for image-guided diagnosis and computer-assisted intervention. Addressing boundary confusion with explicit constraints remains challenging. Existing boundary-refinement methods overemphasize slender structures while overlooking the dynamic interactions between boundaries and neighboring regions. In this paper, we reconceptualize the mechanism of boundary generation by introducing Pushing and Pulling interactions, and propose a unified network termed PP-Net to model the shape characteristics of confused boundary regions. Specifically, we first propose the semantic difference module (SDM) in the pushing branch to drive the boundary towards the ground truth under diffusion guidance. Additionally, the class clustering module (CCM) in the pulling branch is introduced to stretch the intersected boundary in the opposite direction. The pushing and pulling branches thus supply two adversarial forces that enhance representation capabilities for faint boundaries. Experiments are conducted on four public datasets and one in-house dataset plagued by boundary confusion. The results demonstrate the superiority of PP-Net over other segmentation networks, especially on the Hausdorff Distance and Average Symmetric Surface Distance metrics. Besides, SDM and CCM can serve as plug-and-play modules to enhance classic U-shape baseline models, including recent SAM-based foundation models. Source codes are available at https://github.com/EndoluminalSurgicalVision-IMR/PnPNet.
{"title":"Towards Boundary Confusion for Volumetric Medical Image Segmentation","authors":"Xin You, Ming Ding, Minghui Zhang, Hanxiao Zhang, Junyang Wu, Yi Yu, Jie Yang, Yun Gu","doi":"10.1016/j.media.2026.103961","DOIUrl":"https://doi.org/10.1016/j.media.2026.103961","url":null,"abstract":"Accurate boundary segmentation of volumetric images is a critical task for image-guided diagnosis and computer-assisted intervention. It is challenging to address the boundary confusion with explicit constraints. Existing methods of refining boundaries overemphasize the slender structure while overlooking the dynamic interactions between boundaries and neighboring regions. In this paper, we reconceptualize the mechanism of boundary generation via introducing Pushing and Pulling interactions, then propose a unified network termed PP-Net to model shape characteristics of the confused boundary region. Specifically, we first propose the semantic difference module (SDM) from the pushing branch to drive the boundary towards the ground truth under diffusion guidance. Additionally, the class clustering module (CCM) from the pulling branch is introduced to stretch the intersected boundary along the opposite direction. Thus, pushing and pulling branches will furnish two adversarial forces to enhance representation capabilities for the faint boundary. Experiments are conducted on four public datasets and one in-house dataset plagued by boundary confusion. The results demonstrate the superiority of PP-Net over other segmentation networks, especially on the evaluation metrics of Hausdorff Distance and Average Symmetric Surface Distance. Besides, SDM and CCM can serve as plug-and-play modules to enhance classic U-shape baseline models, including recent SAM-based foundation models. Source codes are available at <ce:inter-ref xlink:href=\"https://github.com/EndoluminalSurgicalVision-IMR/PnPNet\" xlink:type=\"simple\">https://github.com/EndoluminalSurgicalVision-IMR/PnPNet</ce:inter-ref>.","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"291 1","pages":""},"PeriodicalIF":10.9,"publicationDate":"2026-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146048365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-24 | DOI: 10.1016/j.media.2026.103959
Donghan Wu , Wenyue Shen , Lu Yuan , Heng Li , Huaying Hao , Juan Ye , Yitian Zhao
Retinopathy of Prematurity (ROP) is a leading cause of childhood blindness worldwide. In clinical practice, fundus imaging serves as a primary diagnostic tool for ROP, making the accurate quality assessment of these images critically important. However, existing automated methods for evaluating ROP fundus images face significant challenges. First, there is a high degree of visual similarity between lesions and factors that influence quality. Second, there is a paucity of trustworthy outputs and interpretable or clinician-friendly designs, which limits their reliability and effectiveness. In this work, we propose an ROP image quality assessment framework, termed Q-ROP. This framework leverages fine-grained multi-label annotations based on key image factors such as artifacts, illumination, spatial positioning, and structural clarity. Additionally, the integration of a label graph network with evidential learning theory enables the model to explicitly capture the relationships between quality grades and influencing factors, thereby improving both robustness and accuracy. This approach facilitates interpretable analysis by directing the model’s focus toward relevant image features and reducing interference from lesion-like artifacts. Furthermore, the incorporation of evidential learning theory serves to quantify the uncertainty inherent in quality ratings, thereby ensuring the trustworthiness of the assessments. Trained and tested on a dataset of 6677 ROP images across three quality levels (i.e., acceptable, potentially acceptable, and unacceptable), Q-ROP achieved state-of-the-art performance with 95.82% accuracy. Its effectiveness was further validated in a downstream ROP staging task, where it significantly improved the performance of typical classification models. These results demonstrate Q-ROP’s strong potential as a reliable and robust tool for clinical decision support.
{"title":"Fundus image quality assessment in retinopathy of prematurity via multi-label graph evidential network","authors":"Donghan Wu , Wenyue Shen , Lu Yuan , Heng Li , Huaying Hao , Juan Ye , Yitian Zhao","doi":"10.1016/j.media.2026.103959","DOIUrl":"10.1016/j.media.2026.103959","url":null,"abstract":"<div><div>Retinopathy of Prematurity (ROP) is a leading cause of childhood blindness worldwide. In clinical practice, fundus imaging serves as a primary diagnostic tool for ROP, making the accurate quality assessment of these images critically important. However, existing automated methods for evaluating ROP fundus images face significant challenges. First, there is a high degree of visual similarity between lesions and factors that influence quality. Second, there is a paucity of trustworthy outputs and interpretable or clinical-friendly designs, which limit their reliability and effectiveness. In this work, we propose a ROP image quality assessment framework, termed Q-ROP. This framework leverages fine-grained multi-label annotations based on key image factors such as artifacts, illumination, spatial positioning, and structural clarity. Additionally, the integration of a label graph network with evidential learning theory enables the model to explicitly capture the relationships between quality grades and influencing factors, thereby improving both robustness and accuracy. This approach facilitates interpretable analysis by directing the model’s focus toward relevant image features and reducing interference from lesion-like artifacts. Furthermore, the incorporation of evidential learning theory serves to quantify the uncertainty inherent in quality ratings, thereby ensuring the trustworthiness of the assessments. Trained and tested on a dataset of 6677 ROP images across three quality levels (i.e. acceptable, potentially acceptable, and unacceptable), Q-ROP achieved state-of-the-art performance with a 95.82% accuracy. Its effectiveness was further validated in a downstream ROP staging task, where it significantly improved the performance of typical classification models. These results demonstrate Q-ROP’s strong potential as a reliable and robust tool for clinical decision support.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 103959"},"PeriodicalIF":11.8,"publicationDate":"2026-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146048255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-24 | DOI: 10.1016/j.media.2026.103943
Nicholas Konz , Richard Osuala , Preeti Verma , Yuwen Chen , Hanxue Gu , Haoyu Dong , Yaqian Chen , Andrew Marshall , Lidia Garrucho , Kaisar Kushibar , Daniel M. Lang , Gene S. Kim , Lars J. Grimm , John M. Lewin , James S. Duncan , Julia A. Schnabel , Oliver Diaz , Karim Lekadir , Maciej A. Mazurowski
Determining whether two sets of images belong to the same or different distributions or domains is a crucial task in modern medical image analysis and deep learning; for example, to evaluate the output quality of image generative models. Currently, metrics used for this task either rely on the (potentially biased) choice of some downstream task, such as segmentation, or adopt task-independent perceptual metrics (e.g., Fréchet Inception Distance/FID) from natural imaging, which we show insufficiently capture anatomical features. To this end, we introduce a new perceptual metric tailored for medical images, FRD (Fréchet Radiomic Distance), which utilizes standardized, clinically meaningful, and interpretable image features. We show that FRD is superior to other image distribution metrics for a range of medical imaging applications, including out-of-domain (OOD) detection, the evaluation of image-to-image translation (by correlating more with downstream task performance as well as anatomical consistency and realism), and the evaluation of unconditional image generation. Moreover, FRD offers additional benefits such as stability and computational efficiency at low sample sizes, sensitivity to image corruptions and adversarial attacks, feature interpretability, and correlation with radiologist-perceived image quality. Additionally, we address key gaps in the literature by presenting an extensive framework for the multifaceted evaluation of image similarity metrics in medical imaging—including the first large-scale comparative study of generative models for medical image translation—and release an accessible codebase to facilitate future research. Our results are supported by thorough experiments spanning a variety of datasets, modalities, and downstream tasks, highlighting the broad potential of FRD for medical image analysis.
{"title":"Fréchet radiomic distance (FRD): A versatile metric for comparing medical imaging datasets","authors":"Nicholas Konz , Richard Osuala , Preeti Verma , Yuwen Chen , Hanxue Gu , Haoyu Dong , Yaqian Chen , Andrew Marshall , Lidia Garrucho , Kaisar Kushibar , Daniel M. Lang , Gene S. Kim , Lars J. Grimm , John M. Lewin , James S. Duncan , Julia A. Schnabel , Oliver Diaz , Karim Lekadir , Maciej A. Mazurowski","doi":"10.1016/j.media.2026.103943","DOIUrl":"10.1016/j.media.2026.103943","url":null,"abstract":"<div><div>Determining whether two sets of images belong to the same or different distributions or domains is a crucial task in modern medical image analysis and deep learning; for example, to evaluate the output quality of image generative models. Currently, metrics used for this task either rely on the (potentially biased) choice of some downstream task, such as segmentation, or adopt task-independent perceptual metrics (<em>e.g.</em>, Fréchet Inception Distance/FID) from natural imaging, which we show insufficiently capture anatomical features. To this end, we introduce a new perceptual metric tailored for medical images, FRD (Fréchet Radiomic Distance), which utilizes standardized, clinically meaningful, and interpretable image features. We show that FRD is superior to other image distribution metrics for a range of medical imaging applications, including out-of-domain (OOD) detection, the evaluation of image-to-image translation (by correlating more with downstream task performance as well as anatomical consistency and realism), and the evaluation of unconditional image generation. Moreover, FRD offers additional benefits such as stability and computational efficiency at low sample sizes, sensitivity to image corruptions and adversarial attacks, feature interpretability, and correlation with radiologist-perceived image quality. Additionally, we address key gaps in the literature by presenting an extensive framework for the multifaceted evaluation of image similarity metrics in medical imaging—including the first large-scale comparative study of generative models for medical image translation—and release an accessible codebase to facilitate future research. Our results are supported by thorough experiments spanning a variety of datasets, modalities, and downstream tasks, highlighting the broad potential of FRD for medical image analysis.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 103943"},"PeriodicalIF":11.8,"publicationDate":"2026-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146048368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-24 | DOI: 10.1016/j.media.2026.103958
Jianbin He , Guoheng Huang , Xiaochen Yuan , Chi-Man Pun , Guo Zhong , Qi Yang , Ling Guo , Siyu Zhu , Baiying Lei , Haojiang Li
Multimodal medical prognosis prediction has shown great potential in improving diagnostic accuracy by integrating various data types. However, incomplete multimodality, where certain modalities are missing, poses significant challenges to model performance. Current methods, including dynamic adaptation and modality completion, have limitations in handling incomplete multimodality comprehensively. Dynamic adaptation methods fail to fully utilize modality interactions as they only process available modalities. Modality completion methods address inter-modal relationships but risk generating unreliable data, especially when key modalities are missing, since existing modalities cannot replicate unique features of absent ones. This compromises fusion quality and degrades model performance. To address these challenges, we propose the Missing-aware Dynamic Adaptive Transformer (MADAT) model, which integrates two phases: the Decoupling Generalization Completion Phase (DGCP) and the Adaptive Cross-Fusion Phase (ACFP). The DGCP reconstructs missing modalities by generating inter-modal and intra-modal shared information using Progressive Transformation Recursive Gated Convolutions (PTRGC) and Wavelet Alignment Domain Generalization (WADG). The ACFP, which incorporates Cross-Agent Attention (CAA) and Generation Quality Feedback Regulation (GQFR), adaptively fuses the original and generated modality features. CAA ensures thorough integration and alignment of the features, while GQFR dynamically adjusts the model’s reliance on the generated features based on their quality, preventing over-dependence on low-quality data. Experiments on three private nasopharyngeal carcinoma datasets demonstrate that MADAT outperforms existing methods, achieving superior robustness in medical multimodal prediction under conditions of incomplete multimodality.
{"title":"MADAT: Missing-aware dynamic adaptive transformer model for medical prognosis prediction with incomplete multimodal data","authors":"Jianbin He , Guoheng Huang , Xiaochen Yuan , Chi-Man Pun , Guo Zhong , Qi Yang , Ling Guo , Siyu Zhu , Baiying Lei , Haojiang Li","doi":"10.1016/j.media.2026.103958","DOIUrl":"10.1016/j.media.2026.103958","url":null,"abstract":"<div><div>Multimodal medical prognosis prediction has shown great potential in improving diagnostic accuracy by integrating various data types. However, incomplete multimodality, where certain modalities are missing, poses significant challenges to model performance. Current methods, including dynamic adaptation and modality completion, have limitations in handling incomplete multimodality comprehensively. Dynamic adaptation methods fail to fully utilize modality interactions as they only process available modalities. Modality completion methods address inter-modal relationships but risk generating unreliable data, especially when key modalities are missing, since existing modalities cannot replicate unique features of absent ones. This compromises fusion quality and degrades model performance. To address these challenges, we propose the Missing-aware Dynamic Adaptive Transformer (MADAT) model, which integrates two phases: the Decoupling Generalization Completion Phase (DGCP), the Adaptive Cross-Fusion Phase (ACFP). The DGCP reconstructs missing modalities by generating inter-modal and intra-modal shared information using Progressive Transformation Recursive Gated Convolutions (PTRGC) and Wavelet Alignment Domain Generalization (WADG). The ACFP, which incorporates Cross-Agent Attention (CAA) and Generation Quality Feedback Regulation (GQFR), adaptively fuses the original and generated modality features. CAA ensures thorough integration and alignment of the features, while GQFR dynamically adjusts the model’s reliance on the generated features based on their quality, preventing over-dependence on low-quality data. Experiments on three private nasopharyngeal carcinoma datasets demonstrate that MADAT outperforms existing methods, achieving superior robustness in medical multimodal prediction under conditions of incomplete multimodality.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 103958"},"PeriodicalIF":11.8,"publicationDate":"2026-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146048256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23 | DOI: 10.1016/j.media.2026.103960
Jieyun Bai , Yitong Tang , Xiao Liu , Jiale Hu , Yunda Li , Xufan Chen , Yufeng Wang , Chen Ma , Yunshu Li , Bowen Guo , Jing Jiao , Yi Huang , Kun Wang , Lifei Li , Yuzhang Ma , Xiaoxin Han , Haochen Shao , Zi Yang , Qingchen Liu , Yuchen Hu , Shuo Li
Accurate intrapartum biometry plays a crucial role in monitoring labor progression and preventing complications. However, its clinical application is limited by challenges such as the difficulty in identifying anatomical landmarks and the variability introduced by operator dependency. To overcome these challenges, the Intrapartum Ultrasound Grand Challenge (IUGC) 2025, in collaboration with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), was organized to accelerate the development of automatic measurement techniques for intrapartum ultrasound analysis. The challenge featured a large-scale, multi-center dataset comprising over 32,000 images from 24 hospitals and research institutes. These images were annotated with key anatomical landmarks of the pubic symphysis (PS) and fetal head (FH), along with the corresponding biometric parameter: the angle of progression (AoP). Ten participating teams proposed a variety of end-to-end and semi-supervised frameworks, incorporating advanced strategies such as foundation model distillation, pseudo-label refinement, anatomical segmentation guidance, and ensemble learning. A comprehensive evaluation revealed that the winning team achieved superior accuracy, with a Mean Radial Error (MRE) of 6.53 ± 4.38 pixels for the right PS landmark, 8.60 ± 5.06 pixels for the left PS landmark, 19.90 ± 17.55 pixels for the FH tangent landmark, and an absolute AoP difference of 3.81 ± 3.12°. This top-performing method demonstrated accuracy comparable to expert sonographers, emphasizing the clinical potential of automated intrapartum ultrasound analysis. However, challenges remain, such as the trade-off between accuracy and computational efficiency, the lack of segmentation labels and video data, and the need for extensive multi-center clinical validation. IUGC 2025 thus sets the first benchmark for landmark-based intrapartum biometry estimation and provides an open platform for developing and evaluating real-time, intelligent ultrasound analysis solutions for labor management.
{"title":"IUGC: A benchmark of landmark detection in end-to-end intrapartum ultrasound biometry","authors":"Jieyun Bai , Yitong Tang , Xiao Liu , Jiale Hu , Yunda Li , Xufan Chen , Yufeng Wang , Chen Ma , Yunshu Li , Bowen Guo , Jing Jiao , Yi Huang , Kun Wang , Lifei Li , Yuzhang Ma , Xiaoxin Han , Haochen Shao , Zi Yang , Qingchen Liu , Yuchen Hu , Shuo Li","doi":"10.1016/j.media.2026.103960","DOIUrl":"10.1016/j.media.2026.103960","url":null,"abstract":"<div><div>Accurate intrapartum biometry plays a crucial role in monitoring labor progression and preventing complications. However, its clinical application is limited by challenges such as the difficulty in identifying anatomical landmarks and the variability introduced by operator dependency. To overcome these challenges, the Intrapartum Ultrasound Grand Challenge (IUGC) 2025, in collaboration with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), was organized to accelerate the development of automatic measurement techniques for intrapartum ultrasound analysis. The challenge featured a large-scale, multi-center dataset comprising over 32,000 images from 24 hospitals and research institutes. These images were annotated with key anatomical landmarks of the pubic symphysis (PS) and fetal head (FH), along with the corresponding biometric parameter-the angle of progression (AoP). Ten participating teams proposed a variety of end-to-end and semi-supervised frameworks, incorporating advanced strategies such as foundation model distillation, pseudo-label refinement, anatomical segmentation guidance, and ensemble learning. A comprehensive evaluation revealed that the winning team achieved superior accuracy, with a Mean Radial Error (MRE) of 6.53 ± 4.38 pixels for the right PS landmark, 8.60 ± 5.06 pixels for the left PS landmark, 19.90 ± 17.55 pixels for the FH tangent landmark, and an absolute AoP difference of 3.81 ± 3.12° This top-performing method demonstrated accuracy comparable to expert sonographers, emphasizing the clinical potential of automated intrapartum ultrasound analysis. However, challenges remain, such as the trade-off between accuracy and computational efficiency, the lack of segmentation labels and video data, and the need for extensive multi-center clinical validation. IUGC 2025 thus sets the first benchmark for landmark-based intrapartum biometry estimation and provides an open platform for developing and evaluating real-time, intelligent ultrasound analysis solutions for labor management.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 103960"},"PeriodicalIF":11.8,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146033892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23 | DOI: 10.1016/j.media.2026.103947
Rosanna Turrisi, Giuseppe Patané
Alzheimer’s disease (AD) is a progressive neurodegenerative disorder and the leading cause of dementia. Magnetic Resonance Imaging (MRI) combined with Machine Learning (ML) enables early diagnosis, but ML models often underperform when trained on small, heterogeneous medical datasets. Transfer Learning (TL) helps mitigate this limitation, yet models pre-trained on 2D natural images still fall short of those trained directly on related 3D MRI data. To address this gap, we introduce an intermediate strategy based on synthetic data generation. Specifically, we propose a conditional Denoising Diffusion Probabilistic Model (DDPM) to synthesise 2D projections (axial, coronal, sagittal) of brain MRI scans across three clinical groups: Cognitively Normal (CN), Mild Cognitive Impairment (MCI), and AD. A total of 9000 synthetic images are used for pre-training 2D models, which are subsequently extended to 3D via axial, coronal, and sagittal convolutions and fine-tuned on real-world small datasets. Our method achieves 91.3% accuracy in binary (CN vs. AD) and 74.5% in three-class (CN/MCI/AD) classification on the 3T ADNI dataset, outperforming both models trained from scratch and those pre-trained on ImageNet. Our 2D ADnet achieved state-of-the-art performance on OASIS-2 (59.3% accuracy, 57.6% F1), surpassing all competitor models and confirming the robustness of synthetic data pre-training. These results establish synthetic diffusion-based pre-training as a promising bridge between transfer learning from natural images and medical MRI data.
{"title":"Generating synthetic MRI scans for improving Alzheimer’s disease diagnosis","authors":"Rosanna Turrisi, Giuseppe Patané","doi":"10.1016/j.media.2026.103947","DOIUrl":"10.1016/j.media.2026.103947","url":null,"abstract":"<div><div>Alzheimer’s disease (AD) is a progressive neurodegenerative disorder and the leading cause of dementia. Magnetic Resonance Imaging (MRI) combined with Machine Learning (ML) enables early diagnosis, but ML models often underperform when trained on small, heterogeneous medical datasets. Transfer Learning (TL) helps mitigate this limitation, yet models pre-trained on 2D natural images still fall short of those trained directly on related 3D MRI data. To address this gap, we introduce an intermediate strategy based on synthetic data generation. Specifically, we propose a conditional Denoising Diffusion Probabilistic Model (DDPM) to synthesise 2D projections (axial, coronal, sagittal) of brain MRI scans across three clinical groups: Cognitively Normal (CN), Mild Cognitive Impairment (MCI), and AD. A total of 9000 synthetic images are used for pre-training 2D models, which are subsequently extended to 3D via axial, coronal, and sagittal convolutions and fine-tuned on real-world small datasets. Our method achieves 91.3% accuracy in binary (CN vs. AD) and 74.5% in three-class (CN/MCI/AD) classification on the 3T ADNI dataset, outperforming both models trained from scratch and those pre-trained on ImageNet. Our 2D ADnet achieved state-of-the-art performance on OASIS-2 (59.3% accuracy, 57.6% F1), surpassing all competitor models and confirming the robustness of synthetic data pre-training. These results show synthetic diffusion-based pre-training as a promising bridge between natural image TL and medical MRI data.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 103947"},"PeriodicalIF":11.8,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146032814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-22 | DOI: 10.1016/j.media.2026.103956
Jiaju Huang , Xiao Yang , Xinglong Liang , Shaobin Chen , Yue Sun , Greta Sp Mok , Shuo Li , Ying Wang , Tao Tan
Accurate segmentation of breast cancer in PET-CT images is crucial for precise staging, monitoring treatment response, and guiding personalized therapy. However, the small size and dispersed nature of metastatic lesions, coupled with the scarcity of annotated data and heterogeneity between modalities that hinders effective information fusion, make this task challenging. This paper proposes a novel anatomy-guided cross-modal learning framework to address these issues. Our approach first generates organ pseudo-labels through a teacher-student learning paradigm, which serve as anatomical prompts to guide cancer segmentation. We then introduce a self-aligning cross-modal pre-training method that aligns PET and CT features in a shared latent space through masked 3D patch reconstruction, enabling effective cross-modal feature fusion. Finally, we initialize the segmentation network’s encoder with the pre-trained encoder weights, and incorporate organ labels through a Mamba-based prompt encoder and Hypernet-Controlled Cross-Attention mechanism for dynamic anatomical feature extraction and fusion. Notably, our method outperforms eight state-of-the-art methods, including CNN-based, transformer-based, and Mamba-based approaches, on two datasets encompassing primary breast cancer, metastatic breast cancer, and other types of cancer segmentation tasks.
{"title":"Anatomy-guided prompting with cross-modal self-alignment for whole-body PET-CT breast cancer segmentation","authors":"Jiaju Huang , Xiao Yang , Xinglong Liang , Shaobin Chen , Yue Sun , Greta Sp Mok , Shuo Li , Ying Wang , Tao Tan","doi":"10.1016/j.media.2026.103956","DOIUrl":"10.1016/j.media.2026.103956","url":null,"abstract":"<div><div>Accurate segmentation of breast cancer in PET-CT images is crucial for precise staging, monitoring treatment response, and guiding personalized therapy. However, the small size and dispersed nature of metastatic lesions, coupled with the scarcity of annotated data and heterogeneity between modalities that hinders effective information fusion, make this task challenging. This paper proposes a novel anatomy-guided cross-modal learning framework to address these issues. Our approach first generates organ pseudo-labels through a teacher-student learning paradigm, which serve as anatomical prompts to guide cancer segmentation. We then introduce a self-aligning cross-modal pre-training method that aligns PET and CT features in a shared latent space through masked 3D patch reconstruction, enabling effective cross-modal feature fusion. Finally, we initialize the segmentation network’s encoder with the pre-trained encoder weights, and incorporate organ labels through a Mamba-based prompt encoder and Hypernet-Controlled Cross-Attention mechanism for dynamic anatomical feature extraction and fusion. Notably, our method outperforms eight state-of-the-art methods, including CNN-based, transformer-based, and Mamba-based approaches, on two datasets encompassing primary breast cancer, metastatic breast cancer, and other types of cancer segmentation tasks.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 103956"},"PeriodicalIF":11.8,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146033893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}