Pub Date: 2026-01-16 | DOI: 10.1016/j.media.2026.103950
Gelei Xu, Yuying Duan, Jun Xia, Ching-Hao Chiu, Michael Lemmon, Wei Jin, Yiyu Shi
Recent efforts in medical image computing have focused on improving fairness by balancing it with accuracy within a single, unified model. However, this often creates a trade-off: gains for underrepresented groups can come at the expense of reduced accuracy for groups that were previously well-served. In high-stakes clinical contexts, even minor drops in accuracy can lead to serious consequences, making such trade-offs highly contentious. Rather than accepting this compromise, in this paper we reframe the fairness objective as maximizing diagnostic accuracy for each patient group by leveraging additional computational resources to train group-specific models. To achieve this goal, we introduce SPARE, a novel data reweighting algorithm designed to optimize performance for a given group. SPARE evaluates the value of each training sample using two key factors: utility, which reflects the sample’s contribution to refining the model’s decision boundary, and group similarity, which captures its relevance to the target group. By assigning greater weight to samples that score highly on both metrics, SPARE rebalances the training process, particularly by leveraging the value of out-of-group data, to improve group-specific accuracy while avoiding the traditional fairness-accuracy trade-off. Experiments on two skin disease datasets demonstrate that SPARE significantly improves group-specific performance while maintaining comparable fairness metrics, highlighting its promise as a more practical fairness paradigm for improving clinical reliability.
{"title":"Rethinking fairness in medical imaging: Maximizing group-specific performance with application to skin disease diagnosis","authors":"Gelei Xu , Yuying Duan , Jun Xia , Ching-Hao Chiu , Michael Lemmon , Wei Jin , Yiyu Shi","doi":"10.1016/j.media.2026.103950","DOIUrl":"10.1016/j.media.2026.103950","url":null,"abstract":"<div><div>Recent efforts in medical image computing have focused on improving fairness by balancing it with accuracy within a single, unified model. However, this often creates a trade-off: gains for underrepresented groups can come at the expense of reduced accuracy for groups that were previously well-served. In high-stakes clinical contexts, even minor drops in accuracy can lead to serious consequences, making such trade-offs highly contentious. Rather than accepting this compromise, we reframe the fairness objective in this paper as maximizing diagnostic accuracy for each patient group by leveraging additional computational resources to train group-specific models. To achieve this goal, we introduce SPARE, a novel data reweighting algorithm designed to optimize performance for a given group. SPARE evaluates the value of each training sample using two key factors: utility, which reflects the sample’s contribution to refining the model’s decision boundary, and group similarity, which captures its relevance to the target group. By assigning greater weight to samples that score highly on both metrics, SPARE rebalances the training process-particularly leveraging the value of out-of-group data-to improve group-specific accuracy while avoiding the traditional fairness-accuracy trade-off. Experiments on two skin disease datasets demonstrate that SPARE significantly improves group-specific performance while maintaining comparable fairness metrics, highlighting its promise as a more practical fairness paradigm for improving clinical reliability.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"109 ","pages":"Article 103950"},"PeriodicalIF":11.8,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-15 | DOI: 10.1016/j.media.2025.103886
Lu Zhang, Huizhen Yu, Zuowei Wang, Fu Gui, Yatu Guo, Wei Zhang, Mengyu Jia
Retinal diseases spanning a broad spectrum can be effectively identified and diagnosed using complementary signals from multimodal data. However, multimodal diagnosis in ophthalmic practice is typically challenged by data heterogeneity, potential invasiveness, and registration complexity, among other factors. As such, a unified framework that integrates multimodal data synthesis and fusion is proposed for retinal disease classification and grading. Specifically, the synthesized multimodal data incorporate fundus fluorescein angiography (FFA), multispectral imaging (MSI), and saliency maps that emphasize latent lesions as well as optic disc/cup regions. Parallel models are independently trained to learn modality-specific representations that capture cross-pathophysiological signatures. These features are then adaptively calibrated within and across modalities to perform information pruning and flexible integration according to downstream tasks. The proposed learning system is thoroughly interpreted through visualizations in both image and feature spaces. Extensive experiments on two public datasets demonstrated the superiority of our approach over state-of-the-art methods in the tasks of multi-label classification (F1-score: 0.683, AUC: 0.953) and diabetic retinopathy grading (Accuracy: 0.842, Kappa: 0.861). This work not only enhances the accuracy and efficiency of retinal disease screening but also offers a scalable framework for data augmentation across various medical imaging modalities.
{"title":"Quasi-multimodal-based pathophysiological feature learning for retinal disease diagnosis","authors":"Lu Zhang , Huizhen Yu , Zuowei Wang , Fu Gui , Yatu Guo , Wei Zhang , Mengyu Jia","doi":"10.1016/j.media.2025.103886","DOIUrl":"10.1016/j.media.2025.103886","url":null,"abstract":"<div><div>Retinal diseases spanning a broad spectrum can be effectively identified and diagnosed using complementary signals from multimodal data. However, multimodal diagnosis in ophthalmic practice is typically challenged in terms of data heterogeneity, potential invasiveness, registration complexity, and so on. As such, a unified framework that integrates multimodal data synthesis and fusion is proposed for retinal disease classification and grading. Specifically, the synthesized multimodal data incorporates fundus fluorescein angiography (FFA), multispectral imaging (MSI), and saliency maps that emphasize latent lesions as well as optic disc/cup regions. Parallel models are independently trained to learn modality-specific representations that capture cross-pathophysiological signatures. These features are then adaptively calibrated within and across modalities to perform information pruning and flexible integration according to downstream tasks. The proposed learning system is thoroughly interpreted through visualizations in both image and feature spaces. Extensive experiments on two public datasets demonstrated the superiority of our approach over state-of-the-art ones in the tasks of multi-label classification (F1-score: 0.683, AUC: 0.953) and diabetic retinopathy grading (Accuracy:0.842, Kappa: 0.861). This work not only enhances the accuracy and efficiency of retinal disease screening but also offers a scalable framework for data augmentation across various medical imaging modalities.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"109 ","pages":"Article 103886"},"PeriodicalIF":11.8,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-13 | DOI: 10.1016/j.media.2026.103942
Weiyi Zhang, Peranut Chotcomwongse, Yinwen Li, Pusheng Xu, Ruijie Yao, Lianhao Zhou, Yuxuan Zhou, Hui Feng, Qiping Zhou, Xinyue Wang, Shoujin Huang, Zihao Jin, Florence H T Chung, Shujun Wang, Yalin Zheng, Mingguang He, Danli Shi, Paisan Ruamviboonsuk
Diabetic macular edema (DME) significantly contributes to visual impairment in diabetic patients. Treatment responses to intravitreal therapies vary, highlighting the need for patient stratification to predict therapeutic benefits and enable personalized strategies. To our knowledge, this study is the first to explore pre-treatment stratification for predicting DME treatment responses. To advance this research, we organized the 2nd Asia-Pacific Tele-Ophthalmology Society (APTOS) Big Data Competition in 2021. The competition focused on improving predictive accuracy for anti-VEGF therapy responses using ophthalmic OCT images. We provided a dataset containing tens of thousands of OCT images from 2000 patients with labels across four sub-tasks. This paper details the competition’s structure, dataset, leading methods, and evaluation metrics. The competition attracted strong participation from the scientific community, with 170 teams initially registering and 41 reaching the final round. The top-performing team achieved an AUC of 80.06%, highlighting the potential of AI in personalized DME treatment and clinical decision-making.
{"title":"Predicting diabetic macular edema treatment responses using OCT: Dataset and methods of APTOS competition","authors":"Weiyi Zhang , Peranut Chotcomwongse , Yinwen Li , Pusheng Xu , Ruijie Yao , Lianhao Zhou , Yuxuan Zhou , Hui Feng , Qiping Zhou , Xinyue Wang , Shoujin Huang , Zihao Jin , Florence H T Chung , Shujun Wang , Yalin Zheng , Mingguang He , Danli Shi , Paisan Ruamviboonsuk","doi":"10.1016/j.media.2026.103942","DOIUrl":"10.1016/j.media.2026.103942","url":null,"abstract":"<div><div>Diabetic macular edema (DME) significantly contributes to visual impairment in diabetic patients. Treatment responses to intravitreal therapies vary, highlighting the need for patient stratification to predict therapeutic benefits and enable personalized strategies. To our knowledge, this study is the first to explore pre-treatment stratification for predicting DME treatment responses. To advance this research, we organized the 2nd Asia-Pacific Tele-Ophthalmology Society (APTOS) Big Data Competition in 2021. The competition focused on improving predictive accuracy for anti-VEGF therapy responses using ophthalmic OCT images. We provided a dataset containing tens of thousands of OCT images from 2000 patients with labels across four sub-tasks. This paper details the competition’s structure, dataset, leading methods, and evaluation metrics. The competition attracted strong scientific community participation, with 170 teams initially registering and 41 reaching the final round. The top-performing team achieved an AUC of 80.06 %, highlighting the potential of AI in personalized DME treatment and clinical decision-making.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"109 ","pages":"Article 103942"},"PeriodicalIF":11.8,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145962446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-13 | DOI: 10.1016/j.media.2026.103940
Ruize Cui, Weixin Si, Zhixi Li, Kai Wang, Jialun Pei, Pheng-Ann Heng, Jing Qin
Laparoscopic liver surgery presents a highly intricate intraoperative environment with significant liver deformation, posing challenges for surgeons in locating critical liver structures. Anatomical liver landmarks can greatly assist surgeons in spatial perception in laparoscopic scenarios and facilitate preoperative-to-intraoperative registration. To advance research in liver landmark detection, we develop a new dataset called L3D-2K, comprising 2000 keyframes with expert landmark annotations from surgical videos of 47 patients. Accordingly, we propose a baseline, D2GPLand+, which effectively leverages the depth modality to boost landmark detection performance. Concretely, we introduce a Depth-aware Prompt Embedding (DPE) scheme, which dynamically extracts class-related global geometric cues under the guidance of self-supervised prompts from the SAM encoder. Further, a Cross-dimension Unified Mamba (CUMamba) block is designed to comprehensively incorporate RGB and depth features through a concurrent spatial and channel scanning mechanism. In addition, we introduce an Anatomical Feature Augmentation (AFA) module that captures anatomical cues and emphasizes key structures by optimizing feature granularity. For benchmarking purposes, we evaluate our method and 17 mainstream detection models on the L3D, L3D-2K, and P2ILF datasets. Experimental results demonstrate that D2GPLand+ obtains superior performance on all three datasets. Our approach provides surgeons with guiding clues that facilitate surgical operations and decision-making in complex laparoscopic surgery. Our code and dataset are available at https://github.com/cuiruize/D2GPLand-Plus.
{"title":"Depth-induced prompt learning for laparoscopic liver landmark detection","authors":"Ruize Cui , Weixin Si , Zhixi Li , Kai Wang , Jialun Pei , Pheng-Ann Heng , Jing Qin","doi":"10.1016/j.media.2026.103940","DOIUrl":"10.1016/j.media.2026.103940","url":null,"abstract":"<div><div>Laparoscopic liver surgery presents a highly intricate intraoperative environment with significant liver deformation, posing challenges for surgeons in locating critical liver structures. Anatomical liver landmarks can greatly assist surgeons in spatial perception in laparoscopic scenarios and facilitate preoperative-to-intraoperative registration. To advance research in liver landmark detection, we develop a new dataset called <em>L3D-2K</em>, comprising 2000 keyframes with expert landmark annotations from surgical videos of 47 patients. Accordingly, we propose a baseline, D<sup>2</sup>GPLand+, which effectively leverages depth modality to boost landmark detection performance. Concretely, we introduce a Depth-aware Prompt Embedding (DPE) scheme, which dynamically extracts class-related global geometric cues with the guidance of self-supervised prompts from the SAM encoder. Further, a Cross-dimension Unified Mamba (CUMamba) block is designed to comprehensively incorporate RGB and depth features with the concurrent spatial and channel scanning mechanism. Besides, we bring out an Anatomical Feature Augmentation (AFA) module that captures anatomical cues and emphasizes key structures by optimizing feature granularity. For benchmarking purposes, we evaluate our method and 17 mainstream detection models on L3D, L3D-2K, and P2ILF datasets. Experimental results demonstrate that D<sup>2</sup>GPLand+ obtains superior performance on all three datasets. Our approach provides surgeons with guiding clues that facilitate surgical operations and decision-making in complex laparoscopic surgery. Our code and dataset are available at <span><span>https://github.com/cuiruize/D2GPLand-Plus</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"109 ","pages":"Article 103940"},"PeriodicalIF":11.8,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145962447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-13 | DOI: 10.1016/j.media.2026.103945
Tobias Rueckert, David Rauber, Raphaela Maerkl, Leonard Klausmann, Suemeyye R. Yildiran, Max Gutbrod, Danilo Weber Nunes, Alvaro Fernandez Moreno, Imanol Luengo, Danail Stoyanov, Nicolas Toussaint, Enki Cho, Hyeon Bae Kim, Oh Sung Choo, Ka Young Kim, Seong Tae Kim, Gonçalo Arantes, Kehan Song, Jianjun Zhu, Junchen Xiong, Christoph Palm
Reliable recognition and localization of surgical instruments in endoscopic video recordings are foundational for a wide range of applications in computer- and robot-assisted minimally invasive surgery (RAMIS), including surgical training, skill assessment, and autonomous assistance. However, robust performance under real-world conditions remains a significant challenge. Incorporating surgical context – such as the current procedural phase – has emerged as a promising strategy to improve robustness and interpretability.
To address these challenges, we organized the Surgical Procedure Phase, Keypoint, and Instrument Recognition (PhaKIR) sub-challenge as part of the Endoscopic Vision (EndoVis) challenge at MICCAI 2024. We introduced a novel, multi-center dataset comprising thirteen full-length laparoscopic cholecystectomy videos collected from three distinct medical institutions, with unified annotations for three interrelated tasks: surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation. Unlike existing datasets, ours enables joint investigation of instrument localization and procedural context within the same data while supporting the integration of temporal information across entire procedures.
We report results and findings in accordance with the BIAS guidelines for biomedical image analysis challenges. The PhaKIR sub-challenge advances the field by providing a unique benchmark for developing temporally aware, context-driven methods in RAMIS and offers a high-quality resource to support future research in surgical scene understanding.
{"title":"Comparative validation of surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation in endoscopy: Results of the PhaKIR 2024 challenge","authors":"Tobias Rueckert , David Rauber , Raphaela Maerkl , Leonard Klausmann , Suemeyye R. Yildiran , Max Gutbrod , Danilo Weber Nunes , Alvaro Fernandez Moreno , Imanol Luengo , Danail Stoyanov , Nicolas Toussaint , Enki Cho , Hyeon Bae Kim , Oh Sung Choo , Ka Young Kim , Seong Tae Kim , Gonçalo Arantes , Kehan Song , Jianjun Zhu , Junchen Xiong , Christoph Palm","doi":"10.1016/j.media.2026.103945","DOIUrl":"10.1016/j.media.2026.103945","url":null,"abstract":"<div><div>Reliable recognition and localization of surgical instruments in endoscopic video recordings are foundational for a wide range of applications in computer- and robot-assisted minimally invasive surgery (RAMIS), including surgical training, skill assessment, and autonomous assistance. However, robust performance under real-world conditions remains a significant challenge. Incorporating surgical context – such as the current procedural phase – has emerged as a promising strategy to improve robustness and interpretability.</div><div>To address these challenges, we organized the Surgical Procedure Phase, Keypoint, and Instrument Recognition (PhaKIR) sub-challenge as part of the Endoscopic Vision (EndoVis) challenge at MICCAI 2024. We introduced a novel, multi-center dataset comprising thirteen full-length laparoscopic cholecystectomy videos collected from three distinct medical institutions, with unified annotations for three interrelated tasks: surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation. Unlike existing datasets, ours enables joint investigation of instrument localization and procedural context within the same data while supporting the integration of temporal information across entire procedures.</div><div>We report results and findings in accordance with the BIAS guidelines for biomedical image analysis challenges. The PhaKIR sub-challenge advances the field by providing a unique benchmark for developing temporally aware, context-driven methods in RAMIS and offers a high-quality resource to support future research in surgical scene understanding.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"109 ","pages":"Article 103945"},"PeriodicalIF":11.8,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145961645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-12 | DOI: 10.1016/j.media.2026.103948
Dongyuan Li, Yixin Shan, Yuxuan Mao, Puxun Tu, Haochen Shi, Shenghao Huang, Weiyan Sun, Chang Chen, Xiaojun Chen
Deformable image-to-patient registration is essential for surgical navigation and medical imaging, yet real-time computation of spatial transformations across modalities remains a major clinical challenge: it is often time-consuming, error-prone, and can increase trauma or radiation exposure. While state-of-the-art methods achieve impressive speed and accuracy on paired medical images, they face notable limitations in cross-modal thoracic applications, where physiological motions such as respiration complicate tumor localization. To address this, we propose a robust, contactless, non-rigid registration framework for dynamic thoracic tumor localization. A highly efficient Recursive Deformable Diffusion Model (RDDM) is trained to reconstruct comprehensive 4DCT sequences from only end-inhalation and end-exhalation scans, capturing respiratory dynamics reflective of the intraoperative state. For real-time patient alignment, we introduce a contactless non-rigid registration algorithm based on GICP, leveraging patient skin-surface point clouds captured by stereo RGB-D imaging. By incorporating normal-vector and expansion-contraction constraints, the method enhances robustness and avoids local minima. The proposed framework was validated on publicly available datasets and volunteer trials. Quantitative evaluations demonstrated the RDDM’s anatomical fidelity across respiratory phases, achieving a PSNR of 34.01 ± 2.78 dB. Moreover, we have preliminarily developed a 4DCT-based registration and surgical navigation module to support tumor localization and high-precision tracking. Experimental results indicate that the proposed framework preliminarily meets clinical requirements and demonstrates potential for integration into downstream surgical systems.
{"title":"Robust non-rigid image-to-patient registration for contactless dynamic thoracic tumor localization using recursive deformable diffusion models","authors":"Dongyuan Li , Yixin Shan , Yuxuan Mao , Puxun Tu , Haochen Shi , Shenghao Huang , Weiyan Sun , Chang Chen , Xiaojun Chen","doi":"10.1016/j.media.2026.103948","DOIUrl":"10.1016/j.media.2026.103948","url":null,"abstract":"<div><div>Deformable image-to-patient registration is essential for surgical navigation and medical imaging, yet real-time computation of spatial transformations across modalities remains a major clinical challenge-often being time-consuming, error-prone, and potentially increasing trauma or radiation exposure. While state-of-the-art methods achieve impressive speed and accuracy on paired medical images, they face notable limitations in cross-modal thoracic applications, where physiological motions such as respiration complicate tumor localization. To address this, we propose a robust, contactless, non-rigid registration framework for dynamic thoracic tumor localization. A highly efficient Recursive Deformable Diffusion Model (RDDM) is trained to reconstruct comprehensive 4DCT sequences from only end-inhalation and end-exhalation scans, capturing respiratory dynamics reflective of the intraoperative state. For real-time patient alignment, we introduce a contactless non-rigid registration algorithm based on GICP, leveraging patient skin surface point clouds captured by stereo RGB-D imaging. By incorporating normal vector and expansion-contraction constraints, the method enhances robustness and avoids local minima. The proposed framework was validated on publicly available datasets and volunteer trials. Quantitative evaluations demonstrated the RDDM’s anatomical fidelity across respiratory phases, achieving an PSNR of 34.01 ± 2.78 dB. Moreover, we have preliminarily developed a 4DCT-based registration and surgical navigation module to support tumor localization and high-precision tracking. Experimental results indicate that the proposed framework preliminarily meets clinical requirements and demonstrates potential for integration into downstream surgical systems.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 103948"},"PeriodicalIF":11.8,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145956932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-11 | DOI: 10.1016/j.media.2026.103937
Bolun Zeng, Yaolin Xu, Peng Wang, Tianyu Lu, Zongyu Xie, Mengsu Zeng, Jianjun Zhou, Liang Liu, Haitao Sun, Xiaojun Chen
Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive malignancy. Accurate prognostic modeling enables reliable risk stratification to identify patients most likely to benefit from adjuvant therapy, thereby facilitating individualized clinical management and potentially improving patient outcomes. Although recent deep learning approaches have shown promise in this area, their effectiveness is often constrained by fusion strategies that fail to fully capture the hierarchical and complementary information across heterogeneous clinical modalities. To address these limitations, we propose C2HFusion, a novel fusion framework inspired by clinical decision-making for personalized prognostic risk assessment. C2HFusion is unique in that it integrates multimodal data across multiple representational levels and structural forms. At the imaging level, it extracts and aggregates tumor-level features from multi-sequence MRI using cross-attention, effectively capturing complementary imaging patterns. At the patient level, it encodes structured data (e.g., laboratory results, demographics) and unstructured data (e.g., radiology reports) as contextual priors, which are then fused with imaging representations through a novel feature modulation mechanism. To further enhance this cross-level integration, a scalable Mixture-of-Clinical-Experts (MoCE) module dynamically routes different modalities through specialized branches and adaptively optimizes feature fusion for more robust multimodal modeling. Validation on multi-center real-world datasets covering 681 PDAC patients shows that C2HFusion consistently outperforms state-of-the-art methods in overall survival prediction, achieving over a 5% improvement in C-index. These results highlight its potential to improve prognostic accuracy and support more informed, personalized clinical decision-making.
{"title":"C2HFusion: Clinical context-driven hierarchical fusion of multimodal data for personalized and quantitative prognostic assessment in pancreatic cancer","authors":"Bolun Zeng , Yaolin Xu , Peng Wang , Tianyu Lu , Zongyu Xie , Mengsu Zeng , Jianjun Zhou , Liang Liu , Haitao Sun , Xiaojun Chen","doi":"10.1016/j.media.2026.103937","DOIUrl":"10.1016/j.media.2026.103937","url":null,"abstract":"<div><div>Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive malignancy. Accurate prognostic modeling enables reliable risk stratification to identify patients most likely to benefit from adjuvant therapy, thereby facilitating individualized clinical management and potentially improving patient outcomes. Although recent deep learning approaches have shown promise in this area, their effectiveness is often constrained by fusion strategies that fail to fully capture the hierarchical and complementary information across heterogeneous clinical modalities. To address these limitations, we propose C2HFusion, a novel fusion framework inspired by clinical decision-making for personalized prognostic risk assessment. C2HFusion is unique in that it integrates multimodal data across multiple representational levels and structural forms. At the imaging level, it extracts and aggregates tumor-level features from multi-sequence MRI using cross-attention, effectively capturing complementary imaging patterns. At the patient level, it encodes structured data (e.g., laboratory results, demographics) and unstructured data (e.g., radiology reports) as contextual priors, which are then fused with imaging representations through a novel feature modulation mechanism. To further enhance this cross-level integration, a scalable Mixture-of-Clinical-Experts (MoCE) module dynamically routes different modalities through specialized branches and adaptively optimizes feature fusion for more robust multimodal modeling. Validation on multi-center real-world datasets covering 681 PDAC patients shows that C2HFusion consistently outperforms state-of-the-art methods in overall survival prediction, achieving over a 5% improvement in C-index. These results highlight its potential to improve prognostic accuracy and support more informed, personalized clinical decision-making.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"109 ","pages":"Article 103937"},"PeriodicalIF":11.8,"publicationDate":"2026-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145956933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-11 | DOI: 10.1016/j.media.2026.103934
Jungwook Lee, Xuanang Xu, Daeseung Kim, Tianshu Kuang, Hannah H. Deng, Xinrui Song, Yasmine Soubra, Michael A.K. Liebschner, Jaime Gateno, Pingkun Yan
Orthognathic surgery corrects craniomaxillofacial deformities by repositioning skeletal structures to improve facial aesthetics and function. Conventional orthognathic surgical planning is largely bone-driven, where bone repositioning is first defined and soft-tissue outcomes are predicted. However, this is limited by its reliance on surgeon-defined bone plans and the inability to directly optimize for patient-specific aesthetic outcomes. To address these limitations, the soft-tissue-driven paradigm seeks to first predict a patient-specific optimal facial appearance and subsequently derive the skeletal changes required to achieve it. In this work, we introduce FAPOS (Facial Appearance Prediction for Orthognathic Surgery), a novel transformer-based latent diffusion framework that directly predicts a normal-looking 3D facial outcome from pre-operative scans to enable soft-tissue-driven planning. FAPOS utilizes a dense 282-landmark representation and is trained on a combined dataset of 44,602 public 3D faces, overcoming the limitations of data scarcity and lack of correspondence. Our three-phase training pipeline combines geometric encoding, latent diffusion modeling, and patient-specific conditioning. Quantitative and qualitative results show that FAPOS outperforms prior methods with improved facial symmetry and identity preservation. These results mark an important step toward enabling soft-tissue-driven surgical planning, with FAPOS providing an optimal facial target that serves as the basis for estimating the skeletal adjustments in subsequent stages.
{"title":"Facial appearance prediction for orthognathic surgery with diffusion models","authors":"Jungwook Lee , Xuanang Xu , Daeseung Kim , Tianshu Kuang , Hannah H. Deng , Xinrui Song , Yasmine Soubra , Michael A.K. Liebschner , Jaime Gateno , Pingkun Yan","doi":"10.1016/j.media.2026.103934","DOIUrl":"10.1016/j.media.2026.103934","url":null,"abstract":"<div><div>Orthognathic surgery corrects craniomaxillofacial deformities by repositioning skeletal structures to improve facial aesthetics and function. Conventional orthognathic surgical planning is largely bone-driven, where bone repositioning is first defined and soft-tissue outcomes are predicted. However, this is limited by its reliance on surgeon-defined bone plans and the inability to directly optimize for patient-specific aesthetic outcomes. To address these limitations, the soft-tissue-driven paradigm seeks to first predict a patient-specific optimal facial appearance and subsequently derive the skeletal changes required to achieve it. In this work, we introduce FAPOS (Facial Appearance Prediction for Orthognathic Surgery), a novel transformer-based latent diffusion framework that directly predicts a normal-looking 3D facial outcome from pre-operative scans to allow soft-tissue driven planning. FAPOS utilizes a dense 282-landmark representation and is trained on a combined dataset of 44,602 public 3D faces, overcoming limitations of data scarcity, lack of correspondence. Our three-phase training pipeline combines geometric encoding, latent diffusion modeling, and patient-specific conditioning. Quantitative and qualitative results show that FAPOS outperforms prior methods with improved facial symmetry and identity preservation. These results mark an important step toward enabling soft-tissue-driven surgical planning, with FAPOS providing an optimal facial target that serves as the basis for estimating the skeletal adjustments in subsequent stages.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"109 ","pages":"Article 103934"},"PeriodicalIF":11.8,"publicationDate":"2026-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145956956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-10 | DOI: 10.1016/j.media.2026.103938
Xudong Guo, Peiyu Chen, Haifeng Wang, Zhichao Yan, Qinfen Jiang, Rongjiang Wang, Ji Bin
Accurate registration of preoperative magnetic resonance imaging (MRI) and intraoperative ultrasound (US) images is essential to enhance the precision of biopsy punctures and targeted ablation procedures using robotic systems. To improve the speed and accuracy of registration algorithms while accounting for soft-tissue deformation during puncture, we propose UTMorph, a hybrid framework consisting of a convolutional neural network (CNN) and a Transformer, based on the U-Net architecture. This model is designed to enable efficient and deformable multimodal image registration. We introduced a novel attention mechanism that focuses on the structured features of images, thereby ensuring precise deformation estimation and reducing computational complexity. In addition, we proposed a hybrid edge loss function to complement shape and boundary information, thereby improving registration accuracy. Experiments were conducted on data from 704 patients, including private datasets from Shanghai East Hospital, public datasets from The Cancer Imaging Archive, and the µ-ProReg Challenge. The performance of UTMorph was compared with that of six commonly used registration methods and loss functions. UTMorph achieved superior performance across multiple evaluation metrics (Dice similarity coefficient: 0.890, 95th percentile Hausdorff distance: 2.679 mm, mean surface distance: 0.284 mm, and Jacobian determinant: 0.040) and ensures accurate registration with minimal memory usage, even under significant modal differences. These findings validate the effectiveness of the UTMorph model with the hybrid edge loss function for MR-US deformable medical image registration. The code is available at https://github.com/Prps7/UTMorph.
{"title":"UTMorph: A hybrid CNN-transformer network for weakly-supervised multimodal image registration in biopsy puncture","authors":"Xudong Guo , Peiyu Chen , Haifeng Wang , Zhichao Yan , Qinfen Jiang , Rongjiang Wang , Ji Bin","doi":"10.1016/j.media.2026.103938","DOIUrl":"10.1016/j.media.2026.103938","url":null,"abstract":"<div><div>Accurate registration of preoperative magnetic resonance imaging (MRI) and intraoperative ultrasound (US) images is essential to enhance the precision of biopsy punctures and targeted ablation procedures using robotic systems. To improve the speed and accuracy of registration algorithms while accounting for soft tissue deformation during puncture, we propose UTMorph, a hybrid framework consisting of convolutional neural network (CNN) and Transformer network, based on the U-Net architecture. This model is designed to enable efficient and deformable multimodal image registration. We introduced a novel attention mechanism that focuses on the structured features of images, thereby ensuring precise deformation estimation and reducing computational complexity. In addition, we proposed a hybrid edge loss function to complement the shape and boundary information, thereby improving registration accuracy. Experiments were conducted on data from 704 patients, including private datasets from Shanghai East Hospital, public datasets from The Cancer Imaging Archive, and the µ-ProReg Challenge. The performance of UTMorph was compared with that of six commonly used registration methods and loss functions. UTMorph achieved superior performance across multiple evaluation metrics (dice similarity coefficient: 0.890, 95th percentile Hausdorff distance: 2.679 mm, mean surface distance: 0.284 mm, and Jacobi determinant: 0.040) and ensures accurate registration with minimal memory usage, even under significant modal differences. These findings validate the effectiveness of the UTMorph model with the hybrid edge loss function for MR-US deformable medical image registration. This code is available at <span><span>https://github.com/Prps7/UTMorph</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"109 ","pages":"Article 103938"},"PeriodicalIF":11.8,"publicationDate":"2026-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145956952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-10 | DOI: 10.1016/j.media.2026.103933
Solomon Kefas Kaura, Jialun Wu, Zeyu Gao, Chen Li
Megapixel image segmentation is essential for high-resolution histopathology image analysis, but it is currently constrained by GPU memory limitations, necessitating patch-based processing and downsampling that compromise global and local context. This paper introduces MegaSeg, an end-to-end framework for semantic segmentation of megapixel images, leveraging streaming convolutional networks within a U-shaped architecture and a divide-and-conquer strategy. MegaSeg enables efficient semantic segmentation of 8192×8192 pixel images (67 MP) without sacrificing detail or structural context while significantly reducing memory usage. Furthermore, we propose the Attentive Dense Refinement Module (ADRM) in the MegaSeg decoder path to effectively retain and improve local details while capturing the contextual information present in high-resolution images. Experiments on public histopathology datasets demonstrate superior performance, preserving both global structure and local details. On CAMELYON16, MegaSeg improves the Free Response Operating Characteristic (FROC) score from 0.78 to 0.89 when the input size is scaled from 4 MP to 67 MP, highlighting its effectiveness for large-scale medical image segmentation.
Title: "MegaSeg: Towards scalable semantic segmentation for megapixel images" (Medical Image Analysis, Vol. 109, Article 103933).