Deformable image-to-patient registration is essential for surgical navigation and medical imaging, yet real-time computation of spatial transformations across modalities remains a major clinical challenge, as it is often time-consuming, error-prone, and can increase trauma or radiation exposure. While state-of-the-art methods achieve impressive speed and accuracy on paired medical images, they face notable limitations in cross-modal thoracic applications, where physiological motions such as respiration complicate tumor localization. To address this, we propose a robust, contactless, non-rigid registration framework for dynamic thoracic tumor localization. A highly efficient Recursive Deformable Diffusion Model (RDDM) is trained to reconstruct comprehensive 4DCT sequences from only end-inhalation and end-exhalation scans, capturing respiratory dynamics reflective of the intraoperative state. For real-time patient alignment, we introduce a contactless non-rigid registration algorithm based on GICP, leveraging patient skin surface point clouds captured by stereo RGB-D imaging. By incorporating normal vector and expansion–contraction constraints, the method enhances robustness and avoids local minima. The proposed framework was validated on publicly available datasets and volunteer trials. Quantitative evaluations demonstrated the RDDM's anatomical fidelity across respiratory phases, achieving a PSNR of 34.01 ± 2.78 dB. Moreover, we have preliminarily developed a 4DCT-based registration and surgical navigation module to support tumor localization and high-precision tracking. Experimental results indicate that the proposed framework preliminarily meets clinical requirements and demonstrates potential for integration into downstream surgical systems.
{"title":"Robust non-rigid image-to-patient registration for contactless dynamic thoracic tumor localization using recursive deformable diffusion models","authors":"Dongyuan Li, Yixin Shan, Yuxuan Mao, Puxun Tu, Haochen Shi, Shenghao Huang, Weiyan Sun, Chang Chen, Xiaojun Chen","doi":"10.1016/j.media.2026.103948","DOIUrl":"https://doi.org/10.1016/j.media.2026.103948","url":null,"abstract":"Deformable image-to-patient registration is essential for surgical navigation and medical imaging, yet real-time computation of spatial transformations across modalities remains a major clinical challenge—often being time-consuming, error-prone, and potentially increasing trauma or radiation exposure. While state-of-the-art methods achieve impressive speed and accuracy on paired medical images, they face notable limitations in cross-modal thoracic applications, where physiological motions such as respiration complicate tumor localization. To address this, we propose a robust, contactless, non-rigid registration framework for dynamic thoracic tumor localization. A highly efficient Recursive Deformable Diffusion Model (RDDM) is trained to reconstruct comprehensive 4DCT sequences from only end-inhalation and end-exhalation scans, capturing respiratory dynamics reflective of the intraoperative state. For real-time patient alignment, we introduce a contactless non-rigid registration algorithm based on GICP, leveraging patient skin surface point clouds captured by stereo RGB-D imaging. By incorporating normal vector and expansion–contraction constraints, the method enhances robustness and avoids local minima. The proposed framework was validated on publicly available datasets and volunteer trials. Quantitative evaluations demonstrated the RDDM’s anatomical fidelity across respiratory phases, achieving an PSNR of 34.01 ± 2.78 dB. Moreover, we have preliminarily developed a 4DCT-based registration and surgical navigation module to support tumor localization and high-precision tracking. Experimental results indicate that the proposed framework preliminarily meets clinical requirements and demonstrates potential for integration into downstream surgical systems.","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"11 1","pages":""},"PeriodicalIF":10.9,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145956932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive malignancy. Accurate prognostic modeling enables reliable risk stratification to identify patients most likely to benefit from adjuvant therapy, thereby facilitating individualized clinical management and potentially improving patient outcomes. Although recent deep learning approaches have shown promise in this area, their effectiveness is often constrained by fusion strategies that fail to fully capture the hierarchical and complementary information across heterogeneous clinical modalities. To address these limitations, we propose C2HFusion, a novel fusion framework inspired by clinical decision-making for personalized prognostic risk assessment. C2HFusion is unique in that it integrates multimodal data across multiple representational levels and structural forms. At the imaging level, it extracts and aggregates tumor-level features from multi-sequence MRI using cross-attention, effectively capturing complementary imaging patterns. At the patient level, it encodes structured data (e.g., laboratory results, demographics) and unstructured data (e.g., radiology reports) as contextual priors, which are then fused with imaging representations through a novel feature modulation mechanism. To further enhance this cross-level integration, a scalable Mixture-of-Clinical-Experts (MoCE) module dynamically routes different modalities through specialized branches and adaptively optimizes feature fusion for more robust multimodal modeling. Validation on multi-center real-world datasets covering 681 PDAC patients shows that C2HFusion consistently outperforms state-of-the-art methods in overall survival prediction, achieving over a 5% improvement in C-index. These results highlight its potential to improve prognostic accuracy and support more informed, personalized clinical decision-making.
{"title":"C2HFusion: Clinical context-driven hierarchical fusion of multimodal data for personalized and quantitative prognostic assessment in pancreatic cancer","authors":"Bolun Zeng, Yaolin Xu, Peng Wang, Tianyu Lu, Zongyu Xie, Mengsu Zeng, Jianjun Zhou, Liang Liu, Haitao Sun, Xiaojun Chen","doi":"10.1016/j.media.2026.103937","DOIUrl":"https://doi.org/10.1016/j.media.2026.103937","url":null,"abstract":"Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive malignancy. Accurate prognostic modeling enables reliable risk stratification to identify patients most likely to benefit from adjuvant therapy, thereby facilitating individualized clinical management and potentially improving patient outcomes. Although recent deep learning approaches have shown promise in this area, their effectiveness is often constrained by fusion strategies that fail to fully capture the hierarchical and complementary information across heterogeneous clinical modalities. To address these limitations, we propose C2HFusion, a novel fusion framework inspired by clinical decision-making for personalized prognostic risk assessment. C2HFusion is unique in that it integrates multimodal data across multiple representational levels and structural forms. At the imaging level, it extracts and aggregates tumor-level features from multi-sequence MRI using cross-attention, effectively capturing complementary imaging patterns. At the patient level, it encodes structured data (e.g., laboratory results, demographics) and unstructured data (e.g., radiology reports) as contextual priors, which are then fused with imaging representations through a novel feature modulation mechanism. To further enhance this cross-level integration, a scalable Mixture-of-Clinical-Experts (MoCE) module dynamically routes different modalities through specialized branches and adaptively optimizes feature fusion for more robust multimodal modeling. Validation on multi-center real-world datasets covering 681 PDAC patients shows that C2HFusion consistently outperforms state-of-the-art methods in overall survival prediction, achieving over a 5% improvement in C-index. These results highlight its potential to improve prognostic accuracy and support more informed, personalized clinical decision-making.","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"347 1","pages":""},"PeriodicalIF":10.9,"publicationDate":"2026-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145956933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-11 | DOI: 10.1016/j.media.2026.103934
Jungwook Lee, Xuanang Xu, Daeseung Kim, Tianshu Kuang, Hannah H. Deng, Xinrui Song, Yasmine Soubra, Michael A.K. Liebschner, Jaime Gateno, Pingkun Yan
Orthognathic surgery corrects craniomaxillofacial deformities by repositioning skeletal structures to improve facial aesthetics and function. Conventional orthognathic surgical planning is largely bone-driven, where bone repositioning is first defined and soft-tissue outcomes are then predicted. However, this approach is limited by its reliance on surgeon-defined bone plans and its inability to directly optimize for patient-specific aesthetic outcomes. To address these limitations, the soft-tissue-driven paradigm seeks to first predict a patient-specific optimal facial appearance and subsequently derive the skeletal changes required to achieve it. In this work, we introduce FAPOS (Facial Appearance Prediction for Orthognathic Surgery), a novel transformer-based latent diffusion framework that directly predicts a normal-looking 3D facial outcome from pre-operative scans to enable soft-tissue-driven planning. FAPOS utilizes a dense 282-landmark representation and is trained on a combined dataset of 44,602 public 3D faces, overcoming limitations of data scarcity and lack of correspondence. Our three-phase training pipeline combines geometric encoding, latent diffusion modeling, and patient-specific conditioning. Quantitative and qualitative results show that FAPOS outperforms prior methods with improved facial symmetry and identity preservation. These results mark an important step toward enabling soft-tissue-driven surgical planning, with FAPOS providing an optimal facial target that serves as the basis for estimating the skeletal adjustments in subsequent stages.
{"title":"Facial appearance prediction for orthognathic surgery with diffusion models","authors":"Jungwook Lee , Xuanang Xu , Daeseung Kim , Tianshu Kuang , Hannah H. Deng , Xinrui Song , Yasmine Soubra , Michael A.K. Liebschner , Jaime Gateno , Pingkun Yan","doi":"10.1016/j.media.2026.103934","DOIUrl":"10.1016/j.media.2026.103934","url":null,"abstract":"<div><div>Orthognathic surgery corrects craniomaxillofacial deformities by repositioning skeletal structures to improve facial aesthetics and function. Conventional orthognathic surgical planning is largely bone-driven, where bone repositioning is first defined and soft-tissue outcomes are predicted. However, this is limited by its reliance on surgeon-defined bone plans and the inability to directly optimize for patient-specific aesthetic outcomes. To address these limitations, the soft-tissue-driven paradigm seeks to first predict a patient-specific optimal facial appearance and subsequently derive the skeletal changes required to achieve it. In this work, we introduce FAPOS (Facial Appearance Prediction for Orthognathic Surgery), a novel transformer-based latent diffusion framework that directly predicts a normal-looking 3D facial outcome from pre-operative scans to allow soft-tissue driven planning. FAPOS utilizes a dense 282-landmark representation and is trained on a combined dataset of 44,602 public 3D faces, overcoming limitations of data scarcity, lack of correspondence. Our three-phase training pipeline combines geometric encoding, latent diffusion modeling, and patient-specific conditioning. Quantitative and qualitative results show that FAPOS outperforms prior methods with improved facial symmetry and identity preservation. These results mark an important step toward enabling soft-tissue-driven surgical planning, with FAPOS providing an optimal facial target that serves as the basis for estimating the skeletal adjustments in subsequent stages.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"109 ","pages":"Article 103934"},"PeriodicalIF":11.8,"publicationDate":"2026-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145956956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-10 | DOI: 10.1016/j.media.2026.103938
Xudong Guo, Peiyu Chen, Haifeng Wang, Zhichao Yan, Qinfen Jiang, Rongjiang Wang, Ji Bin
Accurate registration of preoperative magnetic resonance imaging (MRI) and intraoperative ultrasound (US) images is essential to enhance the precision of biopsy punctures and targeted ablation procedures using robotic systems. To improve the speed and accuracy of registration algorithms while accounting for soft tissue deformation during puncture, we propose UTMorph, a hybrid framework combining a convolutional neural network (CNN) and a Transformer, based on the U-Net architecture. This model is designed to enable efficient and deformable multimodal image registration. We introduce a novel attention mechanism that focuses on the structured features of images, thereby ensuring precise deformation estimation and reducing computational complexity. In addition, we propose a hybrid edge loss function to complement the shape and boundary information, thereby improving registration accuracy. Experiments were conducted on data from 704 patients, including private datasets from Shanghai East Hospital, public datasets from The Cancer Imaging Archive, and the µ-ProReg Challenge. The performance of UTMorph was compared with that of six commonly used registration methods and loss functions. UTMorph achieved superior performance across multiple evaluation metrics (Dice similarity coefficient: 0.890, 95th percentile Hausdorff distance: 2.679 mm, mean surface distance: 0.284 mm, and Jacobian determinant: 0.040) and ensures accurate registration with minimal memory usage, even under significant modal differences. These findings validate the effectiveness of the UTMorph model with the hybrid edge loss function for MR-US deformable medical image registration. The code is available at https://github.com/Prps7/UTMorph.
{"title":"UTMorph: A hybrid CNN-transformer network for weakly-supervised multimodal image registration in biopsy puncture","authors":"Xudong Guo , Peiyu Chen , Haifeng Wang , Zhichao Yan , Qinfen Jiang , Rongjiang Wang , Ji Bin","doi":"10.1016/j.media.2026.103938","DOIUrl":"10.1016/j.media.2026.103938","url":null,"abstract":"<div><div>Accurate registration of preoperative magnetic resonance imaging (MRI) and intraoperative ultrasound (US) images is essential to enhance the precision of biopsy punctures and targeted ablation procedures using robotic systems. To improve the speed and accuracy of registration algorithms while accounting for soft tissue deformation during puncture, we propose UTMorph, a hybrid framework consisting of convolutional neural network (CNN) and Transformer network, based on the U-Net architecture. This model is designed to enable efficient and deformable multimodal image registration. We introduced a novel attention mechanism that focuses on the structured features of images, thereby ensuring precise deformation estimation and reducing computational complexity. In addition, we proposed a hybrid edge loss function to complement the shape and boundary information, thereby improving registration accuracy. Experiments were conducted on data from 704 patients, including private datasets from Shanghai East Hospital, public datasets from The Cancer Imaging Archive, and the µ-ProReg Challenge. The performance of UTMorph was compared with that of six commonly used registration methods and loss functions. UTMorph achieved superior performance across multiple evaluation metrics (dice similarity coefficient: 0.890, 95th percentile Hausdorff distance: 2.679 mm, mean surface distance: 0.284 mm, and Jacobi determinant: 0.040) and ensures accurate registration with minimal memory usage, even under significant modal differences. These findings validate the effectiveness of the UTMorph model with the hybrid edge loss function for MR-US deformable medical image registration. This code is available at <span><span>https://github.com/Prps7/UTMorph</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"109 ","pages":"Article 103938"},"PeriodicalIF":11.8,"publicationDate":"2026-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145956952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Megapixel image segmentation is essential for high-resolution histopathology image analysis but is currently constrained by GPU memory limitations, necessitating patching and downsampling that compromise global and local context. This paper introduces MegaSeg, an end-to-end framework for semantic segmentation of megapixel images, leveraging streaming convolutional networks within a U-shaped architecture and a divide-and-conquer strategy. MegaSeg enables efficient semantic segmentation of 8192×8192 pixel images (67 MP) without sacrificing detail or structural context while significantly reducing memory usage. Furthermore, we propose the Attentive Dense Refinement Module (ADRM) in the MegaSeg decoder path to effectively retain and improve local details while capturing the contextual information present in high-resolution images. Experiments on public histopathology datasets demonstrate superior performance, preserving both global structure and local details. On CAMELYON16, MegaSeg improves the Free Response Operating Characteristic (FROC) score from 0.78 to 0.89 when the input size is scaled from 4 MP to 67 MP, highlighting its effectiveness for large-scale medical image segmentation.
{"title":"MegaSeg: Towards scalable semantic segmentation for megapixel images","authors":"Solomon Kefas Kaura , Jialun Wu , Zeyu Gao , Chen Li","doi":"10.1016/j.media.2026.103933","DOIUrl":"10.1016/j.media.2026.103933","url":null,"abstract":"<div><div>Megapixel image segmentation is essential for high-resolution histopathology image analysis, but is currently constrained by GPU memory limitations, necessitating patching and downsampling processing that compromises global and local context. This paper introduces MegaSeg, an end-to-end framework for semantic segmentation of megapixel images, leveraging streaming convolutional networks within a U-shaped architecture and a divide-and-conquer strategy. MegaSeg enables efficient semantic segmentation of 8192×8192 pixel images (67 MP) without sacrificing detail or structural context while significantly reducing memory usage. Furthermore, we propose the Attentive Dense Refinement Module (ADRM) to effectively retain and improve local details while capturing contextual information present in high-resolution images in the MegaSeg decoder path. Experiments on public histopathology datasets demonstrate superior performance, preserving both global structure and local details. In CAMELYON16, MegaSeg improves the Free Response Operating Characteristic (FROC) score from 0.78 to 0.89 when the input size is scaled from 4 MP to 67 MP, highlighting its effectiveness for large-scale medical image segmentation.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"109 ","pages":"Article 103933"},"PeriodicalIF":11.8,"publicationDate":"2026-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145956955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-10 | DOI: 10.1016/j.media.2026.103944
Haizhou Liu, Xueling Qin, Zhou Liu, Yuxi Jin, Heng Jiang, Yunlong Gao, Jidong Han, Yijia Zheng, Heng Sun, Lingtao Mao, François Hild, Hairong Zheng, Dong Liang, Na Zhang, Jiuping Liang, Dehong Luo, Zhanli Hu
Accurate and biomechanically consistent quantification of cardiac motion remains a major challenge in cine MRI analysis. While classical feature-tracking and recent deep learning methods have improved frame-wise strain estimation, they often lack biomechanical interpretability and temporal coherence. In this study, we propose a spacetime-regularized finite-element digital image/volume correlation (FE-DIC/DVC) framework that enables 2D/3D+T myocardial motion tracking and strain analysis using only routine cine MRI. The method unifies multiview alignment and 2D/3D+T motion estimation into a coherent pipeline, combining region-specific biomechanical regularization with data-driven temporal decomposition to promote spatial fidelity and temporal consistency. A correlation-based multiview alignment module further enhances anatomical consistency across short- and long-axis views. We evaluate the approach on one synthetic dataset (with ground-truth motion and strain fields), three public datasets (with ground-truth landmarks or myocardial masks), and a clinical dataset (with ground-truth myocardial masks). 2D+T motion and strain are evaluated across all datasets, whereas multiview alignment and 3D+T motion estimation are assessed only on the clinical dataset. Compared with two classical feature-tracking methods and four state-of-the-art deep-learning baselines, the proposed method improves 2D+T motion and strain estimation accuracy as well as temporal consistency on the synthetic data, achieving a displacement RMSE of 0.35 pixels (vs. 0.73 pixels), an equivalent-strain RMSE of 0.05 (vs. 0.097), and a temporal consistency of 0.97 (vs. 0.91). On public and clinical data, it achieves superior performance in terms of a landmark error of 1.96 mm (vs. 3.15 mm), a boundary-tracking Dice of 0.80–0.87 (a 2–4% improvement over the best-performing baseline), and overall registration quality that consistently ranks among the top two methods. By leveraging only standard cine MRI, this work enables 2D/3D+T myocardial mechanics analysis and provides a practical route toward 4D cardiac function assessment.
{"title":"Unlocking 2D/3D+T myocardial mechanics from cine MRI: a mechanically regularized space-time finite element correlation framework","authors":"Haizhou Liu , Xueling Qin , Zhou Liu , Yuxi Jin , Heng Jiang , Yunlong Gao , Jidong Han , Yijia Zheng , Heng Sun , Lingtao Mao , François Hild , Hairong Zheng , Dong Liang , Na Zhang , Jiuping Liang , Dehong Luo , Zhanli Hu","doi":"10.1016/j.media.2026.103944","DOIUrl":"10.1016/j.media.2026.103944","url":null,"abstract":"<div><div>Accurate and biomechanically consistent quantification of cardiac motion remains a major challenge in cine MRI analysis. While classical feature-tracking and recent deep learning methods have improved frame-wise strain estimation, they often lack biomechanical interpretability and temporal coherence. In this study, we propose a spacetime-regularized finite-element digital image/volume correlation (FE-DIC/DVC) framework that enables 2D/3D+T myocardial motion tracking and strain analysis using only routine cine MRI. The method unifies Multiview alignment and 2D/3D+T motion estimation into a coherent pipeline, combining region-specific biomechanical regularization with data-driven based temporal decomposition to promote spatial fidelity and temporal consistency. A correlation-based Multiview alignment module further enhances anatomical consistency across short- and long-axis views. We evaluate the approach on one synthetic dataset (with ground-truth motion and strain fields), three public datasets (with ground-truth landmarks or myocardial masks), and a clinical dataset (with ground-truth myocardial masks). 2D+T motion and strain are evaluated across all datasets, whereas Multiview alignment and 3D+T motion estimation is assessed only on the clinical dataset. Compared with two classical feature-tracking methods and four state-of-the-art deep-learning baselines, the proposed method improves 2D+T motion and strain estimation accuracy as well as temporal consistency on the synthetic data, achieving a displacement RMSE of 0.35 pixels (vs. 0.73 pixels), an equivalent-strain RMSE of 0.05 (vs. 0.097), and a temporal consistency of 0.97 (vs. 0.91). On public and clinical data, it achieves superior performance in terms of a landmark error of 1.96 mm (vs. 3.15 mm), a boundary-tracking Dice of 0.80–0.87 (a 2–4% improvement over the best-performing baseline), and overall registration quality that consistently ranks among the top two methods. By leveraging only standard cine MRI, this work enables 2D/3D+T myocardial mechanics and provides a practical route toward 4D cardiac function assessment.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"109 ","pages":"Article 103944"},"PeriodicalIF":11.8,"publicationDate":"2026-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145956951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-09 | DOI: 10.1016/j.media.2026.103936
Jun Lyu, Guangming Wang, Yunqi Wang, Chengyan Wang, Jing Qin
Magnetic resonance imaging (MRI) plays a crucial role in clinical diagnosis, yet traditional MR image acquisition often requires a prolonged duration, potentially causing patient discomfort and image artifacts. Faster and more accurate image reconstruction may alleviate patient discomfort during MRI examinations and enhance diagnostic accuracy and efficiency. In recent years, significant advancements in deep learning technology offer promise for improving MR image quality and accelerating acquisition. Addressing the demand for cardiac cine MRI reconstruction, we propose KGMgT, a novel MRI reconstruction network based on knowledge-guided approaches. The KGMgT model leverages adaptive spatiotemporal attention mechanisms to infer motion trajectories of adjacent cardiac frames, thereby better extracting complementary information. Additionally, we employ Transformer-driven dynamic feature aggregation to establish long-range dependencies, facilitating global information integration. Research findings demonstrate that the KGMgT model achieves state-of-the-art performance on multiple benchmark datasets, offering an efficient solution for cardiac cine MRI reconstruction. This collaborative approach, combining artificial intelligence technology to assist medical professionals in clinical decision-making, holds promise for significantly improving diagnostic efficiency, optimizing treatment plans, and enhancing the patient treatment experience. The code and trained models are available at https://github.com/MICV-Lab/KGMgT.
{"title":"Knowledge-guided multi-geometric window transformer for cardiac cine MRI reconstruction","authors":"Jun Lyu, Guangming Wang, Yunqi Wang, Chengyan Wang, Jing Qin","doi":"10.1016/j.media.2026.103936","DOIUrl":"https://doi.org/10.1016/j.media.2026.103936","url":null,"abstract":"Magnetic resonance imaging (MRI) plays a crucial role in clinical diagnosis, yet traditional MR image acquisition often requires a prolonged duration, potentially causing patient discomfort and image artifacts. Faster and more accurate image reconstruction may alleviate patient discomfort during MRI examinations and enhance diagnostic accuracy and efficiency. In recent years, significant advancements in deep learning technology offer promise for improving MR image quality and accelerating acquisition. Addressing the demand for cardiac cine MRI reconstruction, we propose KGMgT, a novel MRI reconstruction network based on knowledge-guided approaches. The KGMgT model leverages adaptive spatiotemporal attention mechanisms to infer motion trajectories of adjacent cardiac frames, thereby better extracting complementary information. Additionally, we employ Transformer-driven dynamic feature aggregation to establish long-range dependencies, facilitating global information integration. Research findings demonstrate that the KGMgT model achieves state-of-the-art performance on multiple benchmark datasets, offering an efficient solution for cardiac cine MRI reconstruction. This collaborative approach, combining artificial intelligence technology to assist medical professionals in clinical decision-making, holds promise for significantly improving diagnostic efficiency, optimizing treatment plans, and enhancing the patient treatment experience. The code and trained models are available at <ce:inter-ref xlink:href=\"https://github.com/MICV-Lab/KGMgT\" xlink:type=\"simple\">https://github.com/MICV-Lab/KGMgT</ce:inter-ref>.","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"54 1","pages":""},"PeriodicalIF":10.9,"publicationDate":"2026-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145956958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}