Surgical phase recognition (SPR) is essential for surgical workflow analysis and provides immediate guidance during procedures. Existing methods aggregate frame-level information into a global representation and treat the task as frame-wise classification. However, this pipeline lacks a feedback mechanism for integrating historical information into local temporal modeling. To address this limitation, we propose the Bidirectional Branch Query Network (B2Q-Net), which reformulates the SPR task as a bidirectional query between phase-level features and frame-level features. B2Q-Net incorporates historical information during the initialization of phase queries. This enables bidirectional information flow during iterative refinement of two-level feature maps between phases and frames. Furthermore, we introduce a dual-scale selector (DSS) to generate high-quality phase queries for the current video clip. These phase queries retrieve historical information from the proposed state space query (SSQ) module, which uses learnable tokens as the historical state space to preserve historical information. Extensive evaluations on three datasets demonstrate that B2Q-Net consistently outperforms state-of-the-art methods in recognition accuracy while achieving an inference speed of 106 fps. The B2Q-Net code is available at https://github.com/vsislab/B2Q-Net.
Title: B2Q-Net: Bidirectional Branch Query Network for Surgical Phase Recognition. Authors: Wenjie Zhang, Zhiheng Li, Yue Bi, Xiao Jia, Ran Song, Yipeng Zhang, Wei Zhang. Journal: IEEE Transactions on Medical Imaging. Pub Date: 2026-01-16 | DOI: 10.1109/tmi.2026.3654795
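The B2Q-Net abstract above describes a bidirectional query between phase-level and frame-level features. As a rough illustration of what such a two-way exchange could look like, the sketch below applies generic scaled dot-product cross-attention in both directions. It is not the B2Q-Net implementation: the function names, the toy feature values, and the use of the same vectors as keys and values are all assumptions made for this example.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Generic scaled dot-product cross-attention (hypothetical helper).

    Each query vector is replaced by an attention-weighted average of the
    vectors in keys_values, which serve as both keys and values here.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        logits = [
            sum(a * b for a, b in zip(q, kv)) / math.sqrt(dim)
            for kv in keys_values
        ]
        weights = softmax(logits)
        outputs.append(
            [sum(w * kv[d] for w, kv in zip(weights, keys_values)) for d in range(dim)]
        )
    return outputs


# Toy data (assumed): two phase queries and three frame features, 2-D each.
phase_queries = [[1.0, 0.0], [0.0, 1.0]]
frame_features = [[2.0, 0.0], [1.8, 0.2], [0.1, 2.1]]

# Direction 1: phase queries gather evidence from the frames.
phases_updated = cross_attention(phase_queries, frame_features)
# Direction 2: frames read context back from the phase queries.
frames_updated = cross_attention(frame_features, phase_queries)
```

In this toy setup the first phase query ends up dominated by the two frames that point along the first axis, and the third frame draws mostly on the second phase query, which is the kind of mutual refinement the abstract alludes to.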
Pub Date: 2026-01-15 | DOI: 10.1109/tmi.2026.3654249
Tao Song, Yicheng Wu, Minhao Hu, Xiangde Luo, Linda Wei, Guotai Wang, Yi Guo, Feng Xu, Shaoting Zhang
Multimodal MR image synthesis aims to generate missing modality images by effectively fusing and mapping from a subset of available MRI modalities. Most existing methods adopt an image-to-image translation paradigm, treating multiple modalities as input channels. However, these approaches often yield sub-optimal results due to the inherent difficulty in achieving precise feature- or semantic-level alignment across modalities. To address these challenges, we propose an Adaptive Group-wise Interaction Network (AGI-Net) that explicitly models both inter-modality and intra-modality relationships for multimodal MR image synthesis. Specifically, feature channels are first partitioned into predefined groups, after which an adaptive rolling mechanism is applied to conventional convolutional kernels to better capture feature and semantic correspondences between different modalities. In parallel, a cross-group attention module is introduced to enable effective feature fusion across groups, thereby enhancing the network's representational capacity. We validate the proposed AGI-Net on the publicly available IXI and BraTS2023 datasets. Experimental results demonstrate that AGI-Net achieves state-of-the-art performance in multimodal MR image synthesis tasks, confirming the effectiveness of its modality-aware interaction design. We release the relevant code at: https://github.com/zunzhumu/Adaptive-Group-wise-Interaction-Network-for-Multimodal-MRI-Synthesis.git.
Title: Learning Modality-Aware Representations: Adaptive Group-wise Interaction Network for Multimodal MRI Synthesis. Journal: IEEE Transactions on Medical Imaging.
Intraoperative anomalies cause deviations from the ideal surgical workflow, heightening the risk of consequential errors and complications. Their reliable recognition has traditionally relied on continuous surgeon monitoring, yet automated anomaly detection systems are now indispensable for the safe advancement of assistive and autonomous surgery. However, existing approaches struggle with domain shifts across surgical platforms and unpredictable scenarios in deformable surgical environments. To address this, we propose DA-MIST, a Domain Adaptive Multiple Instance Self-Training framework for weakly supervised anomaly detection. DA-MIST adopts a two-stage training strategy that combines multiple instance learning with self-training, enhanced by a scene-decoupled memory mechanism that disentangles state-irrelevant scene variations from memory banks, preserving only state-discriminative features for robust anomaly identification. Additionally, a state-aware dual-branch attention module integrates Gaussian dynamic and global self-attention for effective temporal reasoning. Evaluated on our newly compiled large-scale endoscopic video dataset encompassing seven representative anomalies, DA-MIST demonstrates strong adaptability across heterogeneous surgical domains, consistently reducing false alarms and enhancing anomaly localization accuracy. Our code and dataset will be available at: https://github.com/iamziang/DA-MIST.
Title: Domain Adaptive Multiple Instance Self-Training for Intraoperative Anomaly Detection. Authors: Ziang Chen, Yiming Ding, Jianchang Zhao, Bo Yi, Jianguo Wei. Journal: IEEE Transactions on Medical Imaging. Pub Date: 2026-01-15 | DOI: 10.1109/tmi.2026.3654087
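The DA-MIST abstract above builds on multiple instance learning (MIL) for weakly supervised anomaly detection, where only a video-level label is available and the snippets inside the video are unlabeled instances. The sketch below shows one common, generic MIL formulation: a video score is taken as the mean of its top-k snippet scores and compared with the video label through binary cross-entropy. This is not the DA-MIST training objective; the function names, the choice of k, and the snippet scores are assumptions for illustration only.

```python
import math


def bag_score(snippet_scores, k=3):
    """Video-level score as the mean of the k highest snippet scores (hypothetical helper)."""
    if not snippet_scores:
        raise ValueError("a video needs at least one snippet score")
    top = sorted(snippet_scores, reverse=True)[:k]
    return sum(top) / len(top)


def binary_cross_entropy(prob, label, eps=1e-7):
    """BCE between a predicted probability and a 0/1 video-level label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


# Assumed per-snippet anomaly scores in [0, 1] for two videos.
normal_video = [0.05, 0.10, 0.08, 0.12, 0.07]
anomalous_video = [0.05, 0.10, 0.90, 0.85, 0.07]

normal_score = bag_score(normal_video, k=2)
anomalous_score = bag_score(anomalous_video, k=2)

# Loss against the correct video-level labels (0 = normal, 1 = anomalous).
loss_normal = binary_cross_entropy(normal_score, 0)
loss_anomalous = binary_cross_entropy(anomalous_score, 1)
```

Because only the highest-scoring snippets contribute to the video score, the gradient in a trained model would concentrate on the snippets most likely to contain the anomaly, which is what makes localization possible from video-level labels alone.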
Pub Date: 2026-01-15 | DOI: 10.1109/tmi.2026.3654612
Yuting Chen, Yuxiang Xing, Li Zhang, Zhi Deng, Hewei Gao
Title: Energy-Threshold Bias Calculator: A Physics-Model Based Adaptive Correction Scheme for Photon-Counting CT. Journal: IEEE Transactions on Medical Imaging.
Pub Date: 2026-01-14 | DOI: 10.1109/tmi.2026.3653974
L. Guo, A. Bialkowski, A. Abbosh
Title: Medical Microwave Imaging Using Physics-Guided Deep Learning Part 2: The Inverse Solver. Journal: IEEE Transactions on Medical Imaging.
Pub Date: 2026-01-13 | DOI: 10.1109/tmi.2026.3653779
Jiahui Huang, Jiaxin Huang, Mingdu Zhang, Qiong Wang, Xiao-Qing Pei, Ying Hu, Hao Chen, Yan Pang
Multimodal ultrasound imaging, combining B-mode ultrasound, shear wave velocity, and shear wave time, is crucial for diagnosing and treating breast lesions, providing insights into lesion characteristics and tissue properties. However, challenges arise from intermodal feature misalignment and attention shifts due to varied capture methods and an overemphasis on vibrant color data. To tackle these issues, we introduce two innovations: a novel segmentation framework and a comprehensive dataset. The UltraMamba framework utilizes bidirectional alignment between modalities and enhances region-specific information to improve breast lesion segmentation accuracy. Key components include the Cross-Modal Knowledge Interaction module for robust information exchange and the Region-Aware Feature Excitation module to focus on relevant features. We also present the BreLS dataset, the first two-dimensional multimodal ultrasound breast lesion dataset, with paired images from 506 cases, serving as a valuable resource for analysis. UltraMamba shows strong performance on the BreLS dataset, achieving a Dice Similarity Coefficient of 72.16% and an HD95 of 42.02 mm, reflecting improvements of 2.59% in DSC and a 6.78 mm reduction in HD95 compared to the second-best framework, MMCA-NET. These results highlight UltraMamba's potential to enhance segmentation accuracy in clinical settings, facilitating precise treatment planning and, ultimately, leading to improved outcomes. Code: https://github.com/deepang-ai/UltraMamba.
Title: UltraMamba: Mamba-based Multimodal Ultrasound Image Adaptive Fusion for Breast Lesion Segmentation. Journal: IEEE Transactions on Medical Imaging.
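The UltraMamba abstract above reports segmentation quality as a Dice Similarity Coefficient (DSC). For readers unfamiliar with the metric, the minimal sketch below computes DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks. The function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions of this example, not details taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice Similarity Coefficient for two binary masks given as nested lists of 0/1."""
    flat_pred = [v for row in pred for v in row]
    flat_target = [v for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks (assumed): 3 predicted pixels, 3 true pixels, 2 in common.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]

score = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3) = 0.666...
```

A DSC of 72.16% as quoted in the abstract corresponds to a value of about 0.72 on this 0-to-1 scale.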
Pub Date: 2026-01-13 | DOI: 10.1109/tmi.2026.3654000
Jinrong Cui, Weihao Ye, Shengrong Li, Jie Wen, Qi Zhu
Multi-modal learning is extensively applied to diagnose brain diseases such as epilepsy and Alzheimer's disease. However, incomplete multi-modal data, where some modalities are unavailable or difficult to collect, limits the effectiveness of conventional methods. Additionally, existing approaches often overlook semantic relationships between same-label neighbors and latent information in missing modalities. To address these challenges, we propose an adjacent-aware distillation recovery framework designed for incomplete multi-modal learning, with a focus on diagnosing representative brain diseases, i.e., epilepsy and Alzheimer's disease. The key novelty of our framework lies in its joint design of adjacent-aware modality recovery and multi-modal representation learning in a single end-to-end pipeline. Specifically, we introduce a label-guided adjacent-aware recovery module that uses a self-attention mechanism to exploit neighbor semantics and generate distribution-consistent features for high-quality modality reconstruction. The recovered features are then refined through a knowledge distillation pathway into a modality generator, enhancing generalization under severe data incompleteness. For multi-modal representation learning, the recovered modality information is fused with the original incomplete information to enhance feature extraction and representation. Extensive experiments demonstrate the effectiveness of our method in diagnosing epilepsy and Alzheimer's disease.
Title: Adjacent-aware Modality Recovery based on Incomplete Multi-Modal Brain Disease Diagnosis. Journal: IEEE Transactions on Medical Imaging.