Pub Date : 2025-06-17, DOI: 10.1109/TMI.2025.3580611
Zhixuan Zhou;Tingting Dan;Guorong Wu
One of the fundamental scientific problems in neuroscience is to understand how cognition and behavior emerge from brain function. Since the neuroscience concept of cognitive control parallels the notion of system control in engineering, many computational models formulate the dynamic neural process as a dynamical system, where the hidden states of the complex neural system are modulated by energetic stimulation. However, the human brain is a quintessential complex biological system. Current computational models either use neural networks to approximate the underlying dynamics, which makes it difficult to fully understand the system mechanics, or fall back on simplified linear models with very limited power to characterize the non-linear, self-organized dynamics of complex neural activity. To address this challenge, we devise an end-to-end deep model that identifies the underlying brain dynamics based on Koopman operator theory, which allows us to model a complex non-linear system in an infinite-dimensional linear space. In the context of reverse engineering, we further propose a biology-inspired control module that adjusts the input (neural activity data) based on feedback to align brain dynamics with the underlying cognitive task. We have applied our deep model to predict cognitive states from large-scale existing neuroimaging data by identifying the latent dynamic system of functional fluctuations. Promising results demonstrate the potential of establishing a system-level understanding of the intricate relationship between brain function and cognition through the landscape of explainable deep models.
{"title":"Understanding Brain Functional Dynamics Through Neural Koopman Operator With Control Mechanism","authors":"Zhixuan Zhou;Tingting Dan;Guorong Wu","doi":"10.1109/TMI.2025.3580611","DOIUrl":"10.1109/TMI.2025.3580611","url":null,"abstract":"One of the fundamental scientific problems in neuroscience is to have a good understanding of how cognition and behavior emerge from brain function. Since the neuroscience concept of cognitive control parallels the notion of system control in engineering, many computational models formulate the dynamics neural process into a dynamical system, where the hidden states of the complex neural system are modulated by energetic simulations. However, the human brain is a quintessential complex biological system. Current computation models either use neural networks to approximate the underlying dynamics, which makes it difficult to fully understand the system mechanics, or compromise to simplified linear models with very limited power to characterize non-linear and self-organized dynamics along with complex neural activities. To address this challenge, we devise an end-to-end deep model to identify the underlying brain dynamics based on Koopman operator theory, which allows us to model a complex non-linear system in an infinite-dimensional linear space. In the context of reverse engineering, we further propose a biology-inspired control module that adjusts the input (neural activity data) based on feedback to align brain dynamics with the underlying cognitive task. We have applied our deep model to predict cognitive states from a large scale of existing neuroimaging data by identifying the latent dynamic system of functional fluctuations. Promising results demonstrate the potential of establishing a system-level understanding of the intricate relationship between brain function and cognition through the landscape of explainable deep models.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4627-4638"},"PeriodicalIF":0.0,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144311305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-06-16, DOI: 10.1109/TMI.2025.3580082
Kaide Huang;Xiang-Lei Yuan;Rui-De Liu;Lian-Song Ye;Yao Zhou;Bing Hu;Zhang Yi
Automatic recognition of surgical workflow plays a vital role in modern operating rooms. Given the complex nature and extended duration of surgical videos, accurate recognition of surgical workflow is highly challenging. Despite being widely studied, existing methods still face two major limitations: insufficient visual feature extraction and performance degradation caused by inconsistency between training and testing features. To address these limitations, this paper proposes a Multi-Teacher Temporal Regulation Network (MTTR-Net) for surgical workflow recognition. To extract discriminative visual features, we introduce a “sequence of clips” training strategy. This strategy employs a set of sparsely sampled video clips as input to train the feature encoder and incorporates an auxiliary temporal regularizer to model long-range temporal dependencies across these clips, ensuring the feature encoder captures critical information from each frame. Then, to mitigate the inconsistency between training and testing features, we further develop a cross-mimicking strategy that iteratively trains multiple feature encoders on different data subsets to generate consistent mimicked features. A temporal encoder is trained on these mimicked features to achieve stable performance during testing. Extensive experiments on eight public surgical video datasets demonstrate that our MTTR-Net outperforms state-of-the-art methods across various metrics. Our code has been released at https://github.com/kaideH/MGTR-Net
{"title":"Multi-Teacher Temporal Regulation Network for Surgical Workflow Recognition","authors":"Kaide Huang;Xiang-Lei Yuan;Rui-De Liu;Lian-Song Ye;Yao Zhou;Bing Hu;Zhang Yi","doi":"10.1109/TMI.2025.3580082","DOIUrl":"10.1109/TMI.2025.3580082","url":null,"abstract":"Automatic recognition of surgical workflow plays a vital role in modern operating rooms. Given the complex nature and extended duration of surgical videos, accurate recognition of surgical workflow is highly challenging. Despite being widely studied, existing methods still face two major limitations: insufficient visual feature extraction and performance degradation caused by inconsistency between training and testing features. To address these limitations, this paper proposes a Multi-Teacher Temporal Regulation Network (MTTR-Net) for surgical workflow recognition. To extract discriminative visual features, we introduce a “sequence of clips” training strategy. This strategy employs a set of sparsely sampled video clips as input to train the feature encoder and incorporates an auxiliary temporal regularizer to model long-range temporal dependencies across these clips, ensuring the feature encoder captures critical information from each frame. Then, to mitigate the inconsistency between training and testing features, we further develop a cross-mimicking strategy that iteratively trains multiple feature encoders on different data subsets to generate consistent mimicked features. A temporal encoder is trained on these mimicked features to achieve stable performance during testing. Extensive experiments on eight public surgical video datasets demonstrate that our MTTR-Net outperforms state-of-the-art methods across various metrics. Our code has been released at <uri>https://github.com/kaideH/MGTR-Net</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4690-4703"},"PeriodicalIF":0.0,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144304753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study introduces a motion-based learning network with a global-local self-attention module (MoGLo-Net) to enhance 3D reconstruction in handheld photoacoustic and ultrasound (PAUS) imaging. Standard PAUS imaging is often limited by a narrow field of view (FoV) and the inability to effectively visualize complex 3D structures. The 3D freehand technique, which aligns sequential 2D images for 3D reconstruction, faces significant challenges in accurate motion estimation without relying on external positional sensors. MoGLo-Net addresses these limitations through an innovative adaptation of the self-attention mechanism, which effectively exploits the critical regions, such as fully-developed speckle areas or high-echogenic tissue regions within successive ultrasound images to accurately estimate the motion parameters. This facilitates the extraction of intricate features from individual frames. Additionally, we employ a patch-wise correlation operation to generate a correlation volume that is highly correlated with the scanning motion. A custom loss function was also developed to ensure robust learning with minimized bias, leveraging the characteristics of the motion parameters. Experimental evaluations demonstrated that MoGLo-Net surpasses current state-of-the-art methods in both quantitative and qualitative performance metrics. Furthermore, we expanded the application of 3D reconstruction technology beyond simple B-mode ultrasound volumes to incorporate Doppler ultrasound and photoacoustic imaging, enabling 3D visualization of vasculature. The source code for this study is publicly available at: https://github.com/pnu-amilab/US3D
{"title":"Enhancing Free-Hand 3-D Photoacoustic and Ultrasound Reconstruction Using Deep Learning","authors":"SiYeoul Lee;Seonho Kim;MinKyung Seo;SeongKyu Park;Salehin Imrus;Kambaluru Ashok;DongEon Lee;Chunsu Park;SeonYeong Lee;Jiye Kim;Jae-Heung Yoo;MinWoo Kim","doi":"10.1109/TMI.2025.3579454","DOIUrl":"10.1109/TMI.2025.3579454","url":null,"abstract":"This study introduces a motion-based learning network with a global-local self-attention module (MoGLo-Net) to enhance 3D reconstruction in handheld photoacoustic and ultrasound (PAUS) imaging. Standard PAUS imaging is often limited by a narrow field of view (FoV) and the inability to effectively visualize complex 3D structures. The 3D freehand technique, which aligns sequential 2D images for 3D reconstruction, faces significant challenges in accurate motion estimation without relying on external positional sensors. MoGLo-Net addresses these limitations through an innovative adaptation of the self-attention mechanism, which effectively exploits the critical regions, such as fully-developed speckle areas or high-echogenic tissue regions within successive ultrasound images to accurately estimate the motion parameters. This facilitates the extraction of intricate features from individual frames. Additionally, we employ a patch-wise correlation operation to generate a correlation volume that is highly correlated with the scanning motion. A custom loss function was also developed to ensure robust learning with minimized bias, leveraging the characteristics of the motion parameters. Experimental evaluations demonstrated that MoGLo-Net surpasses current state-of-the-art methods in both quantitative and qualitative performance metrics. Furthermore, we expanded the application of 3D reconstruction technology beyond simple B-mode ultrasound volumes to incorporate Doppler ultrasound and photoacoustic imaging, enabling 3D visualization of vasculature. The source code for this study is publicly available at: <uri>https://github.com/pnu-amilab/US3D</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4652-4665"},"PeriodicalIF":0.0,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11036110","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144288376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We developed an automated photoacoustic and ultrasound breast tomography system that images the patient in the standing pose. The system, named OneTouch-PAT, utilized linear transducer arrays with optical-acoustic combiners for effective dual-modal imaging. During scanning, subjects only need to gently attach their breasts to the imaging window, and co-registered three-dimensional ultrasonic and photoacoustic images of the breast can be obtained within one minute. Our system has a large field of view of 17 cm by 15 cm and achieves an imaging depth of 3 cm with sub-millimeter resolution. A three-dimensional deep-learning network was also developed to further improve the image quality by improving the 3D resolution, enhancing vasculature, eliminating skin signals, and reducing noise. The performance of the system was tested on four healthy subjects and 61 patients with breast cancer. Our results indicate that the ultrasound structural information can be combined with the photoacoustic vascular information for better tissue characterization. Representative cases from different molecular subtypes have indicated different photoacoustic and ultrasound features that could potentially be used for imaging-based cancer classification. Statistical analysis among all patients indicates that the regional photoacoustic intensity and vessel branching points are indicators of breast malignancy. These promising results suggest that our system could significantly enhance breast cancer diagnosis and classification.
{"title":"OneTouch Automated Photoacoustic and Ultrasound Imaging of Breast in Standing Pose","authors":"Huijuan Zhang;Emily Zheng;Wenhan Zheng;Chuqin Huang;Yunqi Xi;Yanda Cheng;Shuliang Yu;Saptarshi Chakraborty;Ermelinda Bonaccio;Kazuaki Takabe;Xinhao C. Fan;Wenyao Xu;Jun Xia","doi":"10.1109/TMI.2025.3578929","DOIUrl":"10.1109/TMI.2025.3578929","url":null,"abstract":"We developed an automated photoacoustic and ultrasound breast tomography system that images the patient in the standing pose. The system, named OneTouch-PAT, utilized linear transducer arrays with optical-acoustic combiners for effective dual-modal imaging. During scanning, subjects only need to gently attach their breasts to the imaging window, and co-registered three-dimensional ultrasonic and photoacoustic images of the breast can be obtained within one minute. Our system has a large field of view of 17 cm by 15 cm and achieves an imaging depth of 3 cm with sub-millimeter resolution. A three-dimensional deep-learning network was also developed to further improve the image quality by improving the 3D resolution, enhancing vasculature, eliminating skin signals, and reducing noise. The performance of the system was tested on four healthy subjects and 61 patients with breast cancer. Our results indicate that the ultrasound structural information can be combined with the photoacoustic vascular information for better tissue characterization. Representative cases from different molecular subtypes have indicated different photoacoustic and ultrasound features that could potentially be used for imaging-based cancer classification. Statistical analysis among all patients indicates that the regional photoacoustic intensity and vessel branching points are indicators of breast malignancy. These promising results suggest that our system could significantly enhance breast cancer diagnosis and classification.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4617-4626"},"PeriodicalIF":0.0,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144278226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-06-12, DOI: 10.1109/TMI.2025.3578995
Yulin Fang;Minghui Wang;Qilong Song;Chi Cao;Ziyu Gao;Biao Song;Xuhong Min;Ao Li
Accurate and non-invasive prediction of epidermal growth factor receptor (EGFR) mutation is crucial for the diagnosis and treatment of non-small cell lung cancer (NSCLC). While computed tomography (CT) imaging shows promise in identifying EGFR mutation, current prediction methods heavily rely on fully supervised learning, which overlooks the substantial heterogeneity of tumors and therefore leads to suboptimal results. To tackle the tumor heterogeneity issue, this study introduces a novel weakly supervised method named TransMIEL, which leverages multiple instance learning techniques for accurate EGFR mutation prediction. Specifically, we first propose an innovative instance enhancement learning (IEL) strategy that strengthens the discriminative power of instance features for complex tumor CT images by exploring self-derived soft pseudo-labels. Next, to improve tumor representation capability, we design a spatial-aware transformer (SAT) that fully captures inter-instance relationships of different pathological subregions to mirror the diagnostic processes of radiologists. Finally, an instance adaptive gating (IAG) module is developed to effectively emphasize the contribution of informative instance features in heterogeneous tumors, facilitating dynamic instance feature aggregation and increasing model generalization performance. Experimental results demonstrate that TransMIEL significantly outperforms existing fully and weakly supervised methods on both public and in-house NSCLC datasets. Additionally, visualization results show that our approach can highlight intra-tumor and peri-tumor areas relevant to EGFR mutation status. Therefore, our method holds significant potential as an effective tool for EGFR prediction and offers a novel perspective for future research on tumor heterogeneity.
{"title":"Tackling Tumor Heterogeneity Issue: Transformer-Based Multiple Instance Enhancement Learning for Predicting EGFR Mutation via CT Images","authors":"Yulin Fang;Minghui Wang;Qilong Song;Chi Cao;Ziyu Gao;Biao Song;Xuhong Min;Ao Li","doi":"10.1109/TMI.2025.3578995","DOIUrl":"10.1109/TMI.2025.3578995","url":null,"abstract":"Accurate and non-invasive prediction of epidermal growth factor receptor (EGFR) mutation is crucial for the diagnosis and treatment of non-small cell lung cancer (NSCLC). While computed tomography (CT) imaging shows promise in identifying EGFR mutation, current prediction methods heavily rely on fully supervised learning, which overlooks the substantial heterogeneity of tumors and therefore leads to suboptimal results. To tackle tumor heterogeneity issue, this study introduces a novel weakly supervised method named TransMIEL, which leverages multiple instance learning techniques for accurate EGFR mutation prediction. Specifically, we first propose an innovative instance enhancement learning (IEL) strategy that strengthens the discriminative power of instance features for complex tumor CT images by exploring self-derived soft pseudo-labels. Next, to improve tumor representation capability, we design a spatial-aware transformer (SAT) that fully captures inter-instance relationships of different pathological subregions to mirror the diagnostic processes of radiologists. Finally, an instance adaptive gating (IAG) module is developed to effectively emphasize the contribution of informative instance features in heterogeneous tumors, facilitating dynamic instance feature aggregation and increasing model generalization performance. Experimental results demonstrate that TransMIEL significantly outperforms existing fully and weakly supervised methods on both public and in-house NSCLC datasets. Additionally, visualization results show that our approach can highlight intra-tumor and peri-tumor areas relevant to EGFR mutation status. Therefore, our method holds significant potential as an effective tool for EGFR prediction and offers a novel perspective for future research on tumor heterogeneity.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4524-4535"},"PeriodicalIF":0.0,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144278251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Precise segmentation of brain tumors, particularly contrast-enhancing regions visible in post-contrast MRI (areas highlighted by contrast agent injection), is crucial for accurate clinical diagnosis and treatment planning but remains challenging. Current methods exhibit notable performance degradation in segmenting these enhancing brain tumor areas, largely due to insufficient consideration of MRI-specific tumor features such as complex textures and directional variations. To address this, we propose the Harmonized Frequency Fusion Network (HFF-Net), which rethinks brain tumor segmentation from a frequency-domain perspective. To comprehensively characterize tumor regions, we develop a Frequency Domain Decomposition (FDD) module that separates MRI images into low-frequency components, which capture smooth tumor contours, and high-frequency components, which highlight detailed textures and directional edges. To further enhance sensitivity to tumor boundaries, we introduce an Adaptive Laplacian Convolution (ALC) module that adaptively emphasizes critical high-frequency details using dynamically updated convolution kernels. To effectively fuse tumor features across multiple scales, we design a Frequency Domain Cross-Attention (FDCA) module that integrates semantic, positional, and slice-specific information. We further validate and interpret frequency-domain improvements through visualization, theoretical reasoning, and experimental analyses. Extensive experiments on four public datasets demonstrate that HFF-Net achieves an average relative improvement of 4.48% (ranging from 2.39% to 7.72%) in the mean Dice scores across the three major subregions, and an average relative improvement of 7.33% (ranging from 5.96% to 8.64%) in the segmentation of contrast-enhancing tumor regions, while maintaining favorable computational efficiency and clinical applicability. Our code is available at: https://github.com/VinyehShaw/HFF
{"title":"Rethinking Brain Tumor Segmentation From the Frequency Domain Perspective","authors":"Minye Shao;Zeyu Wang;Haoran Duan;Yawen Huang;Bing Zhai;Shizheng Wang;Yang Long;Yefeng Zheng","doi":"10.1109/TMI.2025.3579213","DOIUrl":"10.1109/TMI.2025.3579213","url":null,"abstract":"Precise segmentation of brain tumors, particularly contrast-enhancing regions visible in post-contrast MRI (areas highlighted by contrast agent injection), is crucial for accurate clinical diagnosis and treatment planning but remains challenging. However, current methods exhibit notable performance degradation in segmenting these enhancing brain tumor areas, largely due to insufficient consideration of MRI-specific tumor features such as complex textures and directional variations. To address this, we propose the Harmonized Frequency Fusion Network (HFF-Net), which rethinks brain tumor segmentation from a frequency-domain perspective. To comprehensively characterize tumor regions, we develop a Frequency Domain Decomposition (FDD) module that separates MRI images into low-frequency components, capturing smooth tumor contours and high-frequency components, highlighting detailed textures and directional edges. To further enhance sensitivity to tumor boundaries, we introduce an Adaptive Laplacian Convolution (ALC) module that adaptively emphasizes critical high-frequency details using dynamically updated convolution kernels. To effectively fuse tumor features across multiple scales, we design a Frequency Domain Cross-Attention (FDCA) integrating semantic, positional, and slice-specific information. We further validate and interpret frequency-domain improvements through visualization, theoretical reasoning, and experimental analyses. Extensive experiments on four public datasets demonstrate that HFF-Net achieves an average relative improvement of 4.48% (ranging from 2.39% to 7.72%) in the mean Dice scores across the three major subregions, and an average relative improvement of 7.33% (ranging from 5.96% to 8.64%) in the segmentation of contrast-enhancing tumor regions, while maintaining favorable computational efficiency and clinical applicability. Our code is available at: <uri>https://github.com/VinyehShaw/HFF</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4536-4553"},"PeriodicalIF":0.0,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144278250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-06-11, DOI: 10.1109/TMI.2025.3578601
Meiling Wang;Liang Sun;Wei Shao;Daoqiang Zhang
Brain imaging genetics is a widely studied topic that has achieved great success in the diagnosis of complex brain disorders. In clinical practice, most existing data fusion approaches extract features from homogeneous data, neglecting the heterogeneous structural information among imaging genetic data. In addition, the number of labeled samples is limited due to the cost and time of manually labeling data. To remedy such deficiencies, in this work we present a multimodal fusion-based hypergraph transductive learning (MFHT) method for clinical diagnosis. Specifically, for each modality, we first construct a corresponding similarity graph to reflect the similarity between subjects using the label prior. Then, a multiple-graph fusion approach with a theoretical convergence guarantee is designed to learn a unified graph that harnesses the structure of the entire dataset. Finally, to fully exploit the rich information of the obtained graph, a hypergraph transductive learning approach is designed to effectively capture the complex structures and high-order relationships in both labeled and unlabeled data and produce the diagnosis results. Brain imaging genetic data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) datasets are used to evaluate the developed method. The results show that our method is well suited to the analysis of brain imaging genetic data, accounting for genetics, brain imaging region-of-interest (ROI) node features, and brain connectivity edge features to improve both the understanding of disease mechanisms and clinical diagnosis.
{"title":"Discovering Differential Imaging Genetic Modules via Multimodal Fusion-Based Hypergraph Transductive Learning in Alzheimer’s Disease Diagnosis","authors":"Meiling Wang;Liang Sun;Wei Shao;Daoqiang Zhang","doi":"10.1109/TMI.2025.3578601","DOIUrl":"10.1109/TMI.2025.3578601","url":null,"abstract":"Brain imaging genetics is a widely focused topic, which has achieved the great successes in the diagnosis of complex brain disorders. In clinical practice, most existing data fusion approaches extract features from homogeneous data, neglecting the heterogeneous structural information among imaging genetic data. In addition, the number of labeled samples is limited due to the cost and time of manually labeling data. To remedy such deficiencies, in this work, we present a multimodal fusion-based hypergraph transductive learning (MFHT) for clinical diagnosis. Specifically, for each modality, we first construct a corresponding similarity graph to reflect the similarity between subjects using the label prior. Then, the multiple graph fusion approach based on theoretical convergence guarantee is designed for learning a unified graph harnessing the structure of entire data. Finally, to fully exploit the rich information of the obtained graph, a hypergraph transductive learning approach is designed to effectively capture the complex structures and high-order relationships in both labeled and unlabeled data to achieve the diagnosis results. The brain imaging genetic data of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) datasets are used to experimentally explore our developed method. Related results show that our method is well applied to the analysis of brain imaging genetic data, which accounts for genetics, brain imaging (region of interest (ROI) node features), and brain imaging (connectivity edge features) to boost the understanding of disease mechanism as well as improve clinical diagnosis.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4592-4604"},"PeriodicalIF":0.0,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144268537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-06-10, DOI: 10.1109/TMI.2025.3578492
Chong Yin;Siqi Liu;Kaiyang Zhou;Vincent Wai-Sun Wong;Pong C. Yuen
The sharp rise in non-alcoholic fatty liver disease (NAFLD) cases has become a major health concern in recent years. Accurately identifying tissue alteration regions is crucial for NAFLD diagnosis but challenging with small-scale pathology datasets. Recently, prompt tuning has emerged as an effective strategy for adapting vision models to small-scale data analysis. However, current prompting techniques, designed primarily for general image classification, use generic cues that are inadequate when dealing with the intricacies of pathological tissue analysis. To solve this problem, we introduce Quantitative Attribute-based Polarity Visual Prompting (Q-PoVP), a new prompting method for pathology image analysis. Q-PoVP introduces two types of measurable attributes: K-function-based spatial attributes and histogram-based morphological attributes. Both help to measure tissue conditions quantitatively. We develop a quantitative attribute-based polarity visual prompt generator that converts quantitative visual attributes into positive and negative visual prompts, facilitating a more comprehensive and nuanced interpretation of pathological images. To enhance feature discrimination, we introduce a novel orthogonal-based polarity visual prompt tuning technique that disentangles and amplifies positive visual attributes while suppressing negative ones. We extensively tested our method on three different tasks. Our task-specific prompting demonstrates superior performance in both diagnostic accuracy and interpretability compared to existing methods. This dual advantage makes it particularly valuable for clinical settings, where healthcare providers require not only reliable results but also transparent reasoning to support informed patient care decisions. Code is available at https://github.com/7LFB/Q-PoVP
{"title":"Polarity Prompting Vision Foundation Models for Pathology Image Analysis","authors":"Chong Yin;Siqi Liu;Kaiyang Zhou;Vincent Wai-Sun Wong;Pong C. Yuen","doi":"10.1109/TMI.2025.3578492","DOIUrl":"10.1109/TMI.2025.3578492","url":null,"abstract":"The sharp rise in non-alcoholic fatty liver disease (NAFLD) cases has become a major health concern in recent years. Accurately identifying tissue alteration regions is crucial for NAFLD diagnosis but challenging with small-scale pathology datasets. Recently, prompt tuning has emerged as an effective strategy for adapting vision models to small-scale data analysis. However, current prompting techniques, designed primarily for general image classification, use generic cues that are inadequate when dealing with the intricacies of pathological tissue analysis. To solve this problem, we introduce Quantitative Attribute-based Polarity Visual Prompting (Q-PoVP), a new prompting method for pathology image analysis. Q-PoVP introduces two types of measurable attributes: K-function-based spatial attributes and histogram-based morphological attributes. Both help to measure tissue conditions quantitatively. We develop a quantitative attribute-based polarity visual prompt generator that converts quantitative visual attributes into positive and negative visual prompts, facilitating a more comprehensive and nuanced interpretation of pathological images. To enhance feature discrimination, we introduce a novel orthogonal-based polarity visual prompt tuning technique that disentangles and amplifies positive visual attributes while suppressing negative ones. We extensively tested our method on three different tasks. Our task-specific prompting demonstrates superior performance in both diagnostic accuracy and interpretability compared to existing methods. This dual advantage makes it particularly valuable for clinical settings, where healthcare providers require not only reliable results but also transparent reasoning to support informed patient care decisions. Code is available at <uri>https://github.com/7LFB/Q-PoVP</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4579-4591"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144260086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article proposes an ex-vivo method to estimate the dielectric properties and thickness of adipose tissue in the human body. Obesity levels are then assessed based on the electrical properties of the adipose tissue. This approach consists of two steps: 1) data acquisition by an ultrawideband (UWB) time-domain radar and 2) genetic algorithm optimization of a goal function. This study considers a three-layered tissue model to mimic the surface of the human abdomen. The experimental phantom consists of a pork skin layer followed by pork fat, then ground pork to emulate the muscle tissue. An aperture with a diameter of 2 cm on a metal sheet focuses the measurements on a small area of interest. The measured results were compared with the actual permittivity and thickness of different layers of the experimental phantom. The technique is also applied to human voxel tissue models available in the CST software library, including babies, children, and adults. The accuracy of the measured data confirms the suitability of this technique, which offers a noninvasive, safe, cost-effective way to determine the type of fat tissue in the human body and the level of obesity.
{"title":"Pediatric Corpulence Assessment Using Ultra-Wideband Radar Imaging System: A Novel Approach in Tissue Characterization","authors":"Kapil Gangwar;Fatemeh Modares Sabzevari;Karumudi Rambabu","doi":"10.1109/TMI.2025.3578283","DOIUrl":"10.1109/TMI.2025.3578283","url":null,"abstract":"This article proposes an ex-vivo method to estimate the dielectric properties and thickness of adipose tissue in the human body. Based on the electrical properties of adipose tissue, obesity levels will be assessed. This approach consists of two steps: 1) data acquisition by an ultrawideband (UWB) time-domain radar and 2) genetic algorithm optimization of the intended goal function. This study considers a three-layered tissue model to mimic the surface of the human abdomen. The experimental phantom consists of a pork skin layer followed by pork fat, then ground pork to emulate the muscle tissue. An aperture with a diameter of 2 cm on a metal sheet focuses the measurements on a small area of interest. The measured results were compared with the actual permittivity and thickness of different layers of the experimental phantom. The technique is also applied to human voxel tissue models available in the CST software library, including babies, children, and adults. The accuracy of measurement data confirms the suitability of this technique. This technique is a noninvasive, safe, cost-effective method to determine the type of fat tissue in the human body and the level of obesity.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4554-4566"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144260085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic and accurate classification of cholangiocarcinoma (CCA) using optical coherence tomography (OCT) images is critical for confirming infiltration margins. Considering that the morphological representations in pathology stains can be implicitly captured in OCT imaging, we introduce the optical attenuation coefficient (OAC) and generalized visual-language information to focus on the optical properties of diseased tissue and exploit its inherent textured features. Maintaining the data within the appropriate working range during OCT scanning is crucial for reliable diagnosis. To this end, we propose an autonomous scanning method integrated with a novel deep learning architecture to construct an efficient computer-aided system. We develop a cross-modal complementarity model, the language and attenuation-driven network (LA-OCT Net), designed to enhance the interaction between OAC and OCT information and leverage generalized image-text alignment for refined feature representation. The model incorporates a disentangled attenuation selection-based adversarial correlation loss to magnify the discrepancy between cross-modal features while maintaining discriminative consistency. The proposed robot-assisted pipeline ensures precise repositioning of the diseased cross-sectional location, allowing consistent measurements throughout treatment and precise tumor margin detection. Extensive experiments on a comprehensive clinical dataset demonstrate the effectiveness and superiority of our method. Specifically, our approach not only improves accuracy by 6% compared with state-of-the-art techniques but also provides new insights into the potential of optical biopsy.
{"title":"Language and Attenuation-Driven Network for Robot-Assisted Cholangiocarcinoma Diagnosis From Optical Coherence Tomography","authors":"Chuanhao Zhang;Yangxi Li;Jianping Song;Yuxuan Zhai;Yuchao Zheng;Yingwei Fan;Canhong Xiang;Fang Chen;Hongen Liao","doi":"10.1109/TMI.2025.3578179","DOIUrl":"10.1109/TMI.2025.3578179","url":null,"abstract":"Automatic and accurate classification of cholangiocarcinoma (CCA) using optical coherence tomography (OCT) images is critical for confirming infiltration margins. Considering that the morphological representations in pathology stains can be implicitly captured in OCT imaging, we introduce the optical attenuation coefficient (OAC) and generalized visual-language information to focus on the optical properties of diseased tissue and exploit its inherent textured features. Maintaining the data within the appropriate working range during OCT scanning is crucial for reliable diagnosis. To this end, we propose an autonomous scanning method integrated with novel deep learning architecture to construct an efficient computer-aided system. We develop a cross-modal complementarity model, the language and attenuation-driven network (LA-OCT Net), designed to enhance the interaction between OAC and OCT information and leverage generalized image-text alignment for refined feature representation. The model incorporates a disentangled attenuation selection-based adversarial correlation loss to magnify the discrepancy between cross-modal features while maintaining discriminative consistency. The proposed robot-assisted pipeline ensures precise repositioning of the diseased cross-sectional location, allowing consistent measurements to treatment and precise tumor margin detection. Extensive experiments on a comprehensive clinical dataset demonstrate the effectiveness and superiority of our method. Specifically, our approach not only improves accuracy by 6% compared to state-of-the-art techniques, while also providing new insights into the potential of optical biopsy.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 11","pages":"4511-4523"},"PeriodicalIF":0.0,"publicationDate":"2025-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144251989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}