Asymmetric Co-Training With Decoder-Head Decoupling for Semi-Supervised Medical Image Segmentation
Pub Date: 2026-02-12 | DOI: 10.1109/JBHI.2026.3664127
Yuxin Tian, Muhan Shi, Jianxun Li, Bin Zhang, Min Qu, Yinxue Shi, Xian Yang, Min Wang
Semi-supervised learning reduces annotation costs in medical image segmentation by leveraging abundant unlabeled data alongside scarce labels. Most models adopt an encoder-decoder architecture with a task-specific segmentation head. While co-training is effective, existing frameworks suffer from intra-network coupling (decoder-head binding) and inter-network coupling (over-aligned predictions), which reduce prediction diversity and amplify confirmation bias, particularly for small structures, ambiguous boundaries, and anatomically variable regions. We propose AsyCo, an asymmetric co-training framework with two components. (1) Asymmetric Decoder Coupling implements decoder-head decoupling by dynamically remapping encoder-decoder features to non-default heads across branches, breaking intra-network coupling and creating diverse prediction paths without additional parameters. (2) Hierarchical Consistency Regularization converts this diversity into stable supervision by aligning (i) the two branches' final outputs along their default paths (branch-output consistency), (ii) predictions from different segmentation heads evaluated on identical decoder features (inter-head consistency), and (iii) intermediate encoder-decoder representations (representation consistency). Through these mechanisms, AsyCo explicitly mitigates both intra- and inter-network coupling, improving training stability and reducing confirmation bias. Extensive experiments on three clinical benchmarks under limited-label regimes demonstrate that AsyCo consistently outperforms nine state-of-the-art semi-supervised learning methods. These results indicate that AsyCo delivers accurate and reliable segmentation with minimal annotation, thereby enhancing the reliability of medical image analysis in real-world clinical practice.
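To make the asymmetric coupling and the three consistency levels concrete, the sketch below shows one way such a two-branch setup could be wired in PyTorch. It is a minimal illustration, not the authors' implementation: the tiny encoder/decoder, the presence of a default and an auxiliary head in each branch, and the use of mean-squared error for all three consistency terms are assumptions made for readability.

```python
# Conceptual sketch of AsyCo-style asymmetric co-training (not the authors' code).
# Module sizes and the MSE form of every consistency term are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Branch(nn.Module):
    """One co-training branch: encoder-decoder plus two segmentation heads."""

    def __init__(self, in_ch=1, feat_ch=16, n_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.default_head = nn.Conv2d(feat_ch, n_classes, 1)
        self.aux_head = nn.Conv2d(feat_ch, n_classes, 1)

    def features(self, x):
        return self.decoder(self.encoder(x))


def asyco_unsupervised_losses(branch_a, branch_b, x_unlabeled):
    """Hierarchical consistency on unlabeled data, following the three levels
    named in the abstract (branch-output, inter-head, representation)."""
    feat_a = branch_a.features(x_unlabeled)
    feat_b = branch_b.features(x_unlabeled)

    # Default prediction paths (each decoder feeds its own default head).
    out_a = branch_a.default_head(feat_a)
    out_b = branch_b.default_head(feat_b)

    # Asymmetric coupling: route each branch's decoder features through a
    # non-default head of the other branch (reuses existing heads).
    cross_a = branch_b.aux_head(feat_a)
    cross_b = branch_a.aux_head(feat_b)

    # (i) branch-output consistency between the two default paths.
    l_branch = F.mse_loss(out_a.softmax(1), out_b.softmax(1))
    # (ii) inter-head consistency: different heads on identical decoder features.
    l_head = F.mse_loss(out_a.softmax(1), cross_a.softmax(1)) + \
             F.mse_loss(out_b.softmax(1), cross_b.softmax(1))
    # (iii) representation consistency between intermediate features.
    l_repr = F.mse_loss(feat_a, feat_b)
    return l_branch, l_head, l_repr


if __name__ == "__main__":
    a, b = Branch(), Branch()
    x = torch.randn(2, 1, 64, 64)
    print([v.item() for v in asyco_unsupervised_losses(a, b, x)])
```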
{"title":"Asymmetric Co-Training With Decoder-Head Decoupling for Semi-Supervised Medical Image Segmentation.","authors":"Yuxin Tian, Muhan Shi, Jianxun Li, Bin Zhang, Min Qu, Yinxue Shi, Xian Yang, Min Wang","doi":"10.1109/JBHI.2026.3664127","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3664127","url":null,"abstract":"<p><p>Semi-supervised learning reduces annotation costs in medical image segmentation by leveraging abundant unlabeled data alongside scarce labels. Most models adopt an encoder-decoder architecture with a task-specific segmentation head. While co-training is effective, existing frameworks suffer from intra-network coupling (decoder-head binding) and inter-network coupling (over-aligned predictions), which reduce prediction diversity and amplify confirmation bias-particularly for small structures, ambiguous boundaries, and anatomically variable regions. We propose AsyCo, an asymmetric co-training framework with two components. (1) Asymmetric Decoder Coupling implements decoder-head decoupling by dynamically remapping encoder-decoder features to non-default heads across branches, breaking intra-network coupling and creating diverse prediction paths without additional parameters. (2) Hierarchical Consistency Regularization converts this diversity into stable supervision by aligning (i) the two branches' final outputs along their default paths (branch-output consistency), (ii) predictions from different segmentation heads evaluated on identical decoder features (inter-head consistency), and (iii) intermediate encoder-decoder representations (representation consistency). Through these mechanisms, AsyCo explicitly mitigates both intra- and inter-network coupling, improving training stability and reducing confirmation bias. Extensive experiments on three clinical benchmarks under limited-label regimes demonstrate that AsyCo consistently outperforms nine state-of-the-art semi-supervised learning methods. These results indicate that AsyCo delivers accurate and reliable segmentation with minimal annotation, thereby enhancing the reliability of medical image analysis in real-world clinical practice.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146179312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NeuroCLIP: A Multimodal Contrastive Learning Method for rTMS-treated Methamphetamine Addiction Analysis
Pub Date: 2026-02-11 | DOI: 10.1109/JBHI.2026.3663869
Chengkai Wang, Di Wu, Yunsheng Liao, Wenyao Zheng, Ziyi Zeng, Xurong Gao, Hemmings Wu, Zhoule Zhu, Jie Yang, Lihua Zhong, Weiwei Cheng, Yun-Hsuan Chen, Mohamad Sawan
Methamphetamine dependence poses a significant global health challenge, yet its assessment and the evaluation of treatments like repetitive transcranial magnetic stimulation (rTMS) frequently depend on subjective self-reports, which may introduce uncertainties. While objective neuroimaging modalities such as electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) offer alternatives, their individual limitations and the reliance on conventional, often hand-crafted, feature extraction can compromise the reliability of derived biomarkers. To overcome these limitations, we propose NeuroCLIP, a novel deep learning framework integrating simultaneously recorded EEG and fNIRS data through a progressive learning strategy. This approach offers a robust and trustworthy data-driven biomarker for methamphetamine addiction. Validation experiments show that NeuroCLIP significantly improves discrimination between methamphetamine-dependent individuals and healthy controls compared to models using EEG or fNIRS alone. Furthermore, the proposed framework facilitates objective, brain-based evaluation of rTMS treatment efficacy, demonstrating measurable shifts in neural patterns towards healthy control profiles after treatment. Critically, we establish the trustworthiness of the multimodal data-driven biomarker by showing its strong correlation with psychometrically validated craving scores. These findings suggest that the biomarker derived from EEG-fNIRS data via NeuroCLIP offers enhanced robustness and reliability over single-modality approaches, providing a valuable tool for addiction neuroscience research and potentially improving clinical assessments.
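The abstract does not spell out the contrastive objective, but the model name suggests a CLIP-style alignment between paired EEG and fNIRS windows. The following minimal sketch shows such a symmetric InfoNCE loss; the placeholder encoders, feature dimensions, and temperature are illustrative assumptions, and NeuroCLIP's actual architecture and progressive learning schedule are not reproduced.

```python
# Minimal sketch of a CLIP-style contrastive objective aligning EEG and fNIRS
# embeddings from the same recording window. Encoders, embedding size, and
# temperature are placeholders, not NeuroCLIP's design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleEncoder(nn.Module):
    def __init__(self, in_dim, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings


def clip_style_loss(eeg_emb, fnirs_emb, temperature=0.07):
    """Symmetric InfoNCE: matched EEG/fNIRS windows are positives,
    all other pairings in the batch are negatives."""
    logits = eeg_emb @ fnirs_emb.t() / temperature
    targets = torch.arange(eeg_emb.size(0), device=eeg_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    eeg_enc = SimpleEncoder(in_dim=256)    # e.g. flattened EEG band-power features
    fnirs_enc = SimpleEncoder(in_dim=48)   # e.g. flattened HbO/HbR features
    eeg, fnirs = torch.randn(8, 256), torch.randn(8, 48)
    print(clip_style_loss(eeg_enc(eeg), fnirs_enc(fnirs)).item())
```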
{"title":"NeuroCLIP: A Multimodal Contrastive Learning Method for rTMS-treated Methamphetamine Addiction Analysis.","authors":"Chengkai Wang, Di Wu, Yunsheng Liao, Wenyao Zheng, Ziyi Zeng, Xurong Gao, Hemmings Wu, Zhoule Zhu, Jie Yang, Lihua Zhong, Weiwei Cheng, Yun-Hsuan Chen, Mohamad Sawan","doi":"10.1109/JBHI.2026.3663869","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3663869","url":null,"abstract":"<p><p>Methamphetamine dependence poses a significant global health challenge, yet its assessment and the evaluation of treatments like repetitive transcranial magnetic stimulation (rTMS) frequently depend on subjective self-reports, which may introduce uncertainties. While objective neuroimaging modalities such as electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) offer alternatives, their individual limitations and the reliance on conventional, often hand-crafted, feature extraction can compromise the reliability of derived biomarkers. To overcome these limitations, we propose NeuroCLIP, a novel deep learning framework integrating simultaneously recorded EEG and fNIRS data through a progressive learning strategy. This approach offers a robust and trustworthy data-driven biomarker for methamphetamine addiction. Validation experiments show that NeuroCLIP significantly improves discriminative capabilities among the methamphetamine-dependent individuals and healthy controls compared to models using either EEG or only fNIRS alone. Furthermore, the proposed framework facilitates objective, brain-based evaluation of rTMS treatment efficacy, demonstrating measurable shifts in neural patterns towards healthy control profiles after treatment. Critically, we establish the trustworthiness of the multimodal data-driven biomarker by showing its strong correlation with psychometrically validated craving scores. These findings suggest that biomarker derived from EEG-fNIRS data via NeuroCLIP offers enhanced robustness and reliability over single-modality approaches, providing a valuable tool for addiction neuroscience research and potentially improving clinical assessments.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146165222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ECG-AuxNet: A Dual-Branch Spatial-Temporal Feature Fusion Framework with Auxiliary Learning for Enhanced Cardiac Disease Diagnosis
Pub Date: 2026-02-11 | DOI: 10.1109/JBHI.2026.3664231
Ruiqi Shen, Yanan Wang, Chunge Cao, Shuaicong Hu, Jia Liu, Hongyu Wang, Gaoyan Zhong, Cuiwei Yang
Objective: Multiple limitations exist in current automated ECG analysis, including insufficient feature integration across leads, limited interpretability, poor generalization, and inadequate handling of class imbalance. To address these challenges, we develop a novel dual-branch framework that comprehensively captures spatial-temporal features for cardiac disease diagnosis.
Methods: ECG-AuxNet combines a Multi-scale Transformer Attention CNN for spatial feature extraction and a GRU network for temporal dependency modeling. A Dual-stage Cross-Attention Fusion module integrates features from both branches, while a Feature Space Reconstruction (FSR) auxiliary task is introduced as a manifold regularizer to enhance feature discrimination. The framework was evaluated on PTB-XL (15,709 ECGs) and validated in real-world clinical scenarios (SXMU-2k, 1,673 ECGs).
Results: For class-imbalanced disease recognition (NORM, CD, MI, STTC), ECG-AuxNet attained 78.34% F1-score on PTB-XL and 82.63% F1-score on SXMU-2k, outperforming 9 baseline models. FSR significantly improved feature discrimination by 11.7%, enhancing class boundary clarity and classification accuracy. Grad-CAM analysis revealed attention patterns that precisely match cardiologists' diagnostic focus areas.
Conclusion: ECG-AuxNet effectively integrates spatial-temporal features through auxiliary learning, achieving robust generalizability in cardiac disease diagnosis with interpretability aligned with clinical expertise.
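The Methods paragraph above outlines a CNN spatial branch, a GRU temporal branch, cross-attention fusion, and an auxiliary feature-space reconstruction task. The sketch below wires those pieces together in a minimal PyTorch form; the layer sizes, the single-stage attention, and the simple reconstruction loss standing in for FSR are assumptions, not the published architecture.

```python
# Illustrative dual-branch ECG classifier: CNN spatial branch, GRU temporal branch,
# cross-attention fusion, and an auxiliary feature-reconstruction regularizer.
# All sizes and the fusion/reconstruction details are assumptions for this sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualBranchECG(nn.Module):
    def __init__(self, n_leads=12, n_classes=4, d=64):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv1d(n_leads, d, 7, padding=3), nn.ReLU(),
                                 nn.Conv1d(d, d, 7, padding=3), nn.ReLU())
        self.gru = nn.GRU(n_leads, d, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(d, n_classes)
        self.reconstructor = nn.Linear(d, d)  # auxiliary feature-space reconstruction head

    def forward(self, x):                      # x: (batch, leads, time)
        spatial = self.cnn(x).transpose(1, 2)  # (batch, time, d)
        temporal, _ = self.gru(x.transpose(1, 2))
        fused, _ = self.cross_attn(spatial, temporal, temporal)  # spatial queries attend to temporal keys
        pooled = fused.mean(dim=1)
        logits = self.classifier(pooled)
        recon_loss = F.mse_loss(self.reconstructor(pooled), pooled.detach())  # auxiliary task
        return logits, recon_loss


if __name__ == "__main__":
    model = DualBranchECG()
    logits, aux = model(torch.randn(2, 12, 1000))
    print(logits.shape, aux.item())
```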
{"title":"ECG-AuxNet: A Dual-Branch Spatial-Temporal Feature Fusion Framework with Auxiliary Learning for Enhanced Cardiac Disease Diagnosis.","authors":"Ruiqi Shen, Yanan Wang, Chunge Cao, Shuaicong Hu, Jia Liu, Hongyu Wang, Gaoyan Zhong, Cuiwei Yang","doi":"10.1109/JBHI.2026.3664231","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3664231","url":null,"abstract":"<p><strong>Objective: </strong>Multiple limitations exist in current automated ECG analysis, including insufficient feature integration across leads, limited interpretability, poor generalization, and inadequate handling of class imbalance. To address these challenges, we develop a novel dual-branch framework that comprehensively captures spatial-temporal features for cardiac disease diagnosis.</p><p><strong>Methods: </strong>ECG-AuxNet combines a Multi-scale Transformer Attention CNN for spatial feature extraction and a GRU network for temporal dependency modeling. A Dual-stage Cross-Attention Fusion module integrates features from both branches, while a Feature Space Reconstruction (FSR) auxiliary task is introduced as a manifold regularizer to enhance feature discrimination. The framework was evaluated on PTB-XL (15,709 ECGs) and validated in real-world clinical scenarios (SXMU-2k, 1,673 ECGs).</p><p><strong>Results: </strong>For class-imbalanced disease recognition (NORM, CD, MI, STTC), ECG-AuxNet attained 78.34% F1-score on PTB-XL and 82.63% F1-score on SXMU-2k, outperforming 9 baseline models. FSR significantly improved feature discrimination by 11.7%, enhancing class boundary clarity and classification accuracy. Grad-CAM analysis revealed attention patterns that precisely match cardiologists' diagnostic focus areas.</p><p><strong>Conclusion: </strong>ECG-AuxNet effectively integrates spatial-temporal features through auxiliary learning, achieving robust generalizability in cardiac disease diagnosis with interpretability aligned with clinical expertise.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146165183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PASAformer: Cerebrovascular Disease Classification with Medical Prior-Guided Adapter and Pathology-Aware Sparse Attention
Pub Date: 2026-02-11 | DOI: 10.1109/JBHI.2026.3663876
Baiming Chen, Xin Gao, Weiguo Zhang, Sue Cao, Si Li, Linhai Yan
Cerebrovascular diseases (CVDs) such as aneurysms, arteriovenous malformations, stenosis, and Moyamoya disease are major public health concerns. Accurate classification of these conditions is essential for timely intervention, yet current computer-aided methods often exhibit limited representational capacity, feature redundancy, and insufficient interpretability, restricting clinical applicability. We propose PASAformer, a Swin-Transformer-based framework for cerebrovascular disease classification on Digital Subtraction Angiography (DSA). PASAformer incorporates a Pathology-Aware Sparse Attention (PASA) module that emphasizes lesion-related regions while suppressing background redundancy. Inserted into the Swin backbone, PASA replaces dense window self-attention, improving computational efficiency while preserving the hierarchical architecture. We further employ the MiAMix data augmenter to increase sample diversity, and incorporate a CombinedAdapter encoder that injects anatomical priors from the frozen Medical Segment Anything Model (MED-SAM) into early-stage representations, strengthening discriminative power under limited supervision. To support research in this underexplored area, we curate CDSA-NEO, a proprietary DSA dataset comprising more than 1,700 static images across four major cerebrovascular disease categories, constituting the first large-scale benchmark of its kind. Furthermore, an external cohort of angiographic runs with sequential, unselected frames is used to assess robustness in realistic temporal workflows. Extensive experiments on CDSA-NEO and public vascular datasets demonstrate that PASAformer achieves competitive precision and balanced accuracy compared to representative state-of-the-art models, while providing more focused visual explanations. These results suggest that PASAformer can support automated cerebrovascular disease classification on angiography, and that CDSA-NEO provides a benchmark for future method development and evaluation.
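As a rough illustration of replacing dense window self-attention with a sparse variant, the snippet below keeps only the top-k keys per query inside an attention window. How PASA actually scores pathology-relevant tokens is not described here, and the window size, dimensions, and k are arbitrary choices for the example.

```python
# Hedged sketch of sparse window attention: keys outside each query's top-k are
# masked before the softmax, suppressing background redundancy. The pathology-aware
# scoring used by PASA is not reproduced.
import torch
import torch.nn.functional as F


def topk_sparse_window_attention(q, k, v, keep=8):
    """q, k, v: (batch, window_tokens, dim)."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5           # (batch, tokens, tokens)
    keep = min(keep, scores.size(-1))
    thresh = scores.topk(keep, dim=-1).values[..., -1:]    # k-th largest score per query
    scores = scores.masked_fill(scores < thresh, float("-inf"))
    return F.softmax(scores, dim=-1) @ v


if __name__ == "__main__":
    b, n, d = 2, 49, 32            # e.g. a 7x7 attention window
    q, k, v = (torch.randn(b, n, d) for _ in range(3))
    print(topk_sparse_window_attention(q, k, v, keep=8).shape)  # torch.Size([2, 49, 32])
```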
{"title":"PASAformer: Cerebrovascular Disease Classification with Medical Prior-Guided Adapter and Pathology-Aware Sparse Attention.","authors":"Baiming Chen, Xin Gao, Weiguo Zhang, Sue Cao, Si Li, Linhai Yan","doi":"10.1109/JBHI.2026.3663876","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3663876","url":null,"abstract":"<p><p>Cerebrovascular diseases (CVDs) such as aneurysms, arteriovenous malformations, stenosis, and Moyamoya disease are major public health concerns. Accurate classification of these conditions is essential for timely intervention, yet current computer-aided methods often exhibit limited representational capacity, feature redundancy, and insufficient interpretability, restricting clinical applicability. We propose PASAformer, a Swin-Transformer-based framework for cerebrovascular disease classification on Digital Subtraction Angiography (DSA). PASAformer incorporates a Pathology-Aware Sparse Attention (PASA) module that emphasizes lesion-related regions while suppressing background redundancy. Inserted into the Swin backbone, PASA replaces dense window self-attention, improving computational efficiency while preserving the hierarchical architecture. We further employ the MiAMix data augmenter to increase sample diversity, and incorporate a CombinedAdapter encoder that injects anatomical priors from the frozen Medical Segment Anything Model (MED-SAM) into early-stage representations, strengthening discriminative power under limited supervision. To support research in this underexplored area, we curate CDSA-NEO, a proprietary DSA dataset comprising more than 1,700 static images across four major cerebrovascular disease categories, constituting the first large-scale benchmark of its kind. Furthermore, an external cohort of angiographic runs with sequential, unselected frames is used to assess robustness in realistic temporal workflows. Extensive experiments on CDSA-NEO and public vascular datasets demonstrate that PASAformer achieves competitive precision and balanced accuracy compared to representative state-of-the-art models, while providing more focused visual explanations. These results suggest that PASAformer can support automated cerebrovascular disease classification on angiography, and that CDSA-NEO provides a benchmark for future method development and evaluation.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146165241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shortening the MacArthur-Bates Communicative Developmental Inventory Using Machine Learning Based Computerized Adaptive Testing (ML-CAT)
Pub Date: 2026-02-11 | DOI: 10.1109/JBHI.2025.3626073
Diana Saker, Haya Salameh, Hila Gendler-Shalev, Hagit Hel-Or
Early identification of infants and toddlers at risk for developmental disorders can improve the efficiency of early intervention programs and can reduce healthcare costs. The MacArthur-Bates Communicative Development Inventory (MB-CDI) is a standardized tool for assessing children's early lexical development. However, due to its long list of words, administration is time-consuming and often limiting. In this paper we combine machine learning with a computerized adaptive testing approach (ML-CAT) to shorten the MB-CDI by adapting the sequence of words to the subject's responses. We show that ML-CAT can reliably predict the final score of the H-MB-CDI with as few as 10 words on average while maintaining 94% to 96% accuracy. We further show that ML-CAT outperforms existing approaches, including fixed, non-adaptive methods as well as statistical models based on Item Response Theory (IRT). Results are also given for five different languages. Most importantly, ML-CAT is shown to outperform IRT-based methods when handling atypical talkers (outliers). ML-CAT enables more efficient lexical development assessment, allowing for wider and repeated screening in the community. Additionally, due to its shorter length, the assessment is expected to be less of a burden on the subject or their caregiver and consequently more reliable.
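The core loop of any computerized adaptive test, administer an item, update an estimate, choose the next most informative item, and finally predict the full-inventory score, can be illustrated with a toy simulation. The logistic response model and uncertainty-based selection rule below are stand-ins; ML-CAT's learned, data-driven selection policy and score predictor are not reproduced.

```python
# Toy adaptive-testing loop over a simulated word inventory. The logistic response
# model and the "pick the most uncertain word" rule are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n_words = 200
difficulty = rng.normal(0.0, 1.0, n_words)    # latent difficulty per inventory word
true_ability = 0.8                             # simulated child


def p_knows(ability, diff):
    return 1.0 / (1.0 + np.exp(-(ability - diff)))


asked, responses, ability_hat = [], [], 0.0
for step in range(10):                         # roughly 10 words, as in the abstract
    remaining = [i for i in range(n_words) if i not in asked]
    # pick the word whose predicted response is most uncertain (closest to 0.5)
    next_item = min(remaining, key=lambda i: abs(p_knows(ability_hat, difficulty[i]) - 0.5))
    resp = rng.random() < p_knows(true_ability, difficulty[next_item])
    asked.append(next_item)
    responses.append(resp)
    # crude ability update: nudge toward or away from the item's difficulty
    ability_hat += 0.5 * ((1.0 if resp else 0.0) - p_knows(ability_hat, difficulty[next_item]))

# predict the full-inventory score: observed answers plus expected answers on unasked words
unasked = [i for i in range(n_words) if i not in asked]
predicted_score = sum(responses) + sum(p_knows(ability_hat, difficulty[i]) for i in unasked)
true_expected_score = sum(p_knows(true_ability, difficulty[i]) for i in range(n_words))
print(round(float(predicted_score), 1), round(float(true_expected_score), 1))
```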
{"title":"Shortening the MacArthur-Bates Communicative Developmental Inventory Using Machine Learning Based Computerized Adaptive Testing (ML-CAT).","authors":"Diana Saker, Haya Salameh, Hila Gendler-Shalev, Hagit Hel-Or","doi":"10.1109/JBHI.2025.3626073","DOIUrl":"https://doi.org/10.1109/JBHI.2025.3626073","url":null,"abstract":"<p><p>Early identification of infants and toddlers at risk for developmental disorders can improve the efficiency of early intervention programs and can reduce healthcare costs. The MacArthur-Bates Communicative Development Inventory (MB-CDI) is a standardized tool for assessing children's early lexical development. However, due to its long list of words, administration is time-consuming and often limiting. In this paper we use Machine learning together with a computerized adaptive testing approach (ML-CAT), to shorten the MB-CDI by adapting the sequence of words to the subject's responses. We show that the ML-CAT can reliably predict the final score of the H-MB-CDI with as few as 10 words on average while maintaining 94% to 96% accuracy. We further show that the ML-CAT outperforms existing approaches, including fixed, non adaptive methods as well as statistical models based on Item Response Theory (IRT). Results are also given for five different languages. Most importantly, ML-CAT is shown to outperform IRT based methods when handling atypical talkers (outliers). The ML-CAT enables more efficient lexical development assessment, allowing for a wider and repeated screening in the community. Additionally, due to its shorter length, assessment is expected to be less of a burden on the subject or her caregiver and consequently more reliable.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146165181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning Optimal Spectral Clustering for Functional Brain Network Generation and Classification
Pub Date: 2026-02-11 | DOI: 10.1109/JBHI.2025.3633902
Jiacheng Hou, Zhenjie Song, Chenfei Ye, Ercan Engin Kuruoglu
Functional brain network (FBN) analysis aims to enhance the understanding of brain organization and support the diagnosis of neurological and psychiatric disorders. Prior studies have shown that FBNs exhibit small-world topology, where brain regions form functional clusters, and abnormalities in these clusters are strongly associated with disease. However, current learning-based methods either ignore this special topological structure or impose it as a post-hoc step outside the learning process, limiting both performance and interpretability. In this paper, we propose Learning Optimal Spectral Clustering (LOSC), a new framework that integrates FBN generation, clustering, and classification with a novel graph-theory-grounded loss to fully exploit the small-world topology. First, LOSC learns brain connectivity in a nonlinear spatio-spectral embedding space, guided by our proposed Rayleigh Quotient Loss (RQL), to preserve small-world properties in the generated FBNs. The FBNs are then partitioned into clusters of functionally synchronized regions, and both intra- and inter-cluster relations are utilized for brain network classification. Our contributions are threefold: (1) Improved brain network classification accuracy: by leveraging small-world functional clusters, LOSC achieves consistent gains of 2.0%, 3.6%, and 2.6% on the ABIDE, ADHD-200, and HCP datasets, respectively, compared with state-of-the-art models; (2) Theoretical grounding: with the proposed RQL, LOSC bridges the gap between graph theory and learning-based FBN analysis; and (3) Interpretability: the discovered functional clusters align with known neuropathology and contribute to the discovery of new functional community biomarkers.
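The Rayleigh quotient of a graph Laplacian is the quantity spectral clustering minimizes, so a loss of this family can be sketched directly from the abstract's description. The snippet below computes a trace-ratio Rayleigh-quotient loss over soft cluster assignments on a learned connectivity matrix; the exact form of the paper's RQL and its spatio-spectral embedding are not reproduced, and the ROI count and cluster number are placeholders.

```python
# Sketch of a Rayleigh-quotient-style clustering loss on a functional connectivity
# matrix: partitions that cut few strong edges give a small trace ratio. This is a
# generic spectral-clustering relaxation, not the paper's exact RQL.
import torch
import torch.nn.functional as F


def rayleigh_quotient_loss(adjacency, cluster_logits, eps=1e-8):
    """adjacency: (regions, regions) nonnegative FBN weights.
    cluster_logits: (regions, n_clusters) learnable assignment scores."""
    a = 0.5 * (adjacency + adjacency.t())                      # enforce symmetry
    deg = a.sum(dim=1)
    d_inv_sqrt = torch.diag((deg + eps).rsqrt())
    lap = torch.eye(a.size(0)) - d_inv_sqrt @ a @ d_inv_sqrt   # normalized Laplacian
    s = F.softmax(cluster_logits, dim=1)                       # soft cluster indicators
    num = torch.trace(s.t() @ lap @ s)                         # weight of edges cut by the partition
    den = torch.trace(s.t() @ s) + eps                         # keeps clusters from collapsing
    return num / den


if __name__ == "__main__":
    torch.manual_seed(0)
    fbn = torch.rand(90, 90)                                   # e.g. 90 ROIs in an AAL-like parcellation
    logits = torch.randn(90, 7, requires_grad=True)            # 7 putative functional clusters
    loss = rayleigh_quotient_loss(fbn, logits)
    loss.backward()
    print(loss.item(), logits.grad.shape)
```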
{"title":"Learning Optimal Spectral Clustering for Functional Brain Network Generation and Classification.","authors":"Jiacheng Hou, Zhenjie Song, Chenfei Ye, Ercan Engin Kuruoglu","doi":"10.1109/JBHI.2025.3633902","DOIUrl":"https://doi.org/10.1109/JBHI.2025.3633902","url":null,"abstract":"<p><p>Functional brain network (FBN) analysis aims to enhance the understanding of brain organization and support the diagnosis of neurological and psychiatric disorders. Prior studies have shown that FBNs exhibit small-world topology, where brain regions form functional clusters, and abnormalities in these clusters are strongly associated with disease. However, current learning-based methods either ignore this special topological structure or impose it as a post-hoc step outside the learning process, limiting both performance and interpretability. In this paper, we propose Learning Optimal Spectral Clustering (LOSC), a new framework that integrates the FBN generation, clustering, and classification with a novel graph theory grounded loss to fully exploit the small-world topology. Firstly, LOSC learns brain connectivity in a nonlinear spatio-spectral embedding space, guided by our proposed Rayleigh Quotient Loss (RQL), to preserve the small-world properties in generated FBNs. Then, the FBNs are partitioned into clusters of functionally synchronized regions, and both intra- and inter-cluster relations are utilized for brain network classification. Our contributions are threefold: (1) Improved brain network classification accuracy: by leveraging small-world functional clusters, LOSC achieves consistent gains of 2.0%, 3.6%, and 2.6% on the ABIDE, ADHD-200, and HCP datasets compared with state-of-the-art models, respectively; (2) Theoretical grounding: with our proposed RQL, LOSC bridges the gap between the graph theory and learning-based FBN analysis; and (3) Interpretability: the discovered functional clusters align with known neuropathology and contribute to the discovery of new functional community biomarkers.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146165165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RGShuffleNet: An Efficient Design for Medical Image Segmentation on Portable Devices
Pub Date: 2026-02-11 | DOI: 10.1109/JBHI.2026.3663638
Zemin Cai, Jiarui Luo, Jian-Huang Lai, Fu Chen
Medical image segmentation plays a crucial role in intelligent medical image processing systems, serving as the foundation for effective medical image analysis, particularly in assisting diagnosis and surgical planning. Over the past few years, UNet has achieved tremendous success in the field of image segmentation, with several UNet-based extension models widely applied in medical image segmentation tasks. However, these models are largely confined to settings where large medical equipment can be deployed, such as hospitals. The substantial computational costs of these segmentation models pose significant challenges when deploying them on portable devices with limited hardware resources, hindering rapid and efficient image segmentation in home-lab settings. In this paper, we present a lightweight model, RGShuffleNet, specifically designed for medical image segmentation on resource-constrained mobile devices. To reduce parameters and computational complexity, we first propose Reshaped Group Convolution, a novel convolutional method for effectively restructuring the dimensions of different feature groups; modifying the feature structure enhances correlations between groups. Additionally, we introduce the MSC-Shuffle block to facilitate information flow between different feature groups. Unlike traditional shuffle operations that focus solely on channel correlation, the MSC-Shuffle block enables information exchange between groups in both the channel and spatial dimensions, thereby achieving superior segmentation performance. Experimental evaluations on two cardiac ultrasound image datasets and one chest CT image dataset demonstrate that RGShuffleNet outperforms various state-of-the-art methods while maintaining lower complexity. Finally, RGShuffleNet is deployed on portable devices. The source code of the project is available at https://github.com/Zemin-Cai/RGShuffleNet.
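To make the shuffle idea concrete, the snippet below shows the standard channel-shuffle operation used after group convolutions together with a simple spatial analogue. The actual Reshaped Group Convolution and MSC-Shuffle block are more involved; this only illustrates "information exchange across groups in both channel and spatial dimensions", and the spatial variant shown is an assumption.

```python
# Standard channel shuffle plus an assumed spatial analogue. These are minimal
# illustrations of cross-group information exchange, not the MSC-Shuffle block itself.
import torch


def channel_shuffle(x, groups):
    """x: (batch, channels, h, w). Interleaves channels across groups so the next
    group convolution sees features from every group."""
    b, c, h, w = x.shape
    assert c % groups == 0
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)


def spatial_shuffle(x, patch=2):
    """Assumed spatial analogue: swap the patch-grid and within-patch axes so that
    neighbouring spatial positions are redistributed across the feature map."""
    b, c, h, w = x.shape
    assert h % patch == 0 and w % patch == 0
    x = x.view(b, c, h // patch, patch, w // patch, patch)
    x = x.permute(0, 1, 3, 2, 5, 4)                        # swap grid and within-patch axes
    return x.reshape(b, c, h, w)


if __name__ == "__main__":
    feat = torch.arange(2 * 8 * 4 * 4, dtype=torch.float32).view(2, 8, 4, 4)
    print(channel_shuffle(feat, groups=4).shape, spatial_shuffle(feat, patch=2).shape)
```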
{"title":"RGShuffleNet: An Efficient Design for Medical Image Segmentation on Portable Devices.","authors":"Zemin Cai, Jiarui Luo, Jian-Huang Lai, Fu Chen","doi":"10.1109/JBHI.2026.3663638","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3663638","url":null,"abstract":"<p><p>Medical image segmentation plays a crucial role in intelligent medical image processing systems, serving as the foundation for effective medical image analysis, particularly in assisting diagnosis and surgical planning. Over the past few years, UNet has achieved tremendous success in the field of image segmentation, with several UNet-based extension models widely applied in medical image segmentation tasks. However, the application of these models is limited to scenarios where large medical equipment can be deployed, such as hospitals. The significant computational costs associated with these segmentation models pose significant challenges when deploying them on portable devices with limited hardware resources. This hinders the realization of rapid and efficient image segmentation in Homelab. In this paper, we present a lightweight model, RGShuffleNet, specifically designed for resource-constrained mobile devices for medical image segmentation. To reduce parameters and computational complexity, we first propose Reshaped Group Convolution, a novel convolutional method for effectively restructuring dimensions of different feature groups. Modifying the feature structure enhances correlations between different groups. Additionally, we introduce the MSC-Shuffle block to facilitate information flow between different feature groups. Unlike traditional Shuffle operations that focus solely on channel correlation, the MSC-Shuffle block proposed in this paper enables information exchange between different groups in both channel and spatial dimensions, thereby achieving superior segmentation performance. Experimental evaluations on two cardiac ultrasound image datasets and one chest CT image dataset demonstrate that RGShuffleNet achieves performance superior to various other state-of-the-art methods while maintaining lower complexity. Finally, RGShuffleNet is deployed on portable devices. The source code of the project is available at https://github.com/Zemin-Cai/RGShuffleNet.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146165192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Subject-Adaptive EEG Decoding via Filter-Bank Neural Architecture Search for BCI Applications
Pub Date: 2026-02-11 | DOI: 10.1109/JBHI.2026.3663725
Chong Wang, Li Yang, Bingfan Yuan, Jiafan Zhang, Chen Jin, Rong Li, Junjie Bu
Individual differences pose a significant challenge in brain-computer interface (BCI) research. Designing a universally applicable network architecture is impractical due to the variability in human brain structure and function. We propose Filter-Bank Neural Architecture Search (FBNAS), an EEG decoding framework that automates network architecture design for individual subjects. FBNAS uses three temporal cells to process EEG signals in different frequency bands, with dilated convolution kernels in their search spaces. A multi-path NAS algorithm determines optimal architectures for multi-scale feature extraction. We benchmarked FBNAS on three EEG datasets across two BCI paradigms, comparing it to six state-of-the-art deep learning algorithms. FBNAS achieved cross-session decoding accuracies of 79.78%, 70.66%, and 68.38% on the BCIC-IV-2a, OpenBMI, and SEED datasets, respectively, outperforming the other methods. Our results show that FBNAS customizes decoding models to address individual differences, enhancing decoding performance and shifting model design from expert-driven to machine-aided. The source code can be found at https://github.com/wang1239435478/FBNAS-master.
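The filter-bank front end implied by the method's name can be sketched independently of the architecture search: the raw EEG is band-pass filtered into a few frequency bands and each band feeds its own temporal sub-network. In the snippet below, the band edges, filter order, and the log-band-power placeholder standing in for a searched temporal cell are all illustrative assumptions.

```python
# Filter-bank preprocessing sketch: split EEG into frequency bands and hand each
# band to its own (placeholder) temporal cell. The NAS over dilated-convolution
# cells is not reproduced; band edges and filter order are assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}   # assumed three bands


def filter_bank(eeg, fs=250):
    """eeg: (channels, samples). Returns one band-limited copy per band."""
    out = {}
    for name, (lo, hi) in BANDS.items():
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        out[name] = sosfiltfilt(sos, eeg, axis=-1)
    return out


def temporal_cell(x):
    # Placeholder for one searched temporal cell; here just log band power per channel.
    return np.log(np.mean(x ** 2, axis=-1) + 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eeg = rng.standard_normal((22, 1000))          # e.g. 22 channels, 4 s at 250 Hz
    feats = [temporal_cell(x) for x in filter_bank(eeg).values()]
    print(np.concatenate(feats).shape)             # (66,) = 3 bands x 22 channels
```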
{"title":"Subject-Adaptive EEG Decoding via Filter-Bank Neural Architecture Search for BCI Applications.","authors":"Chong Wang, Li Yang, Bingfan Yuan, Jiafan Zhang, Chen Jin, Rong Li, Junjie Bu","doi":"10.1109/JBHI.2026.3663725","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3663725","url":null,"abstract":"<p><p>Individual differences pose a significant challenge in brain-computer interface (BCI) research. Designing a universally applicable network architecture is impractical due to the variability in human brain structure and function. We propose Filter-Bank Neural Architecture Search (FBNAS), an EEG decoding framework that automates network architecture design for individuals. FBNAS uses three temporal cells to process different frequency EEG signals, with dilated convolution kernels in their search spaces. A multi-path NAS algorithm determines optimal architectures for multi-scale feature extraction. We benchmarked FBNAS on three EEG datasets across two BCI paradigms, comparing it to six state-of-the-art deep learning algorithms. FBNAS achieved cross-session decoding accuracies of 79.78%, 70.66%, and 68.38% on the BCIC-IV-2a, OpenBMI, and SEED datasets, respectively, outperforming other methods. Our results show that FBNAS customizes decoding models to address individual differences, enhancing decoding performance and shifting model design from expert-driven to machine-aided. The source code can be found at https://github.com/wang1239435478/FBNAS-master.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146165233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tumor Contraction-Aware Multi-Sequence MRI Framework for Accurate Post-Ablation Margin Assessment in Hepatocellular Carcinoma
Pub Date: 2026-02-11 | DOI: 10.1109/JBHI.2026.3663682
Linan Dong, Hongwei Ge, Jie Yu, Yong Luo, Jinming Hu, Shichen Yu, Ping Liang
Hepatocellular carcinoma (HCC) is a major cause of cancer-related mortality, and microwave ablation (MWA) is commonly used for patients ineligible for surgical resection. A critical challenge following MWA is the assessment of the ablative margin, which is complicated by non-diffeomorphic deformations introduced by thermal effects during the procedure. This paper proposes a Multi-sequence Distance-guided Complementary Network (MDCNet) that utilizes multi-sequence MRI to quantify the extent of tumor contraction after MWA. To account for the differential contraction responses of liver parenchyma and tumor tissue, we propose a novel distance-aware mask transformation strategy. This method explicitly models the spatial attenuation of MWA energy and approximates the influence of the liver parenchyma's linear elastic response on tumor shrinkage, thereby enhancing the spatial adaptiveness of feature weighting. To capture the distinct structural characteristics of liver tissue emphasized by different MRI sequences and to leverage their complementary information, a gated channel fusion module is introduced to dynamically integrate features from delayed-phase and T2-weighted images. To validate the practical effectiveness of the proposed method, we evaluate the ablative margins of 115 HCC patients using a fine-tuned TransMorph model that incorporates tumor contraction predictions generated by MDCNet, and compare the results with radiologists' 2D assessments. The registration method enhanced with MDCNet improved tumor deformation accuracy and achieved a higher Youden Index in detecting incomplete ablations. Moreover, MDCNet provides interpretable predictions, thereby facilitating clinical decision support.
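A generic version of the distance-guided idea, weighting voxels by a function that decays with distance from the tumor or ablation region, can be written in a few lines. The exponential decay form and the decay constant below are assumptions for illustration; the paper's actual mask transformation and its approximation of the parenchyma's elastic response are not reproduced.

```python
# Distance-aware weighting sketch: weights are 1 inside the tumor mask and decay
# exponentially with distance outside it, loosely mimicking spatial attenuation of
# MWA energy. Decay form and constant are assumptions, not MDCNet's transformation.
import numpy as np
from scipy.ndimage import distance_transform_edt


def distance_guided_weights(tumor_mask, voxel_spacing=(1.0, 1.0, 1.0), decay_mm=10.0):
    """tumor_mask: boolean 3-D array. Returns weights in (0, 1]."""
    dist_mm = distance_transform_edt(~tumor_mask, sampling=voxel_spacing)
    return np.exp(-dist_mm / decay_mm)


if __name__ == "__main__":
    mask = np.zeros((64, 64, 64), dtype=bool)
    mask[28:36, 28:36, 28:36] = True                   # toy "tumor"
    w = distance_guided_weights(mask, voxel_spacing=(1.5, 1.0, 1.0))
    print(w.min().round(3), w.max())                   # small far away, 1.0 inside the mask
```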
{"title":"Tumor Contraction-Aware Multi-Sequence MRI Framework for Accurate Post-Ablation Margin Assessment in Hepatocellular Carcinoma.","authors":"Linan Dong, Hongwei Ge, Jie Yu, Yong Luo, Jinming Hu, Shichen Yu, Ping Liang","doi":"10.1109/JBHI.2026.3663682","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3663682","url":null,"abstract":"<p><p>Hepatocellular carcinoma (HCC) is a major cause of cancer-related mortality, and microwave ablation (MWA) is commonly used for patients ineligible for surgical resection. A critical challenge following MWA is the assessment of the ablative margin, which is complicated by non-diffeomorphic deformations introduced by thermal effects during the procedure. This paper proposes a Multi-sequence Distance-guided Complementary Network (MDCNet) that utilizes multi-sequence MRI to quantify the extent of tumor contraction after MWA. To account for the differential contraction responses of liver parenchyma and tumor tissue, we propose a novel distance-aware mask transformation strategy. This method explicitly models the spatial attenuation of MWA energy and approximates the influence of liver parenchyma's linear elastic response on tumor shrinkage, thereby enhancing the spatial adaptiveness of feature weighting. To capture the distinct structural characteristics of liver tissue emphasized by different MRI sequences and to leverage their complementary information, a gated channel fusion module is introduced to dynamically integrate features from delayed-phase and T2-weighted images. To validate the practical effectiveness of our proposed method, we evaluate the ablative margins of 115 HCC patients using a fine-tuned TransMorph model that incorporated tumor contraction predictions generated by MDCNet, and compare the results with radiologist 2D assessments. The registration method enhanced with MDCNet improved tumor deformation accuracy and achieved a higher Youden Index in detecting incomplete ablations. Moreover, MDCNet provides interpretable predictions, thereby facilitating clinical decision support.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146165263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DAMON: Difference-Aware Medical Visual Question Answering via Multimodal Large Language Model
Pub Date: 2026-02-10 | DOI: 10.1109/JBHI.2026.3663420
Zefan Zhang, Yanhui Li, Ruihong Zhao, Tian Bai
Difference-aware Medical Visual Question Answering (MVQA) aims to answer questions regarding disease-related content and the visual differences between the paired medical images, which is crucial for assessing disease progression and guiding further treatment planning. Although current medical Multimodal Large Language Models (MLLMs) have shown promising results in MVQA, they still exhibit poor generalization performance in difference-aware MVQA due to two key challenges. Firstly, existing difference-aware MVQA datasets are biased toward temporal variations of individual diseases, limiting their ability to model multi-disease coexistence and overlapping symptoms in real-world clinical scenarios. Secondly, disease-level semantic alignment becomes more challenging with multi-image inputs, as they introduce more redundant and interfering visual features. To address the first challenge, we introduce DAMON-QA, a large-scale difference-aware MVQA dataset designed to support visual difference analysis across multiple diseases. Leveraging this dataset, we train MLLMs and propose a Difference-Aware Medical visual questiON answering (DAMON) model. To tackle the second challenge, we further propose a Disease-driven Prompt Module (DPM) to identify the relevant diseases and guide the disease difference analysis process. Experiments on MIMIC-Diff-VQA show that our DAMON model achieves state-of-the-art (SOTA) performance. The dataset and code can be found at https://github.com/zefanZhang-cn/DAMON.
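As a loose illustration of disease-driven guidance, the snippet below assembles a difference-VQA instruction for a multimodal LLM from candidate disease labels and a paired-image question. DAMON's Disease-driven Prompt Module is a learned component rather than string templating, so the template, labels, and data structure here are purely hypothetical.

```python
# Hypothetical prompt assembly for difference-aware VQA: candidate disease labels
# (however obtained) steer the instruction given to a multimodal LLM alongside the
# paired images. This is not DAMON's learned Disease-driven Prompt Module.
from dataclasses import dataclass
from typing import List


@dataclass
class StudyPair:
    reference_image: str      # path to the earlier image
    main_image: str           # path to the follow-up image
    candidate_diseases: List[str]


def build_difference_prompt(pair: StudyPair, question: str) -> str:
    disease_hint = ", ".join(pair.candidate_diseases) or "no specific finding"
    return (
        "You are comparing two chest X-rays of the same patient.\n"
        f"Findings to focus on: {disease_hint}.\n"
        f"Reference image: {pair.reference_image}\n"
        f"Main image: {pair.main_image}\n"
        f"Question: {question}\n"
        "Describe only the visual differences relevant to the listed findings."
    )


if __name__ == "__main__":
    pair = StudyPair("study1_frontal.png", "study2_frontal.png",
                     ["pleural effusion", "cardiomegaly"])
    print(build_difference_prompt(pair, "Has the pleural effusion improved?"))
```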
{"title":"DAMON: Difference-Aware Medical Visual Question Answering via Multimodal Large Language Model.","authors":"Zefan Zhang, Yanhui Li, Ruihong Zhao, Tian Bai","doi":"10.1109/JBHI.2026.3663420","DOIUrl":"https://doi.org/10.1109/JBHI.2026.3663420","url":null,"abstract":"<p><p>Difference-aware Medical Visual Question Answering (MVQA) aims to answer questions regarding disease-related content and the visual differences between the paired medical images, which is crucial for assessing disease progression and guiding further treatment planning. Although current medical Multimodal Large Language Models (MLLMs) have shown promising results in MVQA, they still exhibit poor generalization performance in difference-aware MVQA due to two key challenges. Firstly, existing difference-aware MVQA datasets are biased toward temporal variations of individual diseases, limiting their ability to model multi-disease coexistence and overlapping symptoms in real-world clinical scenarios. Secondly, disease-level semantic alignment becomes more challenging with multi-image inputs, as they introduce more redundant and interfering visual features. To address the first challenge, we introduce DAMON-QA, a large-scale difference-aware MVQA dataset designed to support visual difference analysis across multiple diseases. Leveraging this dataset, we train MLLMs and propose a Difference-Aware Medical visual questiON answering (DAMON) model. To tackle the second challenge, we further propose a Disease-driven Prompt Module (DPM) to identify the relevant diseases and guide the disease difference analysis process. Experiments on MIMIC-Diff-VQA show that our DAMON model achieves state-of-the-art (SOTA) performance. The dataset and code can be found at https://github.com/zefanZhang-cn/DAMON.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.8,"publicationDate":"2026-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146157166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}