
Latest Publications in IEEE Journal of Biomedical and Health Informatics

Bond-Aware Molecular Graph Learning With Multi-Graph Interleaved Message Passing.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-03 | DOI: 10.1109/JBHI.2026.3668790
Honghao Wang, Hongrui Zhang, Acong Zhang, Junlei Tang, Ping Li

Graph neural networks (GNNs) have demonstrated remarkable capabilities in molecular property prediction. Existing approaches adopt GNNs by modeling molecules as homogeneous graphs. However, the bonds between atoms can be heterogeneous, and their characterization and role in molecular graph representation learning remain unexplored. To address the heterogeneity inherent in molecular graphs, we build bond-centric graphs and propose a novel multi-graph learning model that captures bond heterogeneity via an augmented bond-graph view and bond coding for atom features. Unlike conventional multi-view learning, which focuses on late-stage view fusion, our method integrates cross-graph information during the node representation learning phase. To this end, we introduce the interleaved message passing graph neural network (IMPGNN), which allows messages to pass across three views of the molecular graph. Moreover, we introduce a novel structure-aware pooling mechanism for graph representation, which yields up to 45.7% gains over simple sum pooling. Comparative experiments on two standard molecular property prediction tasks reveal that our method surpasses all competing approaches (including multimodal models) on 75% of the evaluated benchmark datasets.
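The interleaved message passing idea (atom-level and bond-centric graph views exchanging messages within a single layer) can be illustrated compactly. Below is a minimal PyTorch sketch assuming dense adjacency and atom-bond incidence matrices; the layer and variable names are illustrative and do not reproduce the paper's actual IMPGNN, which passes messages across three views.

```python
# Minimal sketch of interleaved atom/bond message passing, assuming
# dense matrices. Names are illustrative, not the paper's IMPGNN.
import torch
import torch.nn as nn

class InterleavedMPLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.atom_update = nn.Linear(2 * dim, dim)
        self.bond_update = nn.Linear(2 * dim, dim)

    def forward(self, h_atom, h_bond, adj, inc):
        # adj: (N, N) atom adjacency; inc: (N, E) atom-bond incidence.
        atom_msg = adj @ h_atom                 # messages from neighbor atoms
        bond_to_atom = inc @ h_bond             # messages from incident bonds
        h_atom = torch.relu(self.atom_update(torch.cat([atom_msg, bond_to_atom], dim=-1)))
        atom_to_bond = inc.t() @ h_atom         # interleaved: bonds read the updated atoms
        bond_msg = (inc.t() @ inc) @ h_bond     # messages from bonds sharing an atom
        h_bond = torch.relu(self.bond_update(torch.cat([bond_msg, atom_to_bond], dim=-1)))
        return h_atom, h_bond

# Toy molecule: three atoms in a chain, two bonds.
adj = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
inc = torch.tensor([[1., 0.], [1., 1.], [0., 1.]])
layer = InterleavedMPLayer(dim=8)
h_atom, h_bond = layer(torch.randn(3, 8), torch.randn(2, 8), adj, inc)
print(h_atom.shape, h_bond.shape)  # torch.Size([3, 8]) torch.Size([2, 8])
```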

Citations: 0
Nonparametric Dynamic Granger Causality based on Multi-Space Spectrum Fusion for Time-varying Directed Brain Network Construction.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-03 | DOI: 10.1109/JBHI.2026.3670140
Chanlin Yi, Jiamin Zhang, Zihan Weng, Wanjun Chen, Dezhong Yao, Fali Li, Zehong Cao, Peiyang Li, Peng Xu

Nonparametric estimation of time-varying directed networks can unveil the intricate transient organization of directed brain communication while circumventing constraints imposed by prescribed model-driven methods. A robust time-frequency representation, the foundation of its causality inference, is critical for enhancing its reliability. This study proposes a novel method, nonparametric dynamic Granger causality based on Multi-space Spectrum Fusion (ndGCMSF), which integrates complementary spectrum information from different spaces to generate enhanced spectral representations for estimating dynamic causalities across brain regions. Systematic simulations and validations demonstrate that ndGCMSF exhibits superior noise resistance and a powerful ability to capture subtle dynamic changes in directed brain networks. In particular, ndGCMSF revealed that during motor imagery, laterality in the hemisphere ipsilateral to the hemiplegic limb emerges at task onset and diminishes upon task completion. These intrinsic variations further provide features for assessing motor function. ndGCMSF offers powerful functional patterns for deriving effective brain networks in dynamically changing operational settings and contributes to broad areas involving dynamic and directed communication.
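As rough intuition for the fusion step, complementary spectral estimates of the same signal can be fused before any causality inference. The snippet below is a simplified stand-in using SciPy: it averages STFT power spectra computed with two different windows as the "different spaces"; the actual ndGCMSF fusion and nonparametric Granger estimation are considerably more involved.

```python
# Simplified stand-in for multi-space spectrum fusion: average two STFT
# power spectra obtained with different windows. Illustrative only.
import numpy as np
from scipy.signal import stft

rng = np.random.default_rng(0)
fs = 250.0
x = rng.standard_normal(int(10 * fs))            # 10 s surrogate EEG channel

f1, t1, Z1 = stft(x, fs=fs, window="hann", nperseg=128)
f2, t2, Z2 = stft(x, fs=fs, window=("tukey", 0.5), nperseg=128)

fused = 0.5 * (np.abs(Z1) ** 2 + np.abs(Z2) ** 2)  # fused time-varying spectrum
print(fused.shape)                                  # (freq bins, time frames)
```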

Citations: 0
Reprogramming Automatic Speech Recognition Models for Neonatal Chest Sound Separation.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-03 | DOI: 10.1109/JBHI.2026.3669531
Yang Yi Poh, Ethan Grooby, Kenneth Tan, Atul Malhotra, Mehrtash Harandi, Faezeh Marzbanrad

Stethoscope-recorded chest sounds are invaluable for providing a non-invasive, real-time assessment of heart and lung sounds. However, noisy chest sounds can affect the dependability of various algorithms that rely on clean chest sounds, making additional preprocessing necessary to isolate the desired sources and remove noise, interference, and artifacts. This paper is the first to explore reprogramming automatic speech recognition (ASR) models to perform neonatal chest sound separation. In particular, we reprogrammed the Whisper ASR model for chest sound separation, proposing two approaches: reprogramming only Whisper's audio encoder and reprogramming the full Whisper model. Using only simple linear layers and learnable parameters, we showed that this parameter-efficient reprogramming of Whisper effectively separates heart and lung sounds from noise on an artificial dataset. We also demonstrate the effectiveness of the proposed method as a preprocessing step for various heart and lung sound algorithms, yielding results comparable to state-of-the-art performance. Applying a pre-trained ASR model to sound separation demonstrates the feasibility of efficient cross-domain model reprogramming, and of using frozen foundation models from a different domain on biomedical data.
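The parameter-efficient recipe (a frozen backbone with thin trainable layers around it) looks roughly like the sketch below. A small, randomly initialized TransformerEncoder stands in for Whisper's pretrained audio encoder, and the mask-based separation head is an illustrative assumption rather than the paper's exact design.

```python
# Minimal sketch of model reprogramming for source separation: only the
# input adapter and output heads train; the backbone stays frozen.
import torch
import torch.nn as nn

class ReprogrammedSeparator(nn.Module):
    def __init__(self, n_mels=80, d_model=128, n_sources=2):
        super().__init__()
        self.adapter = nn.Linear(n_mels, d_model)              # trainable
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        for p in self.encoder.parameters():                    # frozen backbone
            p.requires_grad = False
        self.heads = nn.Linear(d_model, n_sources * n_mels)    # trainable
        self.n_sources, self.n_mels = n_sources, n_mels

    def forward(self, mel):                                    # (B, T, n_mels)
        z = self.encoder(self.adapter(mel))
        masks = torch.sigmoid(self.heads(z))                   # per-source masks
        masks = masks.view(*mel.shape[:2], self.n_sources, self.n_mels)
        return masks * mel.unsqueeze(2)                        # (B, T, S, n_mels)

model = ReprogrammedSeparator()
out = model(torch.randn(1, 100, 80))
print(out.shape)  # torch.Size([1, 100, 2, 80])
```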

Citations: 0
GPFD-Net: A Geometry-Pose Frequency Decoupling Network for Privacy-Preserving Human Action Recognition in Healthcare.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-03 | DOI: 10.1109/JBHI.2026.3669251
Xing Li, Jingfan Liang, Ge Gao, Li Wang, Haifeng Wang, Shihao Han

Human Action Recognition (HAR) holds significant application value in healthcare informatics, facilitating tasks such as clinical diagnosis and rehabilitation monitoring. Point cloud sequences have emerged as a pivotal modality for balancing privacy preservation with high-fidelity geometric structural representation, ensuring anonymity while retaining critical 3D behavioral information. However, existing point cloud sequence encoding methods struggle to precisely encode micro-geometric details and macro-pose contours in the spatial dimension, as well as the dynamic heterogeneity of actions in the temporal dimension. These limitations impede high-precision clinical motion analysis. To address these challenges, we propose a Geometry-Pose Frequency Decoupling Network (GPFD-Net) for human action recognition. First, we design a Geometry-Pose Parallel-Collaborative Spatial Encoder (GPCSE). This module adopts a parallel dual-stream architecture to explicitly capture and fuse complementary micro-geometric details and macro-pose contours, generating an informative geometry-enhanced pose feature sequence. Second, we introduce a Frequency-Decoupled Temporal Capturer (FDTC). This module adaptively decomposes the geometry-enhanced pose feature sequence into a smooth trend sequence and a transient detail sequence, which are then processed by two parallel expert encoders via differentiated encoding to achieve robust human action recognition. Extensive experiments on four public benchmark datasets demonstrate that GPFD-Net achieves superior performance. The proposed method provides a novel paradigm for high-precision, privacy-preserving motion analysis in healthcare applications.
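The frequency-decoupling step admits a simple approximation: a moving average produces the smooth trend sequence, the residual retains transient detail, and two separate expert encoders process the streams. The PyTorch sketch below makes those assumptions explicit (fixed moving-average kernel, GRU experts, additive fusion); the paper's FDTC decomposes adaptively.

```python
# Minimal sketch of trend/detail frequency decoupling with two expert
# encoders. The fixed kernel and GRU experts are illustrative choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FreqDecoupledTemporal(nn.Module):
    def __init__(self, dim=64, kernel=9):
        super().__init__()
        self.kernel = kernel
        self.trend_expert = nn.GRU(dim, dim, batch_first=True)
        self.detail_expert = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x):                        # x: (B, T, dim)
        xt = x.transpose(1, 2)                   # (B, dim, T) for pooling
        trend = F.avg_pool1d(xt, self.kernel, stride=1,
                             padding=self.kernel // 2).transpose(1, 2)
        detail = x - trend                       # transient residual
        h_trend, _ = self.trend_expert(trend)
        h_detail, _ = self.detail_expert(detail)
        return h_trend + h_detail                # fused temporal features

feats = torch.randn(2, 30, 64)                   # 30 frames of pose features
print(FreqDecoupledTemporal()(feats).shape)      # torch.Size([2, 30, 64])
```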

Citations: 0
DiagR1: A Vision-Language Model Trained via Reinforcement Learning for Digestive Pathology Diagnosis.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-03 | DOI: 10.1109/JBHI.2026.3669866
Minxi Ouyang, Lianghui Zhu, Yaqing Bao, Qiang Huang, Jingli Ouyang, Tian Guan, Xitong Ling, Jiawen Li, Song Duan, Wenbin Dai, Li Zheng, Xuemei Zhang, Yonghong He

Multimodal large models have shown great potential in automating pathology image analysis. However, current multimodal models for gastrointestinal pathology are constrained by both data quality and reasoning transparency: pervasive noise and incomplete annotations in public datasets predispose vision-language models to factual hallucinations when generating diagnostic text, while the absence of explicit intermediate reasoning chains renders the outputs difficult to audit and thus less trustworthy in clinical practice. To address these issues, we construct a large-scale gastrointestinal pathology dataset containing both microscopic descriptions and diagnostic conclusions, and propose a prompt augmentation strategy that incorporates lesion classification and anatomical site information. This design guides the model to better capture image-specific features and maintain semantic consistency in generation. Furthermore, we employ a post-training pipeline that combines supervised fine-tuning with Group Relative Policy Optimization (GRPO) to improve reasoning quality and output structure. Experimental results on real-world pathology report generation tasks demonstrate that our approach significantly outperforms state-of-the-art open-source and proprietary baselines, achieving 18.7% higher clinical relevance, 32.4% better structural completeness, and 41.2% fewer diagnostic errors, which demonstrates superior accuracy and clinical utility compared to existing solutions.
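The group-relative core of GRPO is compact: sample several candidate reports per prompt, score each with a reward function, and normalize every reward against its own group's statistics. A minimal sketch with placeholder reward values rather than the paper's reward design:

```python
# Minimal sketch of GRPO's group-relative advantage computation.
# Rewards here are placeholders, not the paper's reward model.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar rewards per sampled output."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)       # above group average -> positive

rewards = torch.tensor([[0.2, 0.8, 0.5, 0.5],   # group sampled for prompt 1
                        [1.0, 0.1, 0.4, 0.9]])  # group sampled for prompt 2
print(grpo_advantages(rewards))
```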

Citations: 0
MuST: Multi-Scale Transformer Incorporating Hierarchical Attention and TCN for EEG Decoding.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-03 | DOI: 10.1109/JBHI.2026.3669898
Kui Zhao, Enze Shi, Di Zhu, Sigang Yu, Geng Chen, Shijie Zhao, Dingwen Zhang, Shu Zhang

Electroencephalography (EEG) signals exhibit significant inherent time-scale differences across individuals and tasks. Despite notable successes in decoding EEG signals in single tasks (e.g., epilepsy detection), where time scales are relatively consistent, substantial differences in temporal characteristics across tasks pose a significant challenge. To address these limitations, we propose MuST, a Multi-Scale Transformer that dynamically learns characteristics of EEG signals at different time scales. Building on the conventional Convolutional Neural Network (CNN)-Transformer model, MuST introduces two innovations: (1) a hierarchical Transformer structure to dynamically capture global dependencies and long-range information from EEG signals at different scales, and (2) a novel temporal convolutional network (TCN) module that replaces the original feed-forward network (FFN) module in the Transformer, effectively capturing local temporal patterns and short-term dependencies from EEG signals. To validate the performance of MuST, we conducted experiments on five public EEG datasets with extreme time-scale differences. The results show an average classification accuracy of 91.69% under identical parameter settings, surpassing the baseline EEGNet by 5.65% and highlighting MuST's superior capability in handling multi-scale EEG signals across diverse tasks. More critically, MuST achieves a unified modeling of EEG temporal heterogeneity through mixed-dataset training (epilepsy detection and sleep staging classification), validating our multi-scale architecture's capability to dynamically reconcile divergent neurophysiological timescales within a single model. Our code can be found at https://github.com/wisercc/MuST.
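Innovation (2), replacing the Transformer's feed-forward sublayer with a temporal convolution, can be sketched as below. The kernel size, dilation, and causal left-padding are illustrative choices, not necessarily MuST's exact configuration.

```python
# Minimal sketch of a Transformer layer whose FFN is swapped for a
# causal dilated temporal convolution. Sizes are illustrative.
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    def __init__(self, dim, kernel=3, dilation=2):
        super().__init__()
        pad = (kernel - 1) * dilation            # left-only pad -> causal conv
        self.pad = nn.ConstantPad1d((pad, 0), 0.0)
        self.conv = nn.Conv1d(dim, dim, kernel, dilation=dilation)

    def forward(self, x):                        # (B, T, dim)
        y = self.conv(self.pad(x.transpose(1, 2))).transpose(1, 2)
        return torch.relu(y)

class AttnTCNLayer(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.tcn = TCNBlock(dim)                 # replaces the usual FFN
        self.n1, self.n2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):
        a, _ = self.attn(x, x, x)
        x = self.n1(x + a)
        return self.n2(x + self.tcn(x))

eeg = torch.randn(2, 250, 64)                    # (batch, time, embedded channels)
print(AttnTCNLayer()(eeg).shape)                 # torch.Size([2, 250, 64])
```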

Citations: 0
Mamba-Based Prototypical Contrastive Learning With Augmented Feature Separation for Common and Rare Arrhythmia Classification.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-03 | DOI: 10.1109/JBHI.2026.3669893
Fengyi Guo, Ying An, Jianxin Wang

Early diagnosis of arrhythmia, a common cardiovascular condition, is crucial for improving prognosis. The electrocardiogram (ECG) is widely used as a non-invasive diagnostic tool. However, computer-aided diagnosis of rare arrhythmias faces significant challenges due to the severe scarcity of samples for these rare disease classes. To tackle this, we propose a Mamba-based Prototypical Contrastive Learning framework, which can simultaneously identify both common and rare classes under the generalized Few-Shot Learning (FSL) setting. It primarily consists of: (1) the Mamba-based Spatio-Temporal Feature Fusion Network (MST), which integrates spatial features from multi-scale convolutions and temporal dynamics from a bidirectional Mamba for ECG modeling; and (2) the Prototypical Contrastive Learning framework with Augmented Feature Separation (PCAS), which employs a prototype augmentation strategy with an Augmented Prototype Consistency Loss to optimize prototype representations, and a Separation-Tuned Contrastive Loss to enhance intra-class compactness and inter-class distinctness, mitigating the risk of class collapse. Extensive experiments on the publicly available PTBXL and Chapman datasets demonstrate the effectiveness of MST-PCAS, which achieves superior rare-class recognition accuracies of 79.13% and 50.72%, respectively, for ECG arrhythmia classification.
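The prototypical backbone of such few-shot classifiers reduces to averaging support embeddings per class and scoring queries by distance to each prototype. The sketch below shows only that backbone; the paper's prototype augmentation and the two specialized losses are omitted.

```python
# Minimal sketch of prototype-based few-shot classification: class
# prototypes are mean support embeddings, queries score by distance.
import torch

def prototypes(support: torch.Tensor, labels: torch.Tensor, n_cls: int):
    """support: (N, D) embeddings; labels: (N,) integers in [0, n_cls)."""
    protos = torch.zeros(n_cls, support.size(1))
    for c in range(n_cls):
        protos[c] = support[labels == c].mean(dim=0)
    return protos

def proto_logits(query: torch.Tensor, protos: torch.Tensor):
    return -torch.cdist(query, protos)           # (Q, n_cls): nearer -> larger

sup = torch.randn(12, 16)                        # 4 shots for each of 3 classes
lab = torch.arange(3).repeat_interleave(4)
logits = proto_logits(torch.randn(5, 16), prototypes(sup, lab, 3))
print(logits.argmax(dim=1))                      # predicted classes for 5 queries
```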

Citations: 0
CoMIL: A Contrastive CNN-Transformer Framework with Multi-Instance Learning for Whole-Slide Pathology Image Classification.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-03 | DOI: 10.1109/JBHI.2026.3669663
Bowen Liu, Hongbo Zhu, Xiaotong Wei, Chuan Lin, Wei Wang

Whole slide image (WSI) classification faces challenges from gigapixel scale and weak supervision, and often struggles to balance global context with local details. We propose CoMIL, a dual-branch framework based on symmetric mutual learning. First, to resolve the dilemma that single-stream networks cannot simultaneously capture global context and fine-grained details, we employ dual parallel pathways: a Transformer branch models long-range instance dependencies, while a CNN branch captures localized tissue morphology. Second, to address spatial information loss, we design a Hyper Positional Generator (HyperPG). This module integrates multi-scale adaptive mechanisms with deformable convolutions, enhancing spatial awareness with linear complexity. Finally, to improve model robustness against weak label noise, bidirectional learning between branches is achieved through KL divergence minimization. Extensive experiments show that our proposed method achieves an area under the curve of 98.6% and an accuracy of 95.3% on the Camelyon16 dataset, and an area under the curve of 98.8% and an accuracy of 93.3% on the TCGA_Kidney dataset, surpassing known advanced WSI classification methods.
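The bidirectional KL objective behind the branch-to-branch mutual learning fits in a few lines. A minimal sketch assuming softened logits from the Transformer and CNN branches; the temperature and equal weighting are illustrative choices:

```python
# Minimal sketch of symmetric mutual learning via bidirectional KL:
# each branch's softened prediction is pulled toward the other's.
import torch
import torch.nn.functional as F

def mutual_kl(logits_a: torch.Tensor, logits_b: torch.Tensor, tau: float = 2.0):
    pa = F.log_softmax(logits_a / tau, dim=-1)
    pb = F.log_softmax(logits_b / tau, dim=-1)
    kl_ab = F.kl_div(pa, pb.exp(), reduction="batchmean")  # pulls A toward B
    kl_ba = F.kl_div(pb, pa.exp(), reduction="batchmean")  # pulls B toward A
    return 0.5 * (kl_ab + kl_ba)

la, lb = torch.randn(8, 2), torch.randn(8, 2)   # Transformer / CNN branch logits
print(mutual_kl(la, lb))
```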

Citations: 0
Graph-Informed and FiLM-Enhanced Multimodal Fusion for Myocardial Infarction Prediction.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-03 | DOI: 10.1109/JBHI.2026.3669222
Xiantong Xiang, Longxiao Gao, Yuansheng Liu, Ningji Gong, Yongshun Gong

Accurate and timely diagnosis of cardiovascular diseases, particularly myocardial infarction (MI), remains a critical clinical challenge. Existing electrocardiogram (ECG) analysis methods often rely solely on a single data modality, such as raw signals or waveform images, which limits their ability to capture the broader physiological context. To address this limitation, we propose GFM-MIP, a Graph-informed and FiLM-enhanced Multimodal Fusion framework for myocardial infarction prediction. GFM-MIP integrates 12-lead ECG time-series signals, ECG images, and laboratory test results through a unified architecture. Specifically, it employs a Graphormer encoder to model inter-lead dependencies in ECG signals and a Vision Transformer to extract morphological patterns from ECG images, both modulated by patient-specific laboratory features using Feature-wise Linear Modulation (FiLM). A Transformer-based fusion module captures cross-modal interactions, while a contrastive learning objective encourages alignment between signal and image modalities. Experimental results on a real-world clinical dataset and three public benchmarks demonstrate that GFM-MIP consistently outperforms state-of-the-art baselines across multiple evaluation metrics. Ablation studies further validate the contribution of each modality and architectural component. The proposed framework offers a clinically meaningful and scalable solution for robust, multimodal cardiovascular diagnosis.
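FiLM itself is a small mechanism: a conditioning network predicts a per-channel scale (gamma) and shift (beta) from the laboratory features, which then modulate the ECG token features. A minimal sketch with illustrative dimensions:

```python
# Minimal sketch of Feature-wise Linear Modulation (FiLM): lab features
# condition per-channel scale and shift of ECG token features.
import torch
import torch.nn as nn

class FiLM(nn.Module):
    def __init__(self, cond_dim: int, feat_dim: int):
        super().__init__()
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * feat_dim)

    def forward(self, feats, cond):              # feats: (B, T, D), cond: (B, C)
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=-1)
        return gamma.unsqueeze(1) * feats + beta.unsqueeze(1)

film = FiLM(cond_dim=10, feat_dim=32)
ecg_tokens = torch.randn(4, 12, 32)              # e.g. one token per ECG lead
labs = torch.randn(4, 10)                        # laboratory test results
print(film(ecg_tokens, labs).shape)              # torch.Size([4, 12, 32])
```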

Citations: 0
Multi-level Asymmetric Contrastive Learning for Medical Image Segmentation Pre-training.
IF 6.8 | CAS Zone 2 (Medicine) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2026-03-03 | DOI: 10.1109/JBHI.2026.3669549
Shuang Zeng, Lei Zhu, Xinliang Zhang, Qian Chen, Hangzhou He, Lujia Jin, Zifeng Tian, Zhaoheng Xie, Micky C Nnamdi, Wenqi Shi, J Ben Tamo, May D Wang, Yanye Lu

Medical image segmentation is a fundamental yet challenging task due to the arduous process of acquiring large volumes of high-quality labeled data from experts. Contrastive learning offers a promising but still problematic solution to this dilemma. First, existing medical contrastive learning strategies focus on extracting image-level representations, ignoring the abundant multi-level representations. Furthermore, they underutilize the decoder, either initializing it randomly or pre-training it separately from the encoder, thereby neglecting the potential collaboration between the encoder and decoder. To address these issues, we propose MACL, a novel multi-level asymmetric contrastive learning framework for enhancing medical image segmentation. Specifically, we design an asymmetric contrastive learning structure that pre-trains the encoder and decoder simultaneously to provide better initialization for segmentation models. Moreover, we develop a multi-level contrastive learning strategy that integrates correspondences across feature-level, image-level, and pixel-level representations, ensuring the encoder and decoder capture comprehensive details from representations of varying scales and granularities during the pre-training phase. Finally, experiments on 8 medical image datasets indicate that our MACL framework outperforms 11 existing contrastive learning strategies: MACL achieves superior performance with more precise predictions in the visualizations, and Dice scores 1.72%, 7.87%, 2.49%, and 1.48% higher than the previous best results on ACDC, MMWHS, HVSMR, and CHAOS with 10% labeled data, respectively. MACL also generalizes well across 5 variant U-Net backbones.
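The contrastive objective applied at each level is typically an InfoNCE loss over paired embeddings. The sketch below shows one such loss for a batch of paired views; in a multi-level scheme like MACL it would be instantiated per level (feature, image, pixel) and combined. The temperature is an illustrative choice.

```python
# Minimal sketch of InfoNCE over paired embeddings: matched pairs sit
# on the diagonal of the similarity matrix and act as positives.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1):
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                   # (B, B) cosine similarities
    targets = torch.arange(z1.size(0))           # positives on the diagonal
    return F.cross_entropy(logits, targets)

view_a, view_b = torch.randn(16, 128), torch.randn(16, 128)
loss = info_nce(view_a, view_b)                  # apply per level, then combine
print(loss)
```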

Citations: 0