DEDUCE: Multi-head attention decoupled contrastive learning to discover cancer subtypes based on multi-omics data

IF 4.9 · Zone 2, Medicine · Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Computer Methods and Programs in Biomedicine · Pub Date: 2024-10-30 · DOI: 10.1016/j.cmpb.2024.108478

Liangrui Pan, Xiang Wang, Qingchun Liang, Jiandong Shang, Wenjuan Liu, Liwen Xu, Shaoliang Peng

Computer Methods and Programs in Biomedicine, Volume 257, Article 108478 (Journal Article). Full text: https://www.sciencedirect.com/science/article/pii/S0169260724004711


Abstract

Background and Objective:

Given the high heterogeneity and clinical diversity of cancer, substantial variations exist in multi-omics data and clinical features across different cancer subtypes.

Methods:

We propose DEDUCE, a model based on a symmetric multi-head attention encoder (SMAE), for unsupervised contrastive learning on multi-omics cancer data, with the aim of identifying and characterizing cancer subtypes. The model adopts an unsupervised SMAE that extracts contextual features and long-range dependencies from multi-omics data, thereby mitigating the impact of noise. Importantly, DEDUCE introduces a subtype decoupled contrastive learning method, based on a multi-head attention mechanism, that simultaneously learns features from multi-omics data and performs clustering to identify cancer subtypes. Subtypes are clustered by calculating the similarity between samples in both the feature space and the sample space of the multi-omics data. The fundamental idea is to decouple various attributes of multi-omics data features and learn them as contrasting terms. A contrastive loss function is constructed to quantify the disparity between positive and negative examples, and the model minimizes this loss, thereby promoting the acquisition of enhanced feature representations.
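
The abstract does not give the loss formula, so the following is only a minimal sketch of one common form of decoupled contrastive loss, in which the positive pair is left out of the softmax denominator. It is not the authors' implementation: the function names, the temperature value, and the two-view setup (`z_a`, `z_b`) are assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def decoupled_contrastive_loss(z_a, z_b, temperature=0.5):
    """Decoupled contrastive loss for two views of the same n items.

    z_a, z_b: (n, d) arrays; row i of z_a and row i of z_b form a positive pair.
    Hypothetical helper for illustration, not taken from the DEDUCE code.
    """
    n = z_a.shape[0]
    if n < 2:
        raise ValueError("need at least two items so that negatives exist")

    z = np.concatenate([l2_normalize(z_a), l2_normalize(z_b)], axis=0)  # (2n, d)
    sim = (z @ z.T) / temperature                                       # (2n, 2n)

    idx = np.arange(2 * n)
    pos_idx = (idx + n) % (2 * n)       # index of each row's positive partner
    pos = sim[idx, pos_idx]

    # Negatives: everything except the row itself and its positive partner.
    # Dropping the positive from the denominator is the "decoupling".
    neg_mask = np.ones((2 * n, 2 * n), dtype=bool)
    neg_mask[idx, idx] = False
    neg_mask[idx, pos_idx] = False
    neg = np.where(neg_mask, sim, -np.inf)

    # Numerically stable log-sum-exp over the negatives of each row.
    row_max = neg.max(axis=1, keepdims=True)
    log_sum_neg = row_max[:, 0] + np.log(np.exp(neg - row_max).sum(axis=1))

    return float(np.mean(-pos + log_sum_neg))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(8, 16))                  # stand-in for encoder output
    view_b = features + 0.01 * rng.normal(size=features.shape)
    print("sample-level loss:", decoupled_contrastive_loss(features, view_b))
```

One way to read "feature space and sample space" is to apply the same loss twice: once to the rows of the embedding matrix (one row per sample) and once to the transposed soft cluster-assignment matrices, for example `decoupled_contrastive_loss(p_a.T, p_b.T)`, so that each cluster column is contrasted against the others. That reading is an assumption; the abstract does not confirm how the two terms are combined.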

Results:

DEDUCE was evaluated in extensive experiments on simulated multi-omics datasets, single-cell multi-omics datasets, and cancer multi-omics datasets, where it outperformed 10 deep learning models. It outperforms state-of-the-art methods, and ablation experiments demonstrate the effectiveness of each module in the model. Finally, we applied DEDUCE to identify six cancer subtypes of acute myeloid leukemia (AML).

Conclusion:

The proposed DEDUCE model learns features from multi-omics data through the SMAE, and subtype decoupled contrastive learning consistently optimizes the model for clustering and identifying cancer subtypes. The model demonstrates a significant capability for discovering new cancer subtypes. We applied it to identify six subtypes of AML. Analyses of GO function enrichment, subtype-specific biological functions, and GSEA of AML further enhance the interpretability of the DEDUCE model in identifying cancer subtypes.



Source journal

Computer Methods and Programs in Biomedicine (Engineering & Technology: Engineering, Biomedical)

CiteScore: 12.30

Self-citation rate: 6.60%

Articles published: 601

Review time: 135 days

Journal description: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.