ACDMBI: A deep learning model based on community division and multi-source biological information fusion predicts essential proteins

IF 3.1 | Zone 4, Biology | Q2 BIOLOGY | Computational Biology and Chemistry | Pub Date: 2024-06-06 | DOI: 10.1016/j.compbiolchem.2024.108115

Pengli Lu, Jialong Tian

Computational Biology and Chemistry, volume 112, Article 108115. https://www.sciencedirect.com/science/article/pii/S1476927124001038

Citations: 0

Abstract

Accurately identifying essential proteins is vital for drug research and disease diagnosis. Traditional centrality methods and machine learning approaches often struggle to identify essential proteins accurately because they rely primarily on information derived from protein-protein interaction (PPI) networks. Although some researchers have tried to integrate biological data with PPI networks to predict essential proteins, designing effective integration methods remains a challenge. To address these challenges, this paper presents the ACDMBI model. ACDMBI comprises two key modules: feature extraction and classification. For feature extraction, we draw on three distinct data sources. First, structural features of proteins are extracted from the PPI network through community division and then further optimized using Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT). Second, protein features are extracted from gene expression data using Bidirectional Long Short-Term Memory networks (BiLSTM) and a multi-head self-attention mechanism. Third, protein features are derived by mapping subcellular localization data to a one-dimensional vector and processing it through fully connected layers. In the classification phase, we integrate the features extracted from the three data sources and build a multi-layer deep neural network (DNN) to classify proteins. Experimental results on brewer’s yeast data show the ACDMBI model’s superior performance, with an AUC of 0.9533 and an AUPR of 0.9153. Ablation experiments further show that effectively integrating features from diverse biological information significantly boosts the model’s performance.
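For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how an AUC and an AUPR can be computed from predicted scores and binary essential/non-essential labels. It is not the authors' evaluation code: the abstract does not say how AUPR was estimated, so average precision is used here as one common estimator, and the function names `roc_auc` and `average_precision` as well as the toy labels and scores are hypothetical illustrations, not data from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, used here as an assumed estimator of AUPR.

    Proteins are ranked by descending score; precision is recorded at the
    rank of every positive and then averaged. Tied scores keep their input
    order, which is a simplification.
    """
    total_pos = sum(1 for y in labels if y == 1)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_pos


# Toy example (made-up values): 1 = essential, 0 = non-essential.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)             # 7 of 9 pairs ranked correctly
toy_aupr = average_precision(toy_labels, toy_scores)  # mean of 1/1, 2/3, 3/4
```

The pairwise AUC is quadratic in the number of proteins, which is acceptable for a small illustration; a rank-based formulation would be the usual choice for a full proteome.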



Source journal

Computational Biology and Chemistry (Biology; Computer Science: Interdisciplinary Applications)

CiteScore: 6.10

Self-citation rate: 3.20%

Articles published: 142

Review time: 24 days

Journal description: Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High-quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high-quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered. Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, molecular dynamics simulations along with detailed free energy calculations, for example, should be used as complementary techniques to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered. Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However, prospective authors are welcome to send a brief (one to three pages) synopsis, which will be evaluated by the editors.