An End-to-end Knowledge Graph Fused Graph Neural Network for Accurate Protein-Protein Interactions Prediction
Jie Yang, Yapeng Li, Guoyin Wang, Zhong Chen, Di Wu
IEEE/ACM Transactions on Computational Biology and Bioinformatics (Journal Article), published 2024-10-24
DOI: https://doi.org/10.1109/TCBB.2024.3486216
Citations: 0
Abstract
Protein-protein interactions (PPIs) are essential to understanding cellular mechanisms, signaling networks, disease processes, and drug development, as they represent the physical contacts and functional associations between proteins. Artificial intelligence (AI) methods for predicting PPIs have advanced considerably in recent years. However, these approaches often handle the intricate web of relationships and mechanisms among proteins, drugs, diseases, ribonucleic acid (RNA), and protein structures in a fragmented or superficial manner. This is typically due to the limitations of non-end-to-end learning frameworks, which can lead to sub-optimal feature extraction and fusion and thereby compromise prediction accuracy. To address these deficiencies, this paper introduces a novel end-to-end learning model, the Knowledge Graph Fused Graph Neural Network (KGF-GNN). The model comprises three components: (1) Protein Associated Network (PAN) construction: a PAN is built that captures the diverse relationships and mechanisms linking proteins with drugs, diseases, RNA, and protein structures. (2) Graph neural networks for feature extraction: one graph neural network (GNN) extracts both topological and semantic features from the PAN, while a second GNN extracts topological features directly from the observed PPI network. (3) Multi-layer perceptron for feature fusion: a multi-layer perceptron integrates these features through end-to-end learning, so that feature extraction and fusion are optimized jointly for PPI prediction. Extensive experiments on real-world PPI datasets validate the effectiveness of KGF-GNN, which achieves high accuracy in predicting PPIs and significantly surpasses existing state-of-the-art models. This work improves the precision of PPI prediction and contributes to the broader application of AI in bioinformatics, with implications for biological research and therapeutic development.
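The three components named in the abstract can be illustrated with a small data-flow sketch. This is not the authors' implementation: the toy node names (P1, drugA, diseaseX, rna1), the parameter-free mean aggregation standing in for a GNN layer, the embedding size, and the random untrained weights are all assumptions made for illustration. In the model described above, the GNN and perceptron parameters would be trained jointly, which this sketch omits.

```python
import math
import random

def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def mean_aggregate(adj, feats):
    """One message-passing round: average each node's vector with its neighbours'.
    A stand-in for a GNN layer (assumption: no learnable weights here)."""
    out = {}
    for node, vec in feats.items():
        group = [vec] + [feats[n] for n in sorted(adj.get(node, ()))]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out

def mlp_score(x, w1, b1, w2, b2):
    """Two-layer perceptron: ReLU hidden layer, sigmoid output in [0, 1]."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))

rng = random.Random(0)   # fixed seed so the sketch is reproducible
DIM, HIDDEN = 4, 8       # hypothetical embedding and hidden sizes

# (1) Toy PAN: proteins linked to drugs, diseases, and RNA (made-up edges).
pan_edges = [("P1", "drugA"), ("P2", "drugA"), ("P3", "diseaseX"), ("P1", "rna1")]
# Toy observed PPI network (made-up edges).
ppi_edges = [("P1", "P2"), ("P2", "P3")]
pan_adj = build_adjacency(pan_edges)
ppi_adj = build_adjacency(ppi_edges)

def random_features(nodes):
    """Random initial node features (placeholder for real node attributes)."""
    return {n: [rng.uniform(-1, 1) for _ in range(DIM)] for n in sorted(nodes)}

# (2) One feature extractor per graph.
pan_emb = mean_aggregate(pan_adj, random_features(pan_adj))
ppi_emb = mean_aggregate(ppi_adj, random_features(ppi_adj))

# (3) Fusion: concatenate both views of both proteins, then score with an MLP.
w1 = [[rng.uniform(-1, 1) for _ in range(4 * DIM)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = [rng.uniform(-1, 1) for _ in range(HIDDEN)]
b2 = 0.0

def pair_vector(p, q):
    """Concatenate PAN and PPI embeddings for a candidate protein pair."""
    return pan_emb[p] + pan_emb[q] + ppi_emb[p] + ppi_emb[q]

pair_vec = pair_vector("P1", "P3")
score = mlp_score(pair_vec, w1, b1, w2, b2)
print(f"untrained interaction score for (P1, P3): {score:.3f}")
```

Because the weights are random, the printed score carries no predictive meaning; the sketch only shows how features from the two graphs reach a single scoring function. That single path is what would allow one loss on the final prediction to update extraction and fusion together in an end-to-end setup.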
About the journal:
IEEE/ACM Transactions on Computational Biology and Bioinformatics emphasizes the algorithmic, mathematical, statistical, and computational methods that are central to bioinformatics and computational biology; the development and testing of effective computer programs in bioinformatics; the development of biological databases; important biological results obtained from the use of these methods, programs, and databases; and the emerging field of systems biology, where many forms of data are used to create a computer-based model of a complex biological system.