Graph-based machine learning model for weight prediction in protein-protein networks
Hajer Akid, Kirsley Chennen, Gabriel Frey, Julie Thompson, Mounir Ben Ayed, Nicolas Lachiche
BMC Bioinformatics, published 2024-11-08. DOI: https://doi.org/10.1186/s12859-024-05973-6
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11546293/pdf/
Abstract
Proteins interact with each other in complex ways to perform important biological functions. These protein-protein interactions (PPIs) can be represented as a graph in which proteins are nodes and interactions are edges. High-throughput experimental technologies now generate large volumes of data, which allow increasingly sophisticated PPI models. Despite this progress, current PPI networks remain incomplete, and discovering missing interactions experimentally can be costly, time-consuming, and challenging. Computational approaches have therefore emerged as valuable tools for predicting missing interactions. In the usual graph model, an edge between two proteins indicates a known interaction, while the absence of an edge means that the interaction is either unknown or has been missed. This binary representation, however, ignores how reliable the known interactions are when predicting new ones. To address this limitation, we propose a novel approach for link prediction in weighted protein-protein networks, where interaction weights denote confidence scores. Using data for the yeast Saccharomyces cerevisiae from the STRING database, we introduce a new model that combines similarity-based algorithms with aggregated confidence score weights for link prediction. Our model significantly improves prediction accuracy, outperforming traditional approaches in terms of Mean Absolute Error, Mean Relative Absolute Error, and Root Mean Square Error. The proposed approach has the potential to improve the accuracy of PPI prediction, which is crucial for better understanding the underlying biological processes.
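To make the idea of weight prediction in a weighted network concrete, the following Python sketch predicts the confidence score of a held-out edge from the weights of edges through common neighbors, then scores the predictions with Mean Absolute Error and Root Mean Square Error. It is only an illustration under stated assumptions and is not the authors' model: the toy graph, the protein labels, the weighted common-neighbor rule, and all function names are hypothetical. Mean Relative Absolute Error is left out because the abstract does not give its exact definition.

```python
import math

# Toy weighted graph. Edge weights stand in for confidence scores in [0, 1].
# Node labels and weights are invented for illustration only.
EDGES = {
    ("A", "B"): 0.9,
    ("A", "C"): 0.8,
    ("B", "C"): 0.7,
    ("B", "D"): 0.6,
    ("C", "D"): 0.4,
}


def build_adjacency(edges):
    """Turn an undirected edge->weight mapping into a nested dict."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def predict_weight(adj, u, v):
    """Assumed similarity rule (not the paper's): for each common neighbor z,
    average w(u, z) and w(z, v), then average over all common neighbors.
    Returns 0.0 when u and v share no neighbor."""
    common = set(adj.get(u, {})) & set(adj.get(v, {}))
    if not common:
        return 0.0
    return sum((adj[u][z] + adj[z][v]) / 2 for z in common) / len(common)


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def leave_one_out(edges):
    """Hide each edge in turn and predict its weight from the remaining edges."""
    y_true, y_pred = [], []
    for held_out, w in edges.items():
        remaining = {e: x for e, x in edges.items() if e != held_out}
        adj = build_adjacency(remaining)
        y_true.append(w)
        y_pred.append(predict_weight(adj, *held_out))
    return y_true, y_pred


if __name__ == "__main__":
    truth, predicted = leave_one_out(EDGES)
    print("MAE :", round(mae(truth, predicted), 4))
    print("RMSE:", round(rmse(truth, predicted), 4))
```

In a binary setting the same common-neighbor rule would only count shared neighbors. Carrying the weights through is what lets the prediction reflect how reliable the supporting interactions are.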
About the journal:
BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology.
BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.