Graph Embedding Techniques for Predicting Missing Links in Biological Networks: An Empirical Evaluation

IEEE Transactions on Emerging Topics in Computing. Published: 2023-06-08. DOI: 10.1109/TETC.2023.3282539
Impact factor 5.4; Zone 2 (Computer Science); JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

Binon Teji, Swarup Roy, Devendra Singh Dhami, Dinabandhu Bhandari, Pietro Hiram Guzzi

IEEE Transactions on Emerging Topics in Computing, vol. 12, no. 1, pp. 190-201. IEEE Xplore: https://ieeexplore.ieee.org/document/10146293/

Citations: 0

Abstract

Network science seeks to understand the complex relationships among the entities, or actors, of a system through a graph formalism. Biological networks, for instance, represent molecules such as genes, proteins, or small chemical compounds as nodes and the interactions among them as links, or edges. Because wet-lab experiments are expensive, potential links are often predicted computationally. Conventional link prediction techniques rely on local network topology and do not fully capture the global structure. Graph representation learning (or embedding) aims to capture the properties of the entire graph through an optimized, structure-preserving encoding of nodes or entire (sub)graphs into lower-dimensional vectors. Using these encoded vectors as features improves performance on the missing link identification task, which makes it essential to assess how well graph embedding techniques predict missing links. In this work, we evaluate ten state-of-the-art graph embedding techniques on missing link prediction, with special emphasis on homogeneous and heterogeneous biological networks. Most available graph embedding techniques cannot be used directly for link prediction. We therefore take the latent representation of the network produced by each candidate technique and reconstruct the network using various similarity and kernel functions. We evaluate nine similarity functions in combination with the candidate embedding techniques and compare the embedding techniques against five traditional (non-embedding-based) link prediction techniques. Experimental results show that embedding-based link prediction outperforms the traditional techniques. Among the embedding techniques, neural-network-based and attention-based techniques perform consistently. We also observe that dot-product similarity is the best of the similarity functions at inferring pairwise edges between nodes from their embeddings. Finally, we find that link prediction on a heterogeneous graph also yields a good number of valid links between the corresponding homogeneous nodes, possibly owing to an indirect effect of homogeneous-heterogeneous interactions.
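The central step described in the abstract, scoring candidate node pairs by applying a similarity or kernel function to their embeddings and checking how well those scores separate held-out edges from non-edges, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy embedding vectors are made up, the dot product, cosine similarity and Gaussian (RBF) kernel are three example functions (the abstract does not list the paper's nine), the bandwidth `sigma` is an assumed value, common neighbours stands in as one example of a local-topology baseline (the paper's five baselines are not named here), and AUC is assumed as the evaluation metric.

```python
import math

# Illustrative similarity / kernel functions on a pair of embedding vectors.
# Assumption: these three are examples only, not the paper's nine functions.


def dot(u, v):
    """Dot-product similarity: sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot(u, v) / (nu * nv)


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||u - v||^2 / (2 sigma^2)); sigma is an assumed bandwidth."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def common_neighbors(adj, u, v):
    """Example local-topology baseline: number of neighbours shared by u and v."""
    return len(adj[u] & adj[v])


def score_pairs(emb, pairs, sim):
    """Reconstruct edge scores for candidate node pairs from their embeddings."""
    return {(u, v): sim(emb[u], emb[v]) for u, v in pairs}


def auc(scores, positives, negatives):
    """Fraction of (held-out edge, non-edge) pairs where the edge scores higher; ties count 0.5."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if scores[p] > scores[n]:
                wins += 1.0
            elif scores[p] == scores[n]:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: made-up 2-D embeddings standing in for the output of an
# embedding technique, the observed (training) edges, two held-out true edges and
# two sampled non-edges.
emb = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0],
       3: [0.1, 0.9], 4: [0.95, 0.05], 5: [0.05, 0.95]}
train_edges = [(0, 4), (1, 4), (2, 5), (3, 5)]
held_out_edges = [(0, 1), (2, 3)]
non_edges = [(0, 2), (1, 3)]
candidates = held_out_edges + non_edges

# Adjacency sets of the observed graph, used only by the topological baseline.
adj = {n: set() for n in emb}
for u, v in train_edges:
    adj[u].add(v)
    adj[v].add(u)

results = {}
for name, sim in [("dot", dot), ("cosine", cosine), ("rbf", gaussian_kernel)]:
    results[name] = auc(score_pairs(emb, candidates, sim), held_out_edges, non_edges)

cn_scores = {(u, v): common_neighbors(adj, u, v) for u, v in candidates}
results["common_neighbors"] = auc(cn_scores, held_out_edges, non_edges)

for name, value in results.items():
    print(name, round(value, 3))
```

One design point worth noting: the dot product is unbounded and grows with vector norms, whereas cosine similarity normalises the norms away and the RBF kernel depends only on distance. Which behaviour suits a given embedding is an empirical question; the abstract reports that dot-product similarity worked best in the authors' experiments.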

Source journal

IEEE Transactions on Emerging Topics in Computing (Computer Science: Computer Science (miscellaneous))

CiteScore: 12.10

Self-citation rate: 5.10%

Articles published: 113

Journal description: IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Examples of emerging topics in computing include IT for Green, synthetic and organic computing structures and systems, advanced analytics, social/occupational computing, location-based/client computer systems, morphic computer design, electronic game systems, and health-care IT.