Binon Teji;Swarup Roy;Devendra Singh Dhami;Dinabandhu Bhandari;Pietro Hiram Guzzi
Graph Embedding Techniques for Predicting Missing Links in Biological Networks: An Empirical Evaluation
IEEE Transactions on Emerging Topics in Computing, 12 (1), pp. 190-201. Published 2023-06-08. Journal Article.
DOI: 10.1109/TETC.2023.3282539
URL: https://ieeexplore.ieee.org/document/10146293/
Impact factor: 5.1. JCR: Q1 (Computer Science, Information Systems).
Citations: 0
Abstract
Network science studies the complex relationships among the entities or actors of a system through graph formalism. For instance, biological networks represent macromolecules such as genes and proteins, or small chemicals, as nodes and the interactions among them as links or edges. Because wet-lab experiments are expensive, potential links are often predicted computationally. Conventional link prediction techniques rely on local network topology and fail to fully incorporate the global structure. Graph representation learning (or embedding) aims to capture the properties of the entire graph through an optimized, structure-preserving encoding of nodes or entire (sub)graphs into lower-dimensional vectors. Using the encoded vectors as features improves performance on the missing link identification task, so assessing the predictive quality of graph embedding techniques for this task is essential. In this work, we evaluate ten state-of-the-art graph embedding techniques for predicting missing links, with special emphasis on homogeneous and heterogeneous biological networks. Most available graph embedding techniques cannot be used directly for link prediction. Hence, we take the latent representation of the network produced by each candidate technique and reconstruct the network using various similarity and kernel functions. We evaluate nine similarity functions in combination with the candidate embedding techniques and compare the embedding techniques against five traditional (non-embedding-based) link prediction techniques. Experimental results reveal that embedding-based link prediction outperforms the traditional techniques. Among the embedding techniques, neural-network-based and attention-based techniques show consistent performance. We also observe that dot-product-based similarity is the best at inferring pairwise edges among nodes from their embeddings. Interestingly, when predicting links in a heterogeneous graph, the prediction also recovers a good number of valid links between the corresponding homogeneous nodes, possibly due to the indirect effect of homogeneous-heterogeneous interactions.
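The reconstruction step described in the abstract, scoring node pairs by a similarity function over their embedding vectors, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`rank_missing_links`, `dot`, `cosine`), the node labels, and the two-dimensional embedding values are all hypothetical toy choices, and the embeddings are assumed to have been produced beforehand by some embedding technique.

```python
import math
from itertools import combinations


def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def rank_missing_links(embeddings, known_edges, similarity=dot):
    """Score every node pair that is not a known edge, highest score first.

    embeddings:  dict mapping node id -> list of floats (assumed precomputed)
    known_edges: iterable of (u, v) pairs treated as undirected edges
    """
    known = {frozenset(edge) for edge in known_edges}
    scored = [
        (similarity(embeddings[u], embeddings[v]), u, v)
        for u, v in combinations(sorted(embeddings), 2)
        if frozenset((u, v)) not in known
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Hypothetical toy embeddings (illustrative values, not taken from the paper).
toy_embeddings = {
    "g1": [0.9, 0.1],
    "g2": [0.8, 0.2],
    "g3": [0.7, 0.3],
    "p1": [0.1, 0.9],
    "p2": [0.2, 0.8],
}
toy_edges = [("g1", "g2"), ("g2", "g3"), ("p1", "p2")]

ranking = rank_missing_links(toy_embeddings, toy_edges)
best_score, best_u, best_v = ranking[0]
print(f"top candidate link: {best_u}-{best_v} (score {best_score:.2f})")
```

Passing `similarity=cosine` swaps in a different similarity function without changing the ranking logic, which mirrors the idea of pairing one embedding with several similarity functions. In practice, a threshold or a top-k cutoff on the ranked list would decide which candidate pairs are reported as predicted links.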
About the journal:
IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Examples of emerging topics in computing include IT for green, synthetic and organic computing structures and systems, advanced analytics, social/occupational computing, location-based/client computer systems, morphic computer design, electronic game systems, and health-care IT.