Quintuple-based Representation Learning for Bipartite Heterogeneous Networks
Cangqi Zhou, Hui Chen, Jing Zhang, Qianmu Li, Dianming Hu
ACM Transactions on Intelligent Systems and Technology (published 2024-03-26). DOI: 10.1145/3653978
Citations: 0
Abstract
Recent years have seen rapid progress in network representation learning, which removes the need for burdensome feature engineering and facilitates downstream network-based tasks.
In reality, networks often exhibit heterogeneity, which means there may exist multiple types of nodes and interactions.
Heterogeneous networks pose new challenges for representation learning, as awareness of node and edge types is required.
In this paper, we study a basic building block of general heterogeneous networks: heterogeneous networks with two types of nodes, i.e., bipartite networks. Many problems can be solved by decomposing general heterogeneous networks into multiple bipartite ones.
Recently, to overcome the shortcomings of non-metric measures used in the embedding space, metric learning-based approaches have been leveraged for heterogeneous network representation learning.
These approaches first generate triplets of samples, each consisting of an anchor node, a positive counterpart, and a negative one, and then pull positive samples closer to the anchor while pushing negative ones away.
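The pull-closer/push-away objective described above is typically realized as a triplet margin loss. As a minimal sketch (the standard formulation, not necessarily the exact loss used by any particular method in this literature):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: penalize the anchor-positive
    distance unless it is at least `margin` smaller than the
    anchor-negative distance."""
    d_pos = np.linalg.norm(anchor - positive)  # pull this distance down
    d_neg = np.linalg.norm(anchor - negative)  # push this distance up
    return max(d_pos - d_neg + margin, 0.0)

anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])  # already close to the anchor
negative = np.array([2.0, 0.0])  # already far from the anchor
print(triplet_loss(anchor, positive, negative))  # 0.0: margin satisfied
```

Note that a single triplet holds exactly one positive and one negative sample, which is what causes the incompatibility with two-part networks discussed next.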
However, when dealing with heterogeneous networks, even the simplest two-typed ones, a triplet cannot simultaneously involve positive and negative samples from both parts of the network.
To address this incompatibility of triplet-based metric learning, in this paper, we propose a novel quintuple-based method for learning node representations in bipartite heterogeneous networks.
Specifically, we generate quintuples that contain positive and negative samples from the two different parts of the network. We then formulate two learning objectives that accommodate quintuple-based samples: a proximity-based loss that models the relations in quintuples via sigmoid probabilities, and an angular loss that maintains similarity structures more robustly.
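One plausible reading of the quintuple construction is an anchor plus a positive and a negative sample from each of the two node types (1 + 2 + 2 = 5), with the proximity-based loss scoring each pair through a sigmoid. The sketch below is a hypothetical illustration of that idea, not the paper's exact formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def quintuple_proximity_loss(anchor, pos_u, neg_u, pos_v, neg_v):
    """Hypothetical proximity loss over a quintuple: the anchor should
    score high (sigmoid of inner product) against the positive sample
    from each part of the bipartite network, and low against the
    negative sample from each part."""
    loss = 0.0
    for pos in (pos_u, pos_v):
        loss -= np.log(sigmoid(anchor @ pos))        # reward positive proximity
    for neg in (neg_u, neg_v):
        loss -= np.log(1.0 - sigmoid(anchor @ neg))  # penalize negative proximity
    return loss
```

Under this formulation a single loss term jointly constrains positives and negatives from both parts of the network, which a triplet cannot do.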
In addition, we parameterize feature learning by applying one-dimensional convolution operators to nodes' neighborhoods.
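As a rough sketch of what a one-dimensional convolution over a node's neighborhood might look like (the kernel shape and neighbor ordering here are illustrative assumptions, not the paper's exact operator):

```python
import numpy as np

def neighborhood_conv1d(neighbor_feats, kernel):
    """Slide a 1-D kernel of length k over the ordered sequence of a
    node's neighbor feature vectors (shape: n_neighbors x feat_dim),
    producing one weighted-sum vector per window."""
    k = kernel.shape[0]
    n, d = neighbor_feats.shape
    return np.stack([
        (kernel[:, None] * neighbor_feats[i:i + k]).sum(axis=0)
        for i in range(n - k + 1)
    ])

feats = np.arange(8, dtype=float).reshape(4, 2)  # 4 neighbors, 2-dim features
kernel = np.array([0.5, 0.5])                    # averages adjacent neighbors
out = neighborhood_conv1d(feats, kernel)
print(out.shape)  # (3, 2): one output vector per window of 2 neighbors
```

Unlike simple mean aggregation, the kernel weights are learnable parameters, which is what "parameterize feature learning" suggests.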
Extensive experiments on two downstream tasks, comparing our approach against eight existing methods, demonstrate its effectiveness.
Journal introduction:
ACM Transactions on Intelligent Systems and Technology is a scholarly journal that publishes the highest quality papers on intelligent systems, applicable algorithms and technology with a multi-disciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) to allow integrated systems to perceive, reason, learn, and act intelligently in the real world.
ACM TIST is published quarterly (six issues a year). Each issue has 8-11 regular papers, with around 20 published journal pages or 10,000 words per paper. Additional references, proofs, graphs or detailed experiment results can be submitted as a separate appendix, while excessively lengthy papers will be rejected automatically. Authors can include online-only appendices for additional content of their published papers and are encouraged to share their code and/or data with other readers.