Quintuple-based Representation Learning for Bipartite Heterogeneous Networks

IF 6.6 | Zone 4, Computer Science | Q1, Computer Science, Artificial Intelligence | ACM Transactions on Intelligent Systems and Technology | Pub Date: 2024-03-26 | DOI: 10.1145/3653978

Cangqi Zhou, Hui Chen, Jing Zhang, Qianmu Li, Dianming Hu

Citations: 0

Abstract

Quintuple-based Representation Learning for Bipartite Heterogeneous Networks

Recent years have seen rapid progress in network representation learning, which removes the need for burdensome feature engineering and facilitates downstream network-based tasks.

In reality, networks often exhibit heterogeneity, which means there may exist multiple types of nodes and interactions.

Heterogeneous networks pose new challenges for representation learning, because a model must be aware of node and edge types.

In this paper, we study a basic building block of general heterogeneous networks: heterogeneous networks with two types of nodes, i.e., bipartite heterogeneous networks. Many problems can be solved by decomposing a general heterogeneous network into multiple bipartite ones.
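The decomposition mentioned above can be illustrated with a minimal sketch. It assumes that decomposing means grouping edges by the unordered pair of endpoint types; the abstract does not spell out the procedure, and the function name and toy data below are hypothetical.

```python
from collections import defaultdict


def decompose_into_bipartite(node_type, edges):
    """Group edges by the unordered pair of endpoint types.

    node_type maps a node id to its type label; edges is an iterable of
    (u, v) pairs. Edges between nodes of the same type are skipped, since
    they do not belong to any bipartite sub-network.
    """
    parts = defaultdict(list)
    for u, v in edges:
        tu, tv = node_type[u], node_type[v]
        if tu == tv:
            continue
        if tu > tv:
            # Orient every edge so that the types appear in sorted order.
            u, v, tu, tv = v, u, tv, tu
        parts[(tu, tv)].append((u, v))
    return dict(parts)


# Hypothetical toy network with three node types.
node_type = {"alice": "user", "bob": "user", "m1": "movie", "t1": "tag"}
edges = [("alice", "m1"), ("m1", "bob"), ("m1", "t1"), ("alice", "bob")]
bipartite = decompose_into_bipartite(node_type, edges)
```

Each key of `bipartite` names one two-typed sub-network, which could then be embedded separately.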

Recently, to overcome the drawbacks of the non-metric measures used in the embedding space, metric learning-based approaches have been applied to heterogeneous network representation learning.

These approaches first generate triplets of samples, each consisting of an anchor node, a positive counterpart, and a negative one, and then try to pull positive samples closer to the anchor and push negative ones away.
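As a rough illustration of the pull-and-push idea, the sketch below uses the standard hinge-style triplet margin loss with Euclidean distance. This is an assumption for illustration; the abstract does not state which triplet objective the earlier approaches use, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# The positive is close (distance 1) and the negative is far (distance 5).
satisfied = triplet_margin_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# Swapping the roles violates the margin and produces a positive loss.
violated = triplet_margin_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

Note that a single triplet carries only one positive and one negative, which is the limitation the following sentences address.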

However, in heterogeneous networks, even the simplest two-typed ones, a triplet cannot simultaneously contain positive and negative samples from the different parts of the network.

To address this incompatibility of triplet-based metric learning, we propose a novel quintuple-based method for learning node representations in bipartite heterogeneous networks.

Specifically, we generate quintuples that contain positive and negative samples from the two different parts of a network. We formulate two learning objectives that accommodate quintuple-based learning samples: a proximity-based loss that models the relations within quintuples by sigmoid probabilities, and an angular loss that maintains similarity structures more robustly.
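The abstract does not define the quintuple or either loss precisely, so the following is only a sketch under stated assumptions. It assumes a quintuple of the form (anchor, opposite-side positive, opposite-side negative, same-side positive, same-side negative), where a same-side positive shares at least one neighbour with the anchor. The proximity loss is assumed to be a negative log-likelihood over sigmoid scores of dot products, and the angular loss is assumed to take the form commonly used in deep metric learning, a hinge on ||a - p||^2 - 4 tan^2(alpha) ||n - c||^2 with c = (a + p) / 2. All names are hypothetical.

```python
import math
import random


def sample_quintuple(u, adj_u, v_nodes, rng):
    """Return (u, v_pos, v_neg, u_pos, u_neg) for anchor u, or None.

    adj_u maps each U-side node to the set of its V-side neighbours.
    Assumed roles: v_pos is a neighbour of u and v_neg is not; u_pos shares
    at least one neighbour with u and u_neg shares none.
    """
    nbrs = adj_u[u]
    pools = (
        sorted(nbrs),                                                # v_pos
        sorted(set(v_nodes) - nbrs),                                 # v_neg
        sorted(w for w in adj_u if w != u and adj_u[w] & nbrs),      # u_pos
        sorted(w for w in adj_u if w != u and not adj_u[w] & nbrs),  # u_neg
    )
    if not all(pools):
        return None  # at least one role cannot be filled for this anchor
    return (u, *(rng.choice(pool) for pool in pools))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def proximity_loss(emb, quintuple):
    """Negative log-likelihood: positives should score high, negatives low."""
    u, v_pos, v_neg, u_pos, u_neg = quintuple
    a = emb[u]
    return -(
        log_sigmoid(dot(a, emb[v_pos])) + log_sigmoid(-dot(a, emb[v_neg]))
        + log_sigmoid(dot(a, emb[u_pos])) + log_sigmoid(-dot(a, emb[u_neg]))
    )


def angular_loss(anchor, pos, neg, alpha=math.pi / 4):
    """Hinge on ||a - p||^2 - 4 tan^2(alpha) ||n - c||^2, with c = (a + p) / 2."""
    center = [(x + y) / 2 for x, y in zip(anchor, pos)]
    d_ap = sum((x - y) ** 2 for x, y in zip(anchor, pos))
    d_nc = sum((x - y) ** 2 for x, y in zip(neg, center))
    return max(0.0, d_ap - 4 * math.tan(alpha) ** 2 * d_nc)


# Hypothetical toy bipartite graph: U = {u1, u2, u3}, V = {v1, v2, v3}.
adj_u = {"u1": {"v1", "v2"}, "u2": {"v2"}, "u3": {"v3"}}
quintuple = sample_quintuple("u1", adj_u, ["v1", "v2", "v3"], random.Random(0))
emb = {n: [0.0, 0.0] for n in ["u1", "u2", "u3", "v1", "v2", "v3"]}
loss = proximity_loss(emb, quintuple)  # every score is 0, so loss = 4 * ln 2
```

Under these assumptions, the angular loss could be applied to each (positive, negative) pair inside a quintuple, once for the opposite-side pair and once for the same-side pair.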

In addition, we parameterize feature learning with one-dimensional convolution operators around nodes’ neighborhoods.
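One way to read this sentence is sketched below: the feature vectors of a node's neighbours are stacked in some fixed order, a one-dimensional kernel slides along the neighbour axis independently for each feature dimension, and the responses are max-pooled. The ordering, the shared kernel, and the pooling step are all assumptions; the abstract gives no architectural detail, and the function names are hypothetical.

```python
def conv1d_valid(seq, kernel):
    """Slide `kernel` over `seq` without padding (cross-correlation form)."""
    k = len(kernel)
    return [
        sum(kernel[j] * seq[i + j] for j in range(k))
        for i in range(len(seq) - k + 1)
    ]


def neighborhood_feature(neigh_feats, kernel):
    """Convolve along the neighbour axis per feature dimension, then max-pool."""
    dims = len(neigh_feats[0])
    pooled = []
    for d in range(dims):
        channel = [feat[d] for feat in neigh_feats]  # one value per neighbour
        pooled.append(max(conv1d_valid(channel, kernel)))
    return pooled


# Three neighbours with two features each; the kernel averages adjacent pairs.
neigh_feats = [[1.0, 0.0], [2.0, 1.0], [4.0, 1.0]]
feature = neighborhood_feature(neigh_feats, [0.5, 0.5])
```

In a trainable model the kernel weights would be learned parameters rather than the fixed values used here.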

Extensive experiments on two downstream tasks, with comparisons against eight methods, demonstrate the effectiveness of our approach.

Source journal

ACM Transactions on Intelligent Systems and Technology (Computer Science, Artificial Intelligence; Computer Science, Information Systems)

CiteScore: 9.30

Self-citation rate: 2.00%

Articles published: 131

Journal description: ACM Transactions on Intelligent Systems and Technology is a scholarly journal that publishes the highest quality papers on intelligent systems, applicable algorithms and technology with a multi-disciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) to allow integrated systems to perceive, reason, learn, and act intelligently in the real world. ACM TIST publishes six issues a year. Each issue has 8-11 regular papers, with around 20 published journal pages or 10,000 words per paper. Additional references, proofs, graphs or detailed experiment results can be submitted as a separate appendix, while excessively lengthy papers will be rejected automatically. Authors can include online-only appendices for additional content of their published papers and are encouraged to share their code and/or data with other readers.