Origin-destination (OD) flow captures population mobility between every pair of regions in a city, which is of great value for urban planning and transportation management. Nevertheless, collecting OD flow data is extremely difficult due to privacy concerns and collection costs. Significant efforts have been made to generate OD flow from urban regional features, e.g., demographics and land use, since the spatial heterogeneity of urban functions is the primary driver of movement from one place to another. On the other hand, people travel along various routes between origins and destinations, which affects urban traffic, e.g., road travel speed and travel time. These effects of OD flows reveal fine-grained spatiotemporal patterns of population mobility, yet few works have explored the effectiveness of incorporating urban traffic information into OD generation. To bridge this gap, we propose to generate real-world daily temporal OD flows enhanced by urban traffic information. Our model consists of two modules: Urban2OD and OD2Traffic. In the Urban2OD module, we devise a spatiotemporal graph neural network to model the complex dependencies between daily temporal OD flows and regional features. In the OD2Traffic module, we introduce an attention-based neural network to predict urban traffic from the OD flow produced by the Urban2OD module. Then, through gradient backpropagation, the two modules enhance each other to generate high-quality OD flow data. Extensive experiments on real-world datasets demonstrate the superiority of our proposed model over the state of the art.
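The following minimal sketch (not the authors' code) illustrates how an OD-generation module and a traffic-prediction module can be chained so that gradients from the traffic loss flow back into the OD generator; plain MLPs stand in for the paper's spatiotemporal GNN and attention network, and all dimensions and targets below are made up.

```python
# Coupling an OD generator with a traffic predictor through a joint loss, so the
# traffic objective also updates the OD generator via backpropagation.
import torch
import torch.nn as nn

class Urban2OD(nn.Module):                      # stand-in for the spatiotemporal GNN
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, origin_feat, dest_feat):  # region features -> OD flow estimate
        return self.net(torch.cat([origin_feat, dest_feat], dim=-1)).squeeze(-1)

class OD2Traffic(nn.Module):                    # stand-in for the attention-based predictor
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, od_flow):                 # OD flow -> traffic indicator (e.g. speed)
        return self.net(od_flow.unsqueeze(-1)).squeeze(-1)

# toy batch: 8 OD pairs, 16-dim region features, hypothetical targets
feat_o, feat_d = torch.randn(8, 16), torch.randn(8, 16)
od_true, traffic_true = torch.rand(8), torch.rand(8)

gen, aux = Urban2OD(16), OD2Traffic()
opt = torch.optim.Adam(list(gen.parameters()) + list(aux.parameters()), lr=1e-3)

od_pred = gen(feat_o, feat_d)
loss = nn.functional.mse_loss(od_pred, od_true) \
     + nn.functional.mse_loss(aux(od_pred), traffic_true)   # traffic loss backpropagates into gen
opt.zero_grad(); loss.backward(); opt.step()
```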
Can Rong, Zhicheng Liu, Jingtao Ding, Yong Li. "Learning to Generate Temporal Origin-destination Flow Based on Urban Regional Features and Traffic Information." ACM Transactions on Knowledge Discovery from Data, 2024-02-20. https://doi.org/10.1145/3649141
Shenghao Liu, Yu Zhang, Lingzhi Yi, Xianjun Deng, Laurence T. Yang, Bang Wang
With the development of recommendation algorithms, researchers are paying increasing attention to fairness issues such as user discrimination in recommendations. To address these issues, existing works often filter out users' sensitive information that may cause discrimination while learning user representations. However, these approaches overlook the latent relationship between items' content attributes and users' sensitive information. In this paper, we propose a fairness-aware recommendation algorithm (DALFRec) based on user-side and item-side adversarial learning to mitigate the effects of sensitive information on both sides of the recommendation process. First, we conduct a statistical analysis to demonstrate the latent relationship between items' information and users' sensitive attributes. Then, we design a dual-side adversarial learning network that simultaneously filters out users' sensitive information on the user side and the item side. Additionally, we propose a new evaluation strategy that leverages the latent relationship between items' content attributes and users' sensitive attributes to better assess an algorithm's ability to reduce discrimination. Experiments on three real-world datasets demonstrate the superiority of our proposed algorithm over state-of-the-art methods.
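As a rough illustration of dual-side adversarial filtering (not the DALFRec release), the sketch below attaches a sensitive-attribute discriminator to both the user and item embeddings through a gradient-reversal layer; the dot-product recommender, dimensions and data are assumptions made for the example.

```python
# Adversarially removing a sensitive attribute from user-side and item-side
# embeddings: discriminators try to predict the attribute, and gradient reversal
# pushes the embeddings to hide it while the recommendation loss is optimized.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad                      # flip the gradient sign

n_users, n_items, dim = 100, 200, 32
user_emb = nn.Embedding(n_users, dim)
item_emb = nn.Embedding(n_items, dim)
user_disc = nn.Linear(dim, 2)             # predicts the sensitive attribute from the user side
item_disc = nn.Linear(dim, 2)             # predicts it from the item side

u = torch.randint(0, n_users, (64,))
i = torch.randint(0, n_items, (64,))
rating = torch.rand(64)
sensitive = torch.randint(0, 2, (64,))    # e.g. a binary protected attribute

eu, ei = user_emb(u), item_emb(i)
rec_loss = nn.functional.mse_loss((eu * ei).sum(-1), rating)
adv_loss = nn.functional.cross_entropy(user_disc(GradReverse.apply(eu)), sensitive) \
         + nn.functional.cross_entropy(item_disc(GradReverse.apply(ei)), sensitive)

params = list(user_emb.parameters()) + list(item_emb.parameters()) \
       + list(user_disc.parameters()) + list(item_disc.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
opt.zero_grad(); (rec_loss + adv_loss).backward(); opt.step()
```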
"Dual-side Adversarial Learning based Fair Recommendation for Sensitive Attribute Filtering." ACM Transactions on Knowledge Discovery from Data, 2024-02-19. https://doi.org/10.1145/3648683
Knowledge Graph (KG) reasoning has been an active topic in recent decades. Most current research focuses on predicting missing facts in incomplete KGs. Temporal KG (TKG) reasoning, which aims to forecast future facts, remains challenging due to the complex interactions between entities over time. This paper proposes a novel intricate Spatiotemporal Dependency learning Network (STDN) based on Graph Convolutional Networks (GCNs) to capture the underlying correlations of an entity at different timestamps. Specifically, we first learn an adaptive adjacency matrix to depict the direct dependencies from the temporally adjacent facts of an entity, obtaining its previous context embedding. Then, a Spatiotemporal feature Encoding GCN (STE-GCN) is proposed to capture the latent spatiotemporal dependencies of the entity, yielding its spatiotemporal embedding. Finally, a time gate unit integrates the previous context embedding and the spatiotemporal embedding at the current timestamp to update the entity's evolutional embedding for predicting future facts. STDN generates more expressive embeddings that capture the intricate spatiotemporal dependencies in TKGs. Extensive experiments on the WIKI, ICEWS14 and ICEWS18 datasets show that STDN outperforms state-of-the-art baselines on the temporal reasoning task.
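The time gate described in the last step can be pictured as a GRU-style update gate that mixes the previous context embedding with the spatiotemporal embedding; the sketch below is a guess at its general form under that assumption, not STDN's exact formulation.

```python
# A gate z in (0, 1) decides, per dimension, how much of the new spatiotemporal
# embedding to keep versus the previous context embedding.
import torch
import torch.nn as nn

class TimeGate(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h_prev, h_st):
        z = torch.sigmoid(self.gate(torch.cat([h_prev, h_st], dim=-1)))
        return z * h_st + (1.0 - z) * h_prev

n_entities, dim = 1000, 64
h_prev = torch.randn(n_entities, dim)   # previous context embedding (adaptive adjacency branch)
h_st = torch.randn(n_entities, dim)     # spatiotemporal embedding from the GCN branch
h_next = TimeGate(dim)(h_prev, h_st)    # evolutional embedding used to score future facts
print(h_next.shape)                     # torch.Size([1000, 64])
```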
Xuefei Li, Huiwei Zhou, Weihong Yao, Wenchu Li, Baojie Liu, Yingyu Lin. "Intricate Spatiotemporal Dependency Learning for Temporal Knowledge Graph Reasoning." ACM Transactions on Knowledge Discovery from Data, 2024-02-16. https://doi.org/10.1145/3648366
George Paterakis, Stefanos Fafalios, Paulos Charonyktakis, Vassilis Christophides, Ioannis Tsamardinos
Numerous real-world datasets contain missing values, while in contrast most Machine Learning (ML) algorithms assume complete data. For this reason, several imputation algorithms have been proposed to predict and fill in the missing values. Given the advances in predictive modeling algorithms tuned in an AutoML setting, a question that naturally arises is to what extent sophisticated imputation algorithms (e.g., Neural Network based) are really needed, or whether we can obtain decent performance using simple methods like Mean/Mode (MM) imputation. In this paper, we experimentally compare 6 state-of-the-art representatives of different imputation algorithmic families from an AutoML predictive modeling perspective, including a feature selection step and combined algorithm and hyper-parameter selection. We used a commercial AutoML tool for our experiments, into which we integrated the selected imputation methods. Experiments ran on 25 real-world binary classification datasets with missing values and 10 complete binary classification datasets into which synthetic missing values were introduced according to different missingness mechanisms, at varying missingness frequencies. The main conclusion drawn from our experiments is that the best method on average is the Denoise AutoEncoder (DAE) on real-world datasets and MissForest (MF) on simulated datasets, followed closely by MM. In addition, binary indicator (BI) variables encoding missingness patterns actually improve predictive performance, on average. Last but not least, although there are cases where Neural-Network-based imputation significantly improves predictive performance, this comes at a great computational cost and requires measuring all feature values in order to impute new samples.
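For reference, the simple MM baseline combined with binary missingness indicators (BI) can be reproduced in a few lines with scikit-learn; the column names and toy data below are invented for illustration.

```python
# Mean/Mode imputation plus binary missingness indicators: mean for numeric
# columns, mode for the categorical one; add_indicator=True appends a 0/1 column
# per feature that had missing values.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

X = pd.DataFrame({
    "age":    [25.0, np.nan, 40.0, 31.0],
    "income": [50.0, 60.0, np.nan, np.nan],
    "city":   ["a", np.nan, "b", "a"],
})

mm_bi = ColumnTransformer([
    ("num", SimpleImputer(strategy="mean", add_indicator=True), ["age", "income"]),
    ("cat", SimpleImputer(strategy="most_frequent", add_indicator=True), ["city"]),
])

print(mm_bi.fit_transform(X))
```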
"Do we really need imputation in AutoML predictive modeling?" ACM Transactions on Knowledge Discovery from Data, 2024-02-16. https://doi.org/10.1145/3643643
We introduce the general problem of identifying a smallest edge subset of a given graph whose deletion makes the graph community-free. We consider this problem under two community notions which have attracted significant attention: k-truss and k-core. We also introduce a problem variant where the identified subset contains edges incident to a given set of nodes and ensures that these nodes are not contained in any community; k-truss or k-core, in our case. These problems are directly applicable in social networks, where the identified edges can be hidden by users or sanitized from the output graph, and in communication networks, where the identified edges correspond to vital network connections. We present a series of theoretical and practical results. On the theoretical side, we show through non-trivial reductions that the problems we introduce are NP-hard and, in fact, hard to approximate. For the k-truss based problems, we also give exact exponential-time algorithms, as well as a non-trivial lower bound on the size of an optimal solution. On the practical side, we develop a series of heuristics which are sped up by efficient data structures that we propose for updating the truss or core decomposition under edge deletions. In addition, we develop an algorithm to compute the lower bound. Extensive experiments on 11 real-world and synthetic graphs show that our heuristics are effective, outperforming natural baselines, and also efficient (up to two orders of magnitude faster than a natural baseline) thanks to our data structures. Furthermore, we present a case study on a co-authorship network and experiments showing that the removal of edges identified by our heuristics does not substantially affect the clustering structure of the input graph.
This work extends a KDD 2021 paper, providing new theoretical results as well as introducing core-based problems and algorithms.
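To make the core-based problem concrete, the sketch below shows a naive greedy heuristic (not one of the paper's algorithms): repeatedly delete an edge inside the current k-core, recomputing the core each time, until no k-core remains. The paper's heuristics instead maintain the decomposition incrementally under edge deletions rather than recomputing it.

```python
# Naive heuristic for emptying the k-core of a graph by edge deletions.
import networkx as nx

def break_k_core(G, k):
    """Return a set of edges whose removal leaves G with an empty k-core."""
    G = G.copy()
    removed = set()
    core = nx.k_core(G, k)
    while core.number_of_edges() > 0:
        # pick an edge inside the k-core whose endpoint degrees (within the core) are
        # smallest, hoping its removal pushes nodes below the degree-k threshold quickly
        u, v = min(core.edges(), key=lambda e: core.degree(e[0]) + core.degree(e[1]))
        G.remove_edge(u, v)
        removed.add((u, v))
        core = nx.k_core(G, k)              # recompute; real implementations update incrementally
    return removed

G = nx.karate_club_graph()
edges = break_k_core(G, k=4)
H = G.copy()
H.remove_edges_from(edges)
print(len(edges), "edges removed; remaining 4-core size:", nx.k_core(H, 4).number_of_nodes())
```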
Huiping Chen, Alessio Conte, Roberto Grossi, Grigorios Loukides, Solon P. Pissis, Michelle Sweering. "On Breaking Truss-Based and Core-Based Communities." ACM Transactions on Knowledge Discovery from Data, 2024-02-14. https://doi.org/10.1145/3644077
Graphs can facilitate modeling various complex systems such as gene networks and power grids, as well as analyzing the underlying relations within them. Learning over graphs has recently attracted increasing attention, particularly graph neural network (GNN)-based solutions, among which graph attention networks (GATs) have become one of the most widely utilized neural network structures for graph-based tasks. Although it has been shown that the use of graph structures in learning amplifies algorithmic bias, the influence of the attention design in GATs on algorithmic bias has not been investigated. Motivated by this, the present study first carries out a theoretical analysis to demonstrate the sources of algorithmic bias in GAT-based learning for node classification. Then, a novel algorithm, FairGAT, that leverages a fairness-aware attention design is developed based on the theoretical findings. Experimental results on real-world networks demonstrate that FairGAT improves group fairness measures while also providing utility comparable to fairness-aware baselines for node classification and link prediction.
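For readers unfamiliar with GATs, the sketch below implements a plain single-head graph attention layer to make the attention design concrete; the fairness-aware correction of the attention scores that FairGAT derives from its theoretical analysis is not reproduced here.

```python
# A minimal single-head graph attention layer in the spirit of GAT.
import torch
import torch.nn as nn

class GATLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):                       # x: [N, in_dim], adj: [N, N] with 1 for edges
        h = self.W(x)                                # [N, out_dim]
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = torch.nn.functional.leaky_relu(self.a(pairs).squeeze(-1), 0.2)   # raw scores [N, N]
        e = e.masked_fill(adj == 0, float("-inf"))   # attend only over neighbours
        alpha = torch.softmax(e, dim=-1)             # attention coefficients
        return alpha @ h                             # aggregated node representations

x = torch.randn(5, 8)
adj = (torch.eye(5) + torch.bernoulli(torch.full((5, 5), 0.4)) > 0).float()  # self-loops included
print(GATLayer(8, 16)(x, adj).shape)                 # torch.Size([5, 16])
```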
O. Deniz Kose, Yanning Shen. "FairGAT: Fairness-aware Graph Attention Networks." ACM Transactions on Knowledge Discovery from Data, 2024-02-12. https://doi.org/10.1145/3645096
Yang Yang, Feifei Wang, Enqiang Zhu, Fei Jiang, Wen Yao
There is an emerging trend in the Chinese automobile industry: automakers are introducing exclusive enterprise social networks (EESNs) to expand sales and provide after-sale services. Traditional online social networks (OSNs) and enterprise social networks (ESNs), such as Twitter and Yammer, are ingeniously designed to facilitate unregulated communication among equal individuals. However, users in EESNs are naturally socially stratified, consisting of both enterprise staff and customers. In addition, the motivation for operating EESNs can be quite complicated, including providing customer services and facilitating communication among enterprise staff. As a result, social behaviors in EESNs can differ considerably from those in OSNs and ESNs. In this work, we aim to analyze the social behaviors in EESNs. We consider the Chinese car manufacturer NIO as a typical example of an EESN and provide the following contributions. First, we formulate social behavior analysis in EESNs as a link prediction problem in heterogeneous social networks. Second, to analyze this link prediction problem, we derive plentiful user features and build multiple meta-path graphs for EESNs. Third, we develop a novel Fast (H)eterogeneous graph (A)ttention (N)etwork algorithm for (D)irected graphs (FastHAND) to predict directed social links among users in EESNs. The algorithm introduces feature group attention at the node level and uses an edge sampling algorithm over directed meta-path graphs to reduce the computation cost. Experiments on the NIO community data demonstrate the predictive power of the proposed FastHAND method. The experimental results also verify our intuitions about social affinity propagation in EESNs.
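Two of the ingredients mentioned above, composing a directed meta-path graph from heterogeneous relations and sampling a bounded number of in-edges per node before attention, can be sketched as follows; the relation names and sizes are hypothetical and this is not the FastHAND implementation.

```python
# Build a directed user -> post -> user meta-path adjacency by sparse matrix
# multiplication, then keep at most a fixed number of incoming edges per node.
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n_users, n_posts = 50, 80

user_replies_post = sp.random(n_users, n_posts, density=0.05, random_state=0, format="csr")
post_by_user = sp.random(n_posts, n_users, density=0.05, random_state=1, format="csr")

# directed meta-path graph U -> P -> U: user i reaches user j through some post
metapath_upu = (user_replies_post @ post_by_user).tocsr()

def sample_in_edges(adj, max_in=5, rng=rng):
    """Keep at most max_in incoming edges per node (column) of a directed adjacency."""
    adj = adj.tocsc()
    rows, cols = [], []
    for j in range(adj.shape[1]):
        srcs = adj.indices[adj.indptr[j]:adj.indptr[j + 1]]
        if len(srcs) > max_in:
            srcs = rng.choice(srcs, size=max_in, replace=False)
        rows.extend(srcs); cols.extend([j] * len(srcs))
    data = np.ones(len(rows))
    return sp.csr_matrix((data, (rows, cols)), shape=adj.shape)

sampled = sample_in_edges(metapath_upu, max_in=5)
print(metapath_upu.nnz, "->", sampled.nnz, "edges after in-edge sampling")
```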
"Social Behavior Analysis in Exclusive Enterprise Social Networks by FastHAND." ACM Transactions on Knowledge Discovery from Data, 2024-02-12. https://doi.org/10.1145/3646552
Dimensionality reduction techniques map values from a high dimensional space to one with a lower dimension. The result is a space which requires less physical memory and has a faster distance calculation. These techniques are widely used in settings where the reduced-dimension space retains acceptable accuracy with respect to the original space.
Many such transforms have been described. They have been classified into two main groups: linear and topological. Linear methods such as Principal Component Analysis (PCA) and Random Projection (RP) define matrix-based transforms into a lower dimension of Euclidean space. Topological methods such as Multidimensional Scaling (MDS) attempt to preserve higher-level aspects such as the nearest-neighbour relation, and some may be applied to non-Euclidean spaces.
Here, we introduce nSimplex Zen, a novel topological method of reducing dimensionality. Like MDS, it relies only upon pairwise distances measured in the original space. The use of distances, rather than coordinates, allows the technique to be applied to both Euclidean and other Hilbert spaces, including those governed by Cosine, Jensen-Shannon and Quadratic Form distances.
We show that in almost all cases, due to geometric properties of high-dimensional spaces, our new technique preserves these properties better than the alternatives, especially when reducing to very low dimensions.
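As a point of reference for the two families above, the sketch below compares how a linear method (Gaussian random projection) and a distance-only method (metric MDS fed a precomputed distance matrix) preserve pairwise Euclidean distances when reducing from 100 to 10 dimensions; nSimplex Zen itself is not in scikit-learn and is not reproduced here.

```python
# Compare distance preservation of a linear transform versus a method that, like
# nSimplex Zen, uses only pairwise distances from the original space.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.manifold import MDS
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))
d_orig = pdist(X)                                # original pairwise Euclidean distances

X_rp = GaussianRandomProjection(n_components=10, random_state=0).fit_transform(X)
mds = MDS(n_components=10, dissimilarity="precomputed", random_state=0)
X_mds = mds.fit_transform(squareform(d_orig))    # consumes only the distance matrix

for name, Y in [("random projection", X_rp), ("MDS", X_mds)]:
    corr = np.corrcoef(d_orig, pdist(Y))[0, 1]
    print(f"{name}: correlation with original distances = {corr:.3f}")
```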
Richard Connor, Lucia Vadicamo. "nSimplex Zen: A Novel Dimensionality Reduction for Euclidean and Hilbert Spaces." ACM Transactions on Knowledge Discovery from Data, 2024-02-10. https://doi.org/10.1145/3647642
Individuals are often involved in multiple online social networks. Because the owners of these networks are unwilling to share their networks, some global algorithms combine information from multiple networks to detect all communities across them without sharing their edges. When data owners are only interested in the community containing a given node, however, it is unnecessary and computationally expensive for multiple networks to interact with each other to mine all communities. Moreover, data owners who are specifically looking for one community typically prefer to provide less data than the global algorithms require. Therefore, we propose the Local Collaborative Community Detection problem (LCCD). It exploits information from multiple networks to jointly detect the local community containing a given node, without directly sharing edges between networks. To address the LCCD problem, we present a method developed from the M method, called colM, to detect the local community in multiple networks. The method adopts secure multiparty computation protocols to protect each network's private information. Our experiments were conducted on real-world and synthetic datasets. The results show that colM effectively identifies community structures and outperforms comparison algorithms.
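The sketch below shows a single-network greedy local expansion around a seed node in the spirit of modularity-ratio methods such as M: repeatedly add the neighbouring node that most improves the ratio of internal to external edges. The cross-network collaboration and the secure multiparty computation protocols that colM adds are not shown.

```python
# Greedy local community expansion around a seed node on one network.
import networkx as nx

def local_community(G, seed):
    community = {seed}

    def score(nodes):
        internal = G.subgraph(nodes).number_of_edges()
        external = sum(1 for u in nodes for v in G.neighbors(u) if v not in nodes)
        return internal / external if external else float("inf")

    improved = True
    while improved:
        improved = False
        frontier = {v for u in community for v in G.neighbors(u)} - community
        best, best_score = None, score(community)
        for v in frontier:
            s = score(community | {v})
            if s > best_score:
                best, best_score = v, s
        if best is not None:
            community.add(best)
            improved = True
    return community

G = nx.karate_club_graph()
print(sorted(local_community(G, seed=0)))
```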
Li Ni, Rui Ye, Wenjian Luo, Yiwen Zhang. "Local community detection in multiple private networks." ACM Transactions on Knowledge Discovery from Data, 2024-02-10. https://doi.org/10.1145/3644078
Graph contrastive learning has made remarkable achievements in self-supervised representation learning on graph-structured data. By employing perturbation functions (i.e., perturbations of the nodes or edges of a graph), most graph contrastive learning methods construct contrastive samples on the original graph. However, perturbation-based data augmentation randomly changes the inherent information (e.g., attributes or structures) of the graph. Therefore, after embedding the nodes of the perturbed graph, we can guarantee neither the validity of the contrastive samples nor the resulting performance of graph contrastive learning. To this end, in this paper, we propose a novel generation-based multi-view contrastive learning framework (GMVC) for self-supervised graph representation learning, which generates the contrastive samples with a generator rather than a perturbation function. Specifically, after embedding the nodes of the original graph, we first employ random walks in the neighborhood to develop multiple relevant node sequences for each anchor node. We then utilize a transformer to generate the representations of relevant contrastive samples of the anchor node based on the features and structures of the sampled node sequences. Finally, by maximizing the consistency between the anchor view and the generated views, we force the model to effectively encode graph information into node embeddings. We perform extensive experiments on node classification and link prediction tasks on eight benchmark datasets, which verify the effectiveness of our generation-based multi-view graph contrastive learning method.
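The final consistency-maximization step can be written as an InfoNCE-style contrastive loss between each anchor embedding and its generated view, as in the sketch below; random tensors stand in for the outputs of GMVC's random-walk sampling and transformer generator.

```python
# InfoNCE-style consistency loss: anchor i should match its generated view i
# against all other nodes in the batch.
import torch
import torch.nn.functional as F

def info_nce(anchor, generated, temperature=0.5):
    a = F.normalize(anchor, dim=-1)
    g = F.normalize(generated, dim=-1)
    logits = a @ g.t() / temperature              # [N, N] cosine similarities
    targets = torch.arange(a.size(0))             # positive pair sits on the diagonal
    return F.cross_entropy(logits, targets)

anchor = torch.randn(256, 64, requires_grad=True)     # embeddings of anchor nodes
generated = torch.randn(256, 64, requires_grad=True)  # embeddings of their generated views
loss = info_nce(anchor, generated)
loss.backward()
print(float(loss))
```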
Yuehui Han. "Generation based Multi-view Contrast for Self-Supervised Graph Representation Learning." ACM Transactions on Knowledge Discovery from Data, 2024-02-09. https://doi.org/10.1145/3645095