Chen Zhao, Feng Mi, Xintao Wu, Kai Jiang, Latifur Khan, Feng Chen
The fairness-aware online learning framework has emerged as a potent tool within the context of continuous lifelong learning. In this scenario, the learner’s objective is to progressively acquire new tasks as they arrive over time, while also guaranteeing statistical parity among various protected sub-populations, such as race and gender, when it comes to the newly introduced tasks. A significant limitation of current approaches lies in their heavy reliance on the i.i.d (independent and identically distributed) assumption concerning data, leading to a static regret analysis of the framework. Nevertheless, it’s crucial to note that achieving low static regret does not necessarily translate to strong performance in dynamic environments characterized by tasks sampled from diverse distributions. In this paper, to tackle the fairness-aware online learning challenge in evolving settings, we introduce a unique regret measure, FairSAR, by incorporating long-term fairness constraints into a strongly adapted loss regret framework. Moreover, to determine an optimal model parameter at each time step, we introduce an innovative adaptive fairness-aware online meta-learning algorithm, referred to as FairSAOML. This algorithm possesses the ability to adjust to dynamic environments by effectively managing bias control and model accuracy. The problem is framed as a bi-level convex-concave optimization, considering both the model’s primal and dual parameters, which pertain to its accuracy and fairness attributes, respectively. Theoretical analysis yields sub-linear upper bounds for both loss regret and the cumulative violation of fairness constraints. Our experimental evaluation on various real-world datasets in dynamic environments demonstrates that our proposed FairSAOML algorithm consistently outperforms alternative approaches rooted in the most advanced prior online learning methods.
{"title":"Dynamic Environment Responsive Online Meta-Learning with Fairness Awareness","authors":"Chen Zhao, Feng Mi, Xintao Wu, Kai Jiang, Latifur Khan, Feng Chen","doi":"10.1145/3648684","DOIUrl":"https://doi.org/10.1145/3648684","url":null,"abstract":"<p>The fairness-aware online learning framework has emerged as a potent tool within the context of continuous lifelong learning. In this scenario, the learner’s objective is to progressively acquire new tasks as they arrive over time, while also guaranteeing statistical parity among various protected sub-populations, such as race and gender, when it comes to the newly introduced tasks. A significant limitation of current approaches lies in their heavy reliance on the <i>i.i.d</i> (independent and identically distributed) assumption concerning data, leading to a static regret analysis of the framework. Nevertheless, it’s crucial to note that achieving low static regret does not necessarily translate to strong performance in dynamic environments characterized by tasks sampled from diverse distributions. In this paper, to tackle the fairness-aware online learning challenge in evolving settings, we introduce a unique regret measure, FairSAR, by incorporating long-term fairness constraints into a strongly adapted loss regret framework. Moreover, to determine an optimal model parameter at each time step, we introduce an innovative adaptive fairness-aware online meta-learning algorithm, referred to as FairSAOML. This algorithm possesses the ability to adjust to dynamic environments by effectively managing bias control and model accuracy. The problem is framed as a bi-level convex-concave optimization, considering both the model’s primal and dual parameters, which pertain to its accuracy and fairness attributes, respectively. Theoretical analysis yields sub-linear upper bounds for both loss regret and the cumulative violation of fairness constraints. Our experimental evaluation on various real-world datasets in dynamic environments demonstrates that our proposed FairSAOML algorithm consistently outperforms alternative approaches rooted in the most advanced prior online learning methods.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"11 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139922326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shenghao Liu, Yu Zhang, Lingzhi Yi, Xianjun Deng, Laurence T. Yang, Bang Wang
With the development of recommendation algorithms, researchers are paying increasing attention to fairness issues such as user discrimination in recommendations. To address these issues, existing works often filter users’ sensitive information that may cause discrimination during the process of learning user representations. However, these approaches overlook the latent relationship between items’ content attributes and users’ sensitive information. In this paper, we propose a fairness-aware recommendation algorithm (DALFRec) based on user-side and item-side adversarial learning to mitigate the effects of sensitive information in both sides of the recommendation process. Firstly, we conduct a statistical analysis to demonstrate the latent relationship between items’ information and users’ sensitive attributes. Then, we design a dual-side adversarial learning network that simultaneously filters out users’ sensitive information on the user and item side. Additionally, we propose a new evaluation strategy that leverages the latent relationship between items’ content attributes and users’ sensitive attributes to better assess the algorithm’s ability to reduce discrimination. Our experiments on three real datasets demonstrate the superiority of our proposed algorithm over state-of-the-art methods.
{"title":"Dual-side Adversarial Learning based Fair Recommendation for Sensitive Attribute Filtering","authors":"Shenghao Liu, Yu Zhang, Lingzhi Yi, Xianjun Deng, Laurence T. Yang, Bang Wang","doi":"10.1145/3648683","DOIUrl":"https://doi.org/10.1145/3648683","url":null,"abstract":"<p>With the development of recommendation algorithms, researchers are paying increasing attention to fairness issues such as user discrimination in recommendations. To address these issues, existing works often filter users’ sensitive information that may cause discrimination during the process of learning user representations. However, these approaches overlook the latent relationship between items’ content attributes and users’ sensitive information. In this paper, we propose a fairness-aware recommendation algorithm (DALFRec) based on user-side and item-side adversarial learning to mitigate the effects of sensitive information in both sides of the recommendation process. Firstly, we conduct a statistical analysis to demonstrate the latent relationship between items’ information and users’ sensitive attributes. Then, we design a dual-side adversarial learning network that simultaneously filters out users’ sensitive information on the user and item side. Additionally, we propose a new evaluation strategy that leverages the latent relationship between items’ content attributes and users’ sensitive attributes to better assess the algorithm’s ability to reduce discrimination. Our experiments on three real datasets demonstrate the superiority of our proposed algorithm over state-of-the-art methods.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"21 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139921954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Knowledge Graph (KG) reasoning has been an interesting topic in recent decades. Most current researches focus on predicting the missing facts for incomplete KG. Nevertheless, Temporal KG (TKG) reasoning, which is to forecast the future facts, still faces with the dilemma due to the complex interactions between entities over time. This paper proposes a novel intricate Spatiotemporal Dependency learning Network (STDN) based on Graph Convolutional Network (GCN) to capture the underlying correlations of an entity at different timestamps. Specifically, we first learn an adaptive adjacency matrix to depict the direct dependencies from the temporally adjacent facts of an entity, obtaining its previous context embedding. Then, a Spatiotemporal feature Encoding GCN (STE-GCN) is proposed to capture the latent spatiotemporal dependencies of the entity, getting the spatiotemporal embedding. Finally, a time gate unit is used to integrate the previous context embedding and the spatiotemporal embedding at the current timestamp to update the entity evolutional embedding for predicting future facts. STDN could generate the more expressive embeddings for capturing the intricate spatiotemporal dependencies in TKG. Extensive experiments on WIKI, ICEWS14 and ICEWS18 datasets prove our STDN has the advantage over state-of-the-art baselines for the temporal reasoning task.
{"title":"Intricate Spatiotemporal Dependency Learning for Temporal Knowledge Graph Reasoning","authors":"Xuefei Li, Huiwei Zhou, Weihong Yao, Wenchu Li, Baojie Liu, Yingyu Lin","doi":"10.1145/3648366","DOIUrl":"https://doi.org/10.1145/3648366","url":null,"abstract":"<p>Knowledge Graph (KG) reasoning has been an interesting topic in recent decades. Most current researches focus on predicting the missing facts for incomplete KG. Nevertheless, Temporal KG (TKG) reasoning, which is to forecast the future facts, still faces with the dilemma due to the complex interactions between entities over time. This paper proposes a novel intricate Spatiotemporal Dependency learning Network (STDN) based on Graph Convolutional Network (GCN) to capture the underlying correlations of an entity at different timestamps. Specifically, we first learn an adaptive adjacency matrix to depict the direct dependencies from the temporally adjacent facts of an entity, obtaining its previous context embedding. Then, a Spatiotemporal feature Encoding GCN (STE-GCN) is proposed to capture the latent spatiotemporal dependencies of the entity, getting the spatiotemporal embedding. Finally, a time gate unit is used to integrate the previous context embedding and the spatiotemporal embedding at the current timestamp to update the entity evolutional embedding for predicting future facts. STDN could generate the more expressive embeddings for capturing the intricate spatiotemporal dependencies in TKG. Extensive experiments on WIKI, ICEWS14 and ICEWS18 datasets prove our STDN has the advantage over state-of-the-art baselines for the temporal reasoning task.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"1 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139762903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
George Paterakis, Stefanos Fafalios, Paulos Charonyktakis, Vassilis Christophides, Ioannis Tsamardinos
Numerous real-world data contain missing values, while in contrast, most Machine Learning (ML) algorithms assume complete datasets. For this reason, several imputation algorithms have been proposed to predict and fill in the missing values. Given the advances in predictive modeling algorithms tuned in an AutoML setting, a question that naturally arises is to what extent sophisticated imputation algorithms (e.g., Neural Network based) are really needed, or we can obtain a descent performance using simple methods like Mean/Mode (MM). In this paper, we experimentally compare 6 state-of-the-art representatives of different imputation algorithmic families from an AutoML predictive modeling perspective, including a feature selection step and combined algorithm and hyper-parameter selection. We used a commercial AutoML tool for our experiments, in which we included the selected imputation methods. Experiments ran on 25 binary classification real-world incomplete datasets with missing values and 10 binary classification complete datasets in which synthetic missing values are introduced according to different missingness mechanisms, at varying missing frequencies. The main conclusion drawn from our experiments is that the best method on average is the Denoise AutoEncoder (DAE) on real-world datasets and the MissForest (MF) in simulated datasets, followed closely by MM. In addition, binary indicator (BI) variables encoding missingness patterns actually improve predictive performance, on average. Last but not least, although there are cases where Neural-Network-based imputation significantly improves predictive performance, this comes at a great computational cost and requires measuring all feature values to impute new samples.
{"title":"Do we really need imputation in AutoML predictive modeling?","authors":"George Paterakis, Stefanos Fafalios, Paulos Charonyktakis, Vassilis Christophides, Ioannis Tsamardinos","doi":"10.1145/3643643","DOIUrl":"https://doi.org/10.1145/3643643","url":null,"abstract":"<p>Numerous real-world data contain missing values, while in contrast, most Machine Learning (ML) algorithms assume complete datasets. For this reason, several imputation algorithms have been proposed to predict and fill in the missing values. Given the advances in predictive modeling algorithms tuned in an AutoML setting, a question that naturally arises is to what extent sophisticated imputation algorithms (e.g., Neural Network based) are really needed, or we can obtain a descent performance using simple methods like Mean/Mode (MM). In this paper, we experimentally compare 6 state-of-the-art representatives of different imputation algorithmic families from an AutoML predictive modeling perspective, including a feature selection step and combined algorithm and hyper-parameter selection. We used a commercial AutoML tool for our experiments, in which we included the selected imputation methods. Experiments ran on 25 binary classification real-world incomplete datasets with missing values and 10 binary classification complete datasets in which synthetic missing values are introduced according to different missingness mechanisms, at varying missing frequencies. The main conclusion drawn from our experiments is that the best method on average is the Denoise AutoEncoder (DAE) on real-world datasets and the MissForest (MF) in simulated datasets, followed closely by MM. In addition, binary indicator (BI) variables encoding missingness patterns actually improve predictive performance, on average. Last but not least, although there are cases where Neural-Network-based imputation significantly improves predictive performance, this comes at a great computational cost and requires measuring all feature values to impute new samples.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"35 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139762991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce the general problem of identifying a smallest edge subset of a given graph whose deletion makes the graph community-free. We consider this problem under two community notions which have attracted significant attention: k-truss and k-core. We also introduce a problem variant where the identified subset contains edges incident to a given set of nodes and ensures that these nodes are not contained in any community; k-truss or k-core, in our case. These problems are directly applicable in social networks: the identified edges can be hidden by users or sanitized from the output graph; or in communication networks: the identified edges correspond to vital network connections. We present a series of theoretical and practical results. On the theoretical side, we show through non-trivial reductions that the problems we introduce are NP-hard and, in fact, hard to approximate. For the k-truss based problems, we also show exact exponential-time algorithms, as well as a non-trivial lower bound on the size of an optimal solution. On the practical side, we develop a series of heuristics which are sped up by efficient data structures that we propose for updating the truss or core decomposition under edge deletions. In addition, we develop an algorithm to compute the lower bound. Extensive experiments on 11 real-world and synthetic graphs show that our heuristics are effective, outperforming natural baselines, and also efficient (up to two orders of magnitude faster than a natural baseline) thanks to our data structures. Furthermore, we present a case study on a co-authorship network and experiments showing that the removal of edges identified by our heuristics does not substantially affect the clustering structure of the input graph.
This work extends a KDD 2021 paper, providing new theoretical results as well as introducing core-based problems and algorithms.
{"title":"On Breaking Truss-Based and Core-Based Communities","authors":"Huiping Chen, Alessio Conte, Roberto Grossi, Grigorios Loukides, Solon P. Pissis, Michelle Sweering","doi":"10.1145/3644077","DOIUrl":"https://doi.org/10.1145/3644077","url":null,"abstract":"<p>We introduce the general problem of identifying a smallest <i>edge</i> subset of a given graph whose deletion makes the graph community-free. We consider this problem under two community notions which have attracted significant attention: <i>k</i>-truss and <i>k</i>-core. We also introduce a problem variant where the identified subset contains edges incident to a given set of nodes and ensures that these nodes are not contained in any community; <i>k</i>-truss or <i>k</i>-core, in our case. These problems are directly applicable in social networks: the identified edges can be <i>hidden</i> by users or <i>sanitized</i> from the output graph; or in communication networks: the identified edges correspond to <i>vital</i> network connections. We present a series of theoretical and practical results. On the theoretical side, we show through non-trivial reductions that the problems we introduce are NP-hard and, in fact, hard to approximate. For the <i>k</i>-truss based problems, we also show exact exponential-time algorithms, as well as a non-trivial lower bound on the size of an optimal solution. On the practical side, we develop a series of heuristics which are sped up by efficient data structures that we propose for updating the truss or core decomposition under edge deletions. In addition, we develop an algorithm to compute the lower bound. Extensive experiments on 11 real-world and synthetic graphs show that our heuristics are effective, outperforming natural baselines, and also efficient (up to two orders of magnitude faster than a natural baseline) thanks to our data structures. Furthermore, we present a case study on a co-authorship network and experiments showing that the removal of edges identified by our heuristics does not substantially affect the clustering structure of the input graph. </p><p>This work extends a KDD 2021 paper, providing new theoretical results as well as introducing core-based problems and algorithms.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"9 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139762906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graphs can facilitate modeling various complex systems such as gene networks and power grids, as well as analyzing the underlying relations within them. Learning over graphs has recently attracted increasing attention, particularly graph neural network-based (GNN) solutions, among which graph attention networks (GATs) have become one of the most widely utilized neural network structures for graph-based tasks. Although it is shown that the use of graph structures in learning results in the amplification of algorithmic bias, the influence of the attention design in GATs on algorithmic bias has not been investigated. Motivated by this, the present study first carries out a theoretical analysis in order to demonstrate the sources of algorithmic bias in GAT-based learning for node classification. Then, a novel algorithm, FairGAT, that leverages a fairness-aware attention design is developed based on the theoretical findings. Experimental results on real-world networks demonstrate that FairGAT improves group fairness measures while also providing comparable utility to the fairness-aware baselines for node classification and link prediction.
图可以帮助对基因网络和电网等各种复杂系统进行建模,并分析其中的内在关系。最近,对图的学习引起了越来越多的关注,特别是基于图神经网络(GNN)的解决方案,其中图注意力网络(GAT)已成为基于图的任务中使用最广泛的神经网络结构之一。虽然有研究表明,在学习中使用图结构会导致算法偏差的放大,但 GATs 中的注意力设计对算法偏差的影响尚未得到研究。受此启发,本研究首先进行了理论分析,以证明基于 GAT 的节点分类学习中算法偏差的来源。然后,在理论分析的基础上,开发了一种利用公平感知注意力设计的新型算法 FairGAT。在真实世界网络上的实验结果表明,FairGAT 提高了群体公平性度量,同时在节点分类和链接预测方面提供了与公平感知基线相当的效用。
{"title":"FairGAT: Fairness-aware Graph Attention Networks","authors":"O. Deniz Kose, Yanning Shen","doi":"10.1145/3645096","DOIUrl":"https://doi.org/10.1145/3645096","url":null,"abstract":"<p>Graphs can facilitate modeling various complex systems such as gene networks and power grids, as well as analyzing the underlying relations within them. Learning over graphs has recently attracted increasing attention, particularly graph neural network-based (GNN) solutions, among which graph attention networks (GATs) have become one of the most widely utilized neural network structures for graph-based tasks. Although it is shown that the use of graph structures in learning results in the amplification of algorithmic bias, the influence of the attention design in GATs on algorithmic bias has not been investigated. Motivated by this, the present study first carries out a theoretical analysis in order to demonstrate the sources of algorithmic bias in GAT-based learning for node classification. Then, a novel algorithm, FairGAT, that leverages a fairness-aware attention design is developed based on the theoretical findings. Experimental results on real-world networks demonstrate that FairGAT improves group fairness measures while also providing comparable utility to the fairness-aware baselines for node classification and link prediction.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"15 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139763086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yang Yang, Feifei Wang, Enqiang Zhu, Fei Jiang, Wen Yao
There is an emerging trend in the Chinese automobile industries that automakers are introducing exclusive enterprise social networks (EESNs) to expand sales and provide after-sale services. The traditional online social networks (OSNs) and enterprise social networks (ESNs), such as Twitter and Yammer, are ingeniously designed to facilitate unregulated communications among equal individuals. However, users in EESNs are naturally social stratified, consisting of both enterprise staffs and customers. In addition, the motivation to operate EESNs can be quite complicated, including providing customer services and facilitating communication among enterprise staffs. As a result, the social behaviors in EESNs can be quite different from those in OSNs and ESNs. In this work, we aim to analyze the social behaviors in EESNs. We consider the Chinese car manufacturer NIO as a typical example of EESNs and provide the following contributions. First, we formulate the social behavior analysis in EESNs as a link prediction problem in heterogeneous social networks. Second, to analyze this link prediction problem, we derive plentiful user features and build multiple meta-path graphs for EESNs. Third, we develop a novel Fast (H)eterogeneous graph (A)ttention (N)etwork algorithm for (D)irected graphs (FastHAND) to predict directed social links among users in EESNs. This algorithm introduces feature group attention at the node-level and uses an edge sampling algorithm over directed meta-path graphs to reduce the computation cost. By conducting various experiments on the NIO community data, we demonstrate the predictive power of our proposed FastHAND method. The experimental results also verify our intuitions about social affinity propagation in EESNs.
{"title":"Social Behavior Analysis in Exclusive Enterprise Social Networks by FastHAND","authors":"Yang Yang, Feifei Wang, Enqiang Zhu, Fei Jiang, Wen Yao","doi":"10.1145/3646552","DOIUrl":"https://doi.org/10.1145/3646552","url":null,"abstract":"<p>There is an emerging trend in the Chinese automobile industries that automakers are introducing exclusive enterprise social networks (EESNs) to expand sales and provide after-sale services. The traditional online social networks (OSNs) and enterprise social networks (ESNs), such as Twitter and Yammer, are ingeniously designed to facilitate unregulated communications among equal individuals. However, users in EESNs are naturally social stratified, consisting of both enterprise staffs and customers. In addition, the motivation to operate EESNs can be quite complicated, including providing customer services and facilitating communication among enterprise staffs. As a result, the social behaviors in EESNs can be quite different from those in OSNs and ESNs. In this work, we aim to analyze the social behaviors in EESNs. We consider the Chinese car manufacturer NIO as a typical example of EESNs and provide the following contributions. First, we formulate the social behavior analysis in EESNs as a link prediction problem in heterogeneous social networks. Second, to analyze this link prediction problem, we derive plentiful user features and build multiple meta-path graphs for EESNs. Third, we develop a novel Fast (H)eterogeneous graph (A)ttention (N)etwork algorithm for (D)irected graphs (FastHAND) to predict directed social links among users in EESNs. This algorithm introduces feature group attention at the node-level and uses an edge sampling algorithm over directed meta-path graphs to reduce the computation cost. By conducting various experiments on the NIO community data, we demonstrate the predictive power of our proposed FastHAND method. The experimental results also verify our intuitions about social affinity propagation in EESNs.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"11 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139763101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dimensionality reduction techniques map values from a high dimensional space to one with a lower dimension. The result is a space which requires less physical memory and has a faster distance calculation. These techniques are widely used where required properties of the reduced-dimension space give an acceptable accuracy with respect to the original space.
Many such transforms have been described. They have been classified in two main groups: linear and topological. Linear methods such as Principal Component Analysis (PCA) and Random Projection (RP) define matrix-based transforms into a lower dimension of Euclidean space. Topological methods such as Multidimensional Scaling (MDS) attempt to preserve higher-level aspects such as the nearest-neighbour relation, and some may be applied to non-Euclidean spaces.
Here, we introduce nSimplex Zen, a novel topological method of reducing dimensionality. Like MDS, it relies only upon pairwise distances measured in the original space. The use of distances, rather than coordinates, allows the technique to be applied to both Euclidean and other Hilbert spaces, including those governed by Cosine, Jensen-Shannon and Quadratic Form distances.
We show that in almost all cases, due to geometric properties of high-dimensional spaces, our new technique gives better properties than others, especially with reduction to very low dimensions.
{"title":"nSimplex Zen: A Novel Dimensionality Reduction for Euclidean and Hilbert Spaces","authors":"Richard Connor, Lucia Vadicamo","doi":"10.1145/3647642","DOIUrl":"https://doi.org/10.1145/3647642","url":null,"abstract":"<p>Dimensionality reduction techniques map values from a high dimensional space to one with a lower dimension. The result is a space which requires less physical memory and has a faster distance calculation. These techniques are widely used where required properties of the reduced-dimension space give an acceptable accuracy with respect to the original space. </p><p>Many such transforms have been described. They have been classified in two main groups: <i>linear</i> and <i>topological</i>. Linear methods such as Principal Component Analysis (PCA) and Random Projection (RP) define matrix-based transforms into a lower dimension of Euclidean space. Topological methods such as Multidimensional Scaling (MDS) attempt to preserve higher-level aspects such as the nearest-neighbour relation, and some may be applied to non-Euclidean spaces. </p><p>Here, we introduce <i>nSimplex Zen</i>, a novel topological method of reducing dimensionality. Like MDS, it relies only upon pairwise distances measured in the original space. The use of distances, rather than coordinates, allows the technique to be applied to both Euclidean and other Hilbert spaces, including those governed by Cosine, Jensen-Shannon and Quadratic Form distances. </p><p>We show that in almost all cases, due to geometric properties of high-dimensional spaces, our new technique gives better properties than others, especially with reduction to very low dimensions.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"24 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139763356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Individuals are often involved in multiple online social networks. Considering that owners of these networks are unwilling to share their networks, some global algorithms combine information from multiple networks to detect all communities in multiple networks without sharing their edges. When data owners are only interested in the community containing a given node, it is unnecessary and computationally expensive for multiple networks to interact with each other to mine all communities. Moreover, data owners who are specifically looking for a community typically prefer to provide less data than the global algorithms require. Therefore, we propose the Local Collaborative Community Detection problem (LCCD). It exploits information from multiple networks to jointly detect the local community containing a given node, without directly sharing edges between networks. To address the LCCD problem, we present a method developed from M method, called colM, to detect the local community in multiple networks. This method adopts secure multiparty computation protocols to protect each network’s private information. Our experiments were conducted on real-world and synthetic datasets. Experimental results show that colM method could effectively identify community structures and outperform comparison algorithms.
个人往往参与多个在线社交网络。考虑到这些网络的所有者不愿意分享他们的网络,一些全局算法结合了多个网络的信息,在不分享网络边的情况下检测多个网络中的所有社区。当数据所有者只对包含给定节点的社区感兴趣时,就没有必要让多个网络相互影响以挖掘所有社区,而且计算成本很高。此外,专门寻找社区的数据所有者通常更愿意提供比全局算法要求更少的数据。因此,我们提出了本地协作社区检测问题(LCCD)。它利用来自多个网络的信息来联合检测包含给定节点的本地社区,而无需直接共享网络之间的边。为了解决 LCCD 问题,我们提出了一种从 M 方法发展而来的方法,称为 colM,用于检测多个网络中的本地社区。该方法采用安全的多方计算协议来保护每个网络的私有信息。我们在真实世界和合成数据集上进行了实验。实验结果表明,colM 方法能有效识别社区结构,其性能优于比较算法。
{"title":"Local community detection in multiple private networks","authors":"Li Ni, Rui Ye, Wenjian Luo, Yiwen Zhang","doi":"10.1145/3644078","DOIUrl":"https://doi.org/10.1145/3644078","url":null,"abstract":"<p>Individuals are often involved in multiple online social networks. Considering that owners of these networks are unwilling to share their networks, some global algorithms combine information from multiple networks to detect all communities in multiple networks without sharing their edges. When data owners are only interested in the community containing a given node, it is unnecessary and computationally expensive for multiple networks to interact with each other to mine all communities. Moreover, data owners who are specifically looking for a community typically prefer to provide less data than the global algorithms require. Therefore, we propose the Local Collaborative Community Detection problem (LCCD). It exploits information from multiple networks to jointly detect the local community containing a given node, without directly sharing edges between networks. To address the LCCD problem, we present a method developed from M method, called colM, to detect the local community in multiple networks. This method adopts secure multiparty computation protocols to protect each network’s private information. Our experiments were conducted on real-world and synthetic datasets. Experimental results show that colM method could effectively identify community structures and outperform comparison algorithms.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"129 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139762910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph contrastive learning has made remarkable achievements in the self-supervised representation learning of graph-structured data. By employing perturbation function (i.e., perturbation on the nodes or edges of graph), most graph contrastive learning methods construct contrastive samples on the original graph. However, the perturbation based data augmentation methods randomly change the inherent information (e.g., attributes or structures) of the graph. Therefore, after nodes embedding on the perturbed graph, we cannot guarantee the validity of the contrastive samples as well as the learned performance of graph contrastive learning. To this end, in this paper, we propose a novel generation based multi-view contrastive learning framework (GMVC) for self-supervised graph representation learning, which generates the contrastive samples based on our generator rather than perturbation function. Specifically, after nodes embedding on original graph we first employ random walk in the neighborhood to develop multiple relevant node sequences for each anchor node. We then utilize the transformer to generate the representations of relevant contrastive samples of anchor node based on the features and structures of the sampled node sequences. Finally, by maximizing the consistency between the anchor view and the generated views, we force the model to effectively encode graph information into nodes embeddings. We perform extensive experiments of node classification and link prediction tasks on eight benchmark datasets, which verify the effectiveness of our generation based multi-view graph contrastive learning method.
{"title":"Generation based Multi-view Contrast for Self-Supervised Graph Representation Learning","authors":"Yuehui Han","doi":"10.1145/3645095","DOIUrl":"https://doi.org/10.1145/3645095","url":null,"abstract":"<p>Graph contrastive learning has made remarkable achievements in the self-supervised representation learning of graph-structured data. By employing perturbation function (i.e., perturbation on the nodes or edges of graph), most graph contrastive learning methods construct contrastive samples on the original graph. However, the perturbation based data augmentation methods randomly change the inherent information (e.g., attributes or structures) of the graph. Therefore, after nodes embedding on the perturbed graph, we cannot guarantee the validity of the contrastive samples as well as the learned performance of graph contrastive learning. To this end, in this paper, we propose a novel generation based multi-view contrastive learning framework (GMVC) for self-supervised graph representation learning, which generates the contrastive samples based on our generator rather than perturbation function. Specifically, after nodes embedding on original graph we first employ random walk in the neighborhood to develop multiple relevant node sequences for each anchor node. We then utilize the transformer to generate the representations of relevant contrastive samples of anchor node based on the features and structures of the sampled node sequences. Finally, by maximizing the consistency between the anchor view and the generated views, we force the model to effectively encode graph information into nodes embeddings. We perform extensive experiments of node classification and link prediction tasks on eight benchmark datasets, which verify the effectiveness of our generation based multi-view graph contrastive learning method.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"107 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139763399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}