Origin-destination (OD) flow captures population mobility between every pair of regions in a city, which is of great value for urban planning and transportation management. Nevertheless, collecting OD flow data is extremely difficult due to privacy concerns and collection costs. Significant efforts have been made to generate OD flow from urban regional features, e.g., demographics and land use, since the spatial heterogeneity of urban functions is the primary driver of movement from one place to another. On the other hand, people travel along various routes between origins and destinations, which affects urban traffic, e.g., road travel speed and travel time. These effects of OD flows reveal fine-grained spatiotemporal patterns of population mobility, yet few works have explored the effectiveness of incorporating urban traffic information into OD generation. To bridge this gap, we propose to generate real-world daily temporal OD flows enhanced by urban traffic information. Our model consists of two modules: Urban2OD and OD2Traffic. In the Urban2OD module, we devise a spatiotemporal graph neural network to model the complex dependencies between daily temporal OD flows and regional features. In the OD2Traffic module, we introduce an attention-based neural network to predict urban traffic from the OD flow produced by the Urban2OD module. Then, through gradient backpropagation, the two modules enhance each other to generate high-quality OD flow data. Extensive experiments on real-world datasets demonstrate the superiority of our proposed model over the state of the art.
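The following minimal sketch (not the authors' code) illustrates how an OD-generation module and a traffic-prediction module can be chained so that gradients from the traffic loss flow back into the OD generator; plain MLPs stand in for the paper's spatiotemporal GNN and attention network, and all dimensions and targets below are made up.

```python
# Coupling an OD generator with a traffic predictor through a joint loss, so the
# traffic objective also updates the OD generator via backpropagation.
import torch
import torch.nn as nn

class Urban2OD(nn.Module):                      # stand-in for the spatiotemporal GNN
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, origin_feat, dest_feat):  # region features -> OD flow estimate
        return self.net(torch.cat([origin_feat, dest_feat], dim=-1)).squeeze(-1)

class OD2Traffic(nn.Module):                    # stand-in for the attention-based predictor
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, od_flow):                 # OD flow -> traffic indicator (e.g. speed)
        return self.net(od_flow.unsqueeze(-1)).squeeze(-1)

# toy batch: 8 OD pairs, 16-dim region features, hypothetical targets
feat_o, feat_d = torch.randn(8, 16), torch.randn(8, 16)
od_true, traffic_true = torch.rand(8), torch.rand(8)

gen, aux = Urban2OD(16), OD2Traffic()
opt = torch.optim.Adam(list(gen.parameters()) + list(aux.parameters()), lr=1e-3)

od_pred = gen(feat_o, feat_d)
loss = nn.functional.mse_loss(od_pred, od_true) \
     + nn.functional.mse_loss(aux(od_pred), traffic_true)   # traffic loss backpropagates into gen
opt.zero_grad(); loss.backward(); opt.step()
```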
Can Rong, Zhicheng Liu, Jingtao Ding, Yong Li. "Learning to Generate Temporal Origin-destination Flow Based on Urban Regional Features and Traffic Information." ACM Transactions on Knowledge Discovery from Data, 2024-02-20. https://doi.org/10.1145/3649141
Shenghao Liu, Yu Zhang, Lingzhi Yi, Xianjun Deng, Laurence T. Yang, Bang Wang
With the development of recommendation algorithms, researchers are paying increasing attention to fairness issues such as user discrimination in recommendations. To address these issues, existing works often filter out users' sensitive information that may cause discrimination while learning user representations. However, these approaches overlook the latent relationship between items' content attributes and users' sensitive information. In this paper, we propose a fairness-aware recommendation algorithm (DALFRec) based on user-side and item-side adversarial learning to mitigate the effects of sensitive information on both sides of the recommendation process. First, we conduct a statistical analysis to demonstrate the latent relationship between items' information and users' sensitive attributes. Then, we design a dual-side adversarial learning network that simultaneously filters out users' sensitive information on the user side and the item side. Additionally, we propose a new evaluation strategy that leverages the latent relationship between items' content attributes and users' sensitive attributes to better assess an algorithm's ability to reduce discrimination. Experiments on three real-world datasets demonstrate the superiority of our proposed algorithm over state-of-the-art methods.
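As a rough illustration of dual-side adversarial filtering (not the DALFRec release), the sketch below attaches a sensitive-attribute discriminator to both the user and item embeddings through a gradient-reversal layer; the dot-product recommender, dimensions and data are assumptions made for the example.

```python
# Adversarially removing a sensitive attribute from user-side and item-side
# embeddings: discriminators try to predict the attribute, and gradient reversal
# pushes the embeddings to hide it while the recommendation loss is optimized.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad                      # flip the gradient sign

n_users, n_items, dim = 100, 200, 32
user_emb = nn.Embedding(n_users, dim)
item_emb = nn.Embedding(n_items, dim)
user_disc = nn.Linear(dim, 2)             # predicts the sensitive attribute from the user side
item_disc = nn.Linear(dim, 2)             # predicts it from the item side

u = torch.randint(0, n_users, (64,))
i = torch.randint(0, n_items, (64,))
rating = torch.rand(64)
sensitive = torch.randint(0, 2, (64,))    # e.g. a binary protected attribute

eu, ei = user_emb(u), item_emb(i)
rec_loss = nn.functional.mse_loss((eu * ei).sum(-1), rating)
adv_loss = nn.functional.cross_entropy(user_disc(GradReverse.apply(eu)), sensitive) \
         + nn.functional.cross_entropy(item_disc(GradReverse.apply(ei)), sensitive)

params = list(user_emb.parameters()) + list(item_emb.parameters()) \
       + list(user_disc.parameters()) + list(item_disc.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
opt.zero_grad(); (rec_loss + adv_loss).backward(); opt.step()
```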
"Dual-side Adversarial Learning based Fair Recommendation for Sensitive Attribute Filtering." ACM Transactions on Knowledge Discovery from Data, 2024-02-19. https://doi.org/10.1145/3648683
Knowledge Graph (KG) reasoning has been an active topic in recent decades. Most current research focuses on predicting missing facts in incomplete KGs. Temporal KG (TKG) reasoning, which aims to forecast future facts, remains challenging due to the complex interactions between entities over time. This paper proposes a novel intricate Spatiotemporal Dependency learning Network (STDN) based on Graph Convolutional Networks (GCNs) to capture the underlying correlations of an entity at different timestamps. Specifically, we first learn an adaptive adjacency matrix to depict the direct dependencies from the temporally adjacent facts of an entity, obtaining its previous context embedding. Then, a Spatiotemporal feature Encoding GCN (STE-GCN) is proposed to capture the latent spatiotemporal dependencies of the entity, yielding its spatiotemporal embedding. Finally, a time gate unit integrates the previous context embedding and the spatiotemporal embedding at the current timestamp to update the entity's evolutional embedding for predicting future facts. STDN generates more expressive embeddings that capture the intricate spatiotemporal dependencies in TKGs. Extensive experiments on the WIKI, ICEWS14 and ICEWS18 datasets show that STDN outperforms state-of-the-art baselines on the temporal reasoning task.
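The time gate described in the last step can be pictured as a GRU-style update gate that mixes the previous context embedding with the spatiotemporal embedding; the sketch below is a guess at its general form under that assumption, not STDN's exact formulation.

```python
# A gate z in (0, 1) decides, per dimension, how much of the new spatiotemporal
# embedding to keep versus the previous context embedding.
import torch
import torch.nn as nn

class TimeGate(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h_prev, h_st):
        z = torch.sigmoid(self.gate(torch.cat([h_prev, h_st], dim=-1)))
        return z * h_st + (1.0 - z) * h_prev

n_entities, dim = 1000, 64
h_prev = torch.randn(n_entities, dim)   # previous context embedding (adaptive adjacency branch)
h_st = torch.randn(n_entities, dim)     # spatiotemporal embedding from the GCN branch
h_next = TimeGate(dim)(h_prev, h_st)    # evolutional embedding used to score future facts
print(h_next.shape)                     # torch.Size([1000, 64])
```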
Xuefei Li, Huiwei Zhou, Weihong Yao, Wenchu Li, Baojie Liu, Yingyu Lin. "Intricate Spatiotemporal Dependency Learning for Temporal Knowledge Graph Reasoning." ACM Transactions on Knowledge Discovery from Data, 2024-02-16. https://doi.org/10.1145/3648366
George Paterakis, Stefanos Fafalios, Paulos Charonyktakis, Vassilis Christophides, Ioannis Tsamardinos
Numerous real-world datasets contain missing values, while in contrast most Machine Learning (ML) algorithms assume complete data. For this reason, several imputation algorithms have been proposed to predict and fill in the missing values. Given the advances in predictive modeling algorithms tuned in an AutoML setting, a question that naturally arises is to what extent sophisticated imputation algorithms (e.g., Neural Network based) are really needed, or whether we can obtain decent performance using simple methods like Mean/Mode (MM) imputation. In this paper, we experimentally compare 6 state-of-the-art representatives of different imputation algorithmic families from an AutoML predictive modeling perspective, including a feature selection step and combined algorithm and hyper-parameter selection. We used a commercial AutoML tool for our experiments, into which we integrated the selected imputation methods. Experiments ran on 25 real-world binary classification datasets with missing values and 10 complete binary classification datasets into which synthetic missing values were introduced according to different missingness mechanisms, at varying missingness frequencies. The main conclusion drawn from our experiments is that the best method on average is the Denoise AutoEncoder (DAE) on real-world datasets and MissForest (MF) on simulated datasets, followed closely by MM. In addition, binary indicator (BI) variables encoding missingness patterns actually improve predictive performance, on average. Last but not least, although there are cases where Neural-Network-based imputation significantly improves predictive performance, this comes at a great computational cost and requires measuring all feature values in order to impute new samples.
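For reference, the simple MM baseline combined with binary missingness indicators (BI) can be reproduced in a few lines with scikit-learn; the column names and toy data below are invented for illustration.

```python
# Mean/Mode imputation plus binary missingness indicators: mean for numeric
# columns, mode for the categorical one; add_indicator=True appends a 0/1 column
# per feature that had missing values.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

X = pd.DataFrame({
    "age":    [25.0, np.nan, 40.0, 31.0],
    "income": [50.0, 60.0, np.nan, np.nan],
    "city":   ["a", np.nan, "b", "a"],
})

mm_bi = ColumnTransformer([
    ("num", SimpleImputer(strategy="mean", add_indicator=True), ["age", "income"]),
    ("cat", SimpleImputer(strategy="most_frequent", add_indicator=True), ["city"]),
])

print(mm_bi.fit_transform(X))
```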
"Do we really need imputation in AutoML predictive modeling?" ACM Transactions on Knowledge Discovery from Data, 2024-02-16. https://doi.org/10.1145/3643643
We introduce the general problem of identifying a smallest edge subset of a given graph whose deletion makes the graph community-free. We consider this problem under two community notions which have attracted significant attention: k-truss and k-core. We also introduce a problem variant where the identified subset contains edges incident to a given set of nodes and ensures that these nodes are not contained in any community; k-truss or k-core, in our case. These problems are directly applicable in social networks, where the identified edges can be hidden by users or sanitized from the output graph, and in communication networks, where the identified edges correspond to vital network connections. We present a series of theoretical and practical results. On the theoretical side, we show through non-trivial reductions that the problems we introduce are NP-hard and, in fact, hard to approximate. For the k-truss based problems, we also give exact exponential-time algorithms, as well as a non-trivial lower bound on the size of an optimal solution. On the practical side, we develop a series of heuristics which are sped up by efficient data structures that we propose for updating the truss or core decomposition under edge deletions. In addition, we develop an algorithm to compute the lower bound. Extensive experiments on 11 real-world and synthetic graphs show that our heuristics are effective, outperforming natural baselines, and also efficient (up to two orders of magnitude faster than a natural baseline) thanks to our data structures. Furthermore, we present a case study on a co-authorship network and experiments showing that the removal of edges identified by our heuristics does not substantially affect the clustering structure of the input graph.
This work extends a KDD 2021 paper, providing new theoretical results as well as introducing core-based problems and algorithms.
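To make the core-based problem concrete, the sketch below shows a naive greedy heuristic (not one of the paper's algorithms): repeatedly delete an edge inside the current k-core, recomputing the core each time, until no k-core remains. The paper's heuristics instead maintain the decomposition incrementally under edge deletions rather than recomputing it.

```python
# Naive heuristic for emptying the k-core of a graph by edge deletions.
import networkx as nx

def break_k_core(G, k):
    """Return a set of edges whose removal leaves G with an empty k-core."""
    G = G.copy()
    removed = set()
    core = nx.k_core(G, k)
    while core.number_of_edges() > 0:
        # pick an edge inside the k-core whose endpoint degrees (within the core) are
        # smallest, hoping its removal pushes nodes below the degree-k threshold quickly
        u, v = min(core.edges(), key=lambda e: core.degree(e[0]) + core.degree(e[1]))
        G.remove_edge(u, v)
        removed.add((u, v))
        core = nx.k_core(G, k)              # recompute; real implementations update incrementally
    return removed

G = nx.karate_club_graph()
edges = break_k_core(G, k=4)
H = G.copy()
H.remove_edges_from(edges)
print(len(edges), "edges removed; remaining 4-core size:", nx.k_core(H, 4).number_of_nodes())
```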
Huiping Chen, Alessio Conte, Roberto Grossi, Grigorios Loukides, Solon P. Pissis, Michelle Sweering. "On Breaking Truss-Based and Core-Based Communities." ACM Transactions on Knowledge Discovery from Data, 2024-02-14. https://doi.org/10.1145/3644077
Graphs can facilitate modeling various complex systems such as gene networks and power grids, as well as analyzing the underlying relations within them. Learning over graphs has recently attracted increasing attention, particularly graph neural network (GNN)-based solutions, among which graph attention networks (GATs) have become one of the most widely utilized neural network structures for graph-based tasks. Although it has been shown that the use of graph structures in learning amplifies algorithmic bias, the influence of the attention design in GATs on algorithmic bias has not been investigated. Motivated by this, the present study first carries out a theoretical analysis to demonstrate the sources of algorithmic bias in GAT-based learning for node classification. Then, a novel algorithm, FairGAT, that leverages a fairness-aware attention design is developed based on the theoretical findings. Experimental results on real-world networks demonstrate that FairGAT improves group fairness measures while also providing utility comparable to fairness-aware baselines for node classification and link prediction.
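For readers unfamiliar with GATs, the sketch below implements a plain single-head graph attention layer to make the attention design concrete; the fairness-aware correction of the attention scores that FairGAT derives from its theoretical analysis is not reproduced here.

```python
# A minimal single-head graph attention layer in the spirit of GAT.
import torch
import torch.nn as nn

class GATLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):                       # x: [N, in_dim], adj: [N, N] with 1 for edges
        h = self.W(x)                                # [N, out_dim]
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = torch.nn.functional.leaky_relu(self.a(pairs).squeeze(-1), 0.2)   # raw scores [N, N]
        e = e.masked_fill(adj == 0, float("-inf"))   # attend only over neighbours
        alpha = torch.softmax(e, dim=-1)             # attention coefficients
        return alpha @ h                             # aggregated node representations

x = torch.randn(5, 8)
adj = (torch.eye(5) + torch.bernoulli(torch.full((5, 5), 0.4)) > 0).float()  # self-loops included
print(GATLayer(8, 16)(x, adj).shape)                 # torch.Size([5, 16])
```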
O. Deniz Kose, Yanning Shen. "FairGAT: Fairness-aware Graph Attention Networks." ACM Transactions on Knowledge Discovery from Data, 2024-02-12. https://doi.org/10.1145/3645096
Yang Yang, Feifei Wang, Enqiang Zhu, Fei Jiang, Wen Yao
There is an emerging trend in the Chinese automobile industry: automakers are introducing exclusive enterprise social networks (EESNs) to expand sales and provide after-sale services. Traditional online social networks (OSNs) and enterprise social networks (ESNs), such as Twitter and Yammer, are ingeniously designed to facilitate unregulated communication among equal individuals. However, users in EESNs are naturally socially stratified, consisting of both enterprise staff and customers. In addition, the motivation for operating EESNs can be quite complicated, including providing customer services and facilitating communication among enterprise staff. As a result, social behaviors in EESNs can differ considerably from those in OSNs and ESNs. In this work, we aim to analyze the social behaviors in EESNs. We consider the Chinese car manufacturer NIO as a typical example of an EESN and provide the following contributions. First, we formulate social behavior analysis in EESNs as a link prediction problem in heterogeneous social networks. Second, to analyze this link prediction problem, we derive plentiful user features and build multiple meta-path graphs for EESNs. Third, we develop a novel Fast (H)eterogeneous graph (A)ttention (N)etwork algorithm for (D)irected graphs (FastHAND) to predict directed social links among users in EESNs. The algorithm introduces feature group attention at the node level and uses an edge sampling algorithm over directed meta-path graphs to reduce the computation cost. Experiments on the NIO community data demonstrate the predictive power of the proposed FastHAND method. The experimental results also verify our intuitions about social affinity propagation in EESNs.
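Two of the ingredients mentioned above, composing a directed meta-path graph from heterogeneous relations and sampling a bounded number of in-edges per node before attention, can be sketched as follows; the relation names and sizes are hypothetical and this is not the FastHAND implementation.

```python
# Build a directed user -> post -> user meta-path adjacency by sparse matrix
# multiplication, then keep at most a fixed number of incoming edges per node.
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n_users, n_posts = 50, 80

user_replies_post = sp.random(n_users, n_posts, density=0.05, random_state=0, format="csr")
post_by_user = sp.random(n_posts, n_users, density=0.05, random_state=1, format="csr")

# directed meta-path graph U -> P -> U: user i reaches user j through some post
metapath_upu = (user_replies_post @ post_by_user).tocsr()

def sample_in_edges(adj, max_in=5, rng=rng):
    """Keep at most max_in incoming edges per node (column) of a directed adjacency."""
    adj = adj.tocsc()
    rows, cols = [], []
    for j in range(adj.shape[1]):
        srcs = adj.indices[adj.indptr[j]:adj.indptr[j + 1]]
        if len(srcs) > max_in:
            srcs = rng.choice(srcs, size=max_in, replace=False)
        rows.extend(srcs); cols.extend([j] * len(srcs))
    data = np.ones(len(rows))
    return sp.csr_matrix((data, (rows, cols)), shape=adj.shape)

sampled = sample_in_edges(metapath_upu, max_in=5)
print(metapath_upu.nnz, "->", sampled.nnz, "edges after in-edge sampling")
```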
"Social Behavior Analysis in Exclusive Enterprise Social Networks by FastHAND." ACM Transactions on Knowledge Discovery from Data, 2024-02-12. https://doi.org/10.1145/3646552
Dimensionality reduction techniques map values from a high dimensional space to one with a lower dimension. The result is a space which requires less physical memory and has a faster distance calculation. These techniques are widely used in settings where the reduced-dimension space retains acceptable accuracy with respect to the original space.
Many such transforms have been described. They have been classified into two main groups: linear and topological. Linear methods such as Principal Component Analysis (PCA) and Random Projection (RP) define matrix-based transforms into a lower dimension of Euclidean space. Topological methods such as Multidimensional Scaling (MDS) attempt to preserve higher-level aspects such as the nearest-neighbour relation, and some may be applied to non-Euclidean spaces.
Here, we introduce nSimplex Zen, a novel topological method of reducing dimensionality. Like MDS, it relies only upon pairwise distances measured in the original space. The use of distances, rather than coordinates, allows the technique to be applied to both Euclidean and other Hilbert spaces, including those governed by Cosine, Jensen-Shannon and Quadratic Form distances.
We show that in almost all cases, due to geometric properties of high-dimensional spaces, our new technique preserves these properties better than the alternatives, especially when reducing to very low dimensions.
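As a point of reference for the two families above, the sketch below compares how a linear method (Gaussian random projection) and a distance-only method (metric MDS fed a precomputed distance matrix) preserve pairwise Euclidean distances when reducing from 100 to 10 dimensions; nSimplex Zen itself is not in scikit-learn and is not reproduced here.

```python
# Compare distance preservation of a linear transform versus a method that, like
# nSimplex Zen, uses only pairwise distances from the original space.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.manifold import MDS
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))
d_orig = pdist(X)                                # original pairwise Euclidean distances

X_rp = GaussianRandomProjection(n_components=10, random_state=0).fit_transform(X)
mds = MDS(n_components=10, dissimilarity="precomputed", random_state=0)
X_mds = mds.fit_transform(squareform(d_orig))    # consumes only the distance matrix

for name, Y in [("random projection", X_rp), ("MDS", X_mds)]:
    corr = np.corrcoef(d_orig, pdist(Y))[0, 1]
    print(f"{name}: correlation with original distances = {corr:.3f}")
```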
Richard Connor, Lucia Vadicamo. "nSimplex Zen: A Novel Dimensionality Reduction for Euclidean and Hilbert Spaces." ACM Transactions on Knowledge Discovery from Data, 2024-02-10. https://doi.org/10.1145/3647642
Individuals are often involved in multiple online social networks. Because the owners of these networks are unwilling to share their networks, some global algorithms combine information from multiple networks to detect all communities across them without sharing their edges. When data owners are only interested in the community containing a given node, however, it is unnecessary and computationally expensive for multiple networks to interact with each other to mine all communities. Moreover, data owners who are specifically looking for one community typically prefer to provide less data than the global algorithms require. Therefore, we propose the Local Collaborative Community Detection problem (LCCD). It exploits information from multiple networks to jointly detect the local community containing a given node, without directly sharing edges between networks. To address the LCCD problem, we present a method developed from the M method, called colM, to detect the local community in multiple networks. The method adopts secure multiparty computation protocols to protect each network's private information. Our experiments were conducted on real-world and synthetic datasets. The results show that colM effectively identifies community structures and outperforms comparison algorithms.
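The sketch below shows a single-network greedy local expansion around a seed node in the spirit of modularity-ratio methods such as M: repeatedly add the neighbouring node that most improves the ratio of internal to external edges. The cross-network collaboration and the secure multiparty computation protocols that colM adds are not shown.

```python
# Greedy local community expansion around a seed node on one network.
import networkx as nx

def local_community(G, seed):
    community = {seed}

    def score(nodes):
        internal = G.subgraph(nodes).number_of_edges()
        external = sum(1 for u in nodes for v in G.neighbors(u) if v not in nodes)
        return internal / external if external else float("inf")

    improved = True
    while improved:
        improved = False
        frontier = {v for u in community for v in G.neighbors(u)} - community
        best, best_score = None, score(community)
        for v in frontier:
            s = score(community | {v})
            if s > best_score:
                best, best_score = v, s
        if best is not None:
            community.add(best)
            improved = True
    return community

G = nx.karate_club_graph()
print(sorted(local_community(G, seed=0)))
```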
Li Ni, Rui Ye, Wenjian Luo, Yiwen Zhang. "Local community detection in multiple private networks." ACM Transactions on Knowledge Discovery from Data, 2024-02-10. https://doi.org/10.1145/3644078
Graph contrastive learning has made remarkable achievements in self-supervised representation learning on graph-structured data. By employing perturbation functions (i.e., perturbations of the nodes or edges of a graph), most graph contrastive learning methods construct contrastive samples on the original graph. However, perturbation-based data augmentation randomly changes the inherent information (e.g., attributes or structures) of the graph. Therefore, after embedding the nodes of the perturbed graph, we can guarantee neither the validity of the contrastive samples nor the resulting performance of graph contrastive learning. To this end, in this paper, we propose a novel generation-based multi-view contrastive learning framework (GMVC) for self-supervised graph representation learning, which generates the contrastive samples with a generator rather than a perturbation function. Specifically, after embedding the nodes of the original graph, we first employ random walks in the neighborhood to develop multiple relevant node sequences for each anchor node. We then utilize a transformer to generate the representations of relevant contrastive samples of the anchor node based on the features and structures of the sampled node sequences. Finally, by maximizing the consistency between the anchor view and the generated views, we force the model to effectively encode graph information into node embeddings. We perform extensive experiments on node classification and link prediction tasks on eight benchmark datasets, which verify the effectiveness of our generation-based multi-view graph contrastive learning method.
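The final consistency-maximization step can be written as an InfoNCE-style contrastive loss between each anchor embedding and its generated view, as in the sketch below; random tensors stand in for the outputs of GMVC's random-walk sampling and transformer generator.

```python
# InfoNCE-style consistency loss: anchor i should match its generated view i
# against all other nodes in the batch.
import torch
import torch.nn.functional as F

def info_nce(anchor, generated, temperature=0.5):
    a = F.normalize(anchor, dim=-1)
    g = F.normalize(generated, dim=-1)
    logits = a @ g.t() / temperature              # [N, N] cosine similarities
    targets = torch.arange(a.size(0))             # positive pair sits on the diagonal
    return F.cross_entropy(logits, targets)

anchor = torch.randn(256, 64, requires_grad=True)     # embeddings of anchor nodes
generated = torch.randn(256, 64, requires_grad=True)  # embeddings of their generated views
loss = info_nce(anchor, generated)
loss.backward()
print(float(loss))
```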
Yuehui Han. "Generation based Multi-view Contrast for Self-Supervised Graph Representation Learning." ACM Transactions on Knowledge Discovery from Data, 2024-02-09. https://doi.org/10.1145/3645095