Multiple recent studies reveal a paradox in graph convolutional networks (GCNs): shallow architectures limit the ability to learn information from high-order neighbors, while deep architectures suffer from over-smoothing or over-squashing. To enjoy the simplicity of shallow architectures while overcoming their limited neighborhood reach, in this work we introduce a biaffine technique to improve the expressiveness of graph convolutional networks with a shallow architecture. The core design of our method is to learn direct dependencies between nodes and their long-distance neighbors, with which one-hop message passing suffices to capture rich information for node representation. In addition, we propose a multi-view contrastive learning method to exploit the representations learned from long-distance dependencies. Extensive experiments on nine graph benchmark datasets show that the shallow biaffine graph convolutional network (BAGCN) significantly outperforms state-of-the-art GCNs (with deep or shallow architectures) on semi-supervised node classification. We further verify the effectiveness of the biaffine design in node representation learning and the consistency of its performance across different sizes of training data.
{"title":"Building Shortcuts between Distant Nodes with Biaffine Mapping for Graph Convolutional Networks","authors":"Acong Zhang, Jincheng Huang, Ping Li, Kai Zhang","doi":"10.1145/3650113","DOIUrl":"https://doi.org/10.1145/3650113","url":null,"abstract":"<p>Multiple recent studies show a paradox in graph convolutional networks (GCNs), that is, shallow architectures limit the capability of learning information from high-order neighbors, while deep architectures suffer from over-smoothing or over-squashing. To enjoy the simplicity of shallow architectures and overcome their limits of neighborhood extension, in this work, we introduce <i>Biaffine</i> technique to improve the expressiveness of graph convolutional networks with a shallow architecture. The core design of our method is to learn direct dependency on long-distance neighbors for nodes, with which only one-hop message passing is capable of capturing rich information for node representation. Besides, we propose a multi-view contrastive learning method to exploit the representations learned from long-distance dependencies. Extensive experiments on nine graph benchmark datasets suggest that the shallow biaffine graph convolutional networks (BAGCN) significantly outperforms state-of-the-art GCNs (with deep or shallow architectures) on semi-supervised node classification. We further verify the effectiveness of biaffine design in node representation learning and the performance consistency on different sizes of training data.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"24 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140019092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marked temporal point process models (MTPPs) aim to model event sequences and event markers (associated features) in continuous time. These models have been applied to various domains where capturing event dynamics in continuous time is beneficial, such as education systems, social networks, and recommender systems. However, current MTPPs suffer from two major limitations: inefficient representation of the influence of event dynamics on marker distributions, and the loss of fine-grained representations of historical marker distributions during modeling. Motivated by these limitations, we propose a novel model called Marked Point Processes with Memory-Enhanced Neural Networks (MoMENt) that captures the bidirectional interrelations between markers and event dynamics while providing fine-grained marker representations. Specifically, MoMENt consists of two concurrent networks: a Recurrent Activity Updater (RAU) to capture event dynamics and a Memory-Enhanced Marker Updater (MEMU) to represent markers. The RAU and MEMU components update each other at every step to model the bidirectional influence of markers and event dynamics. To obtain fine-grained representations of marker distributions, MEMU is equipped with external memories that model detailed marker-level features with latent component vectors. Our extensive experiments on six real-world user interaction datasets demonstrate that MoMENt can accurately represent users' activity dynamics, improving time, type, and marker predictions, as well as recommendation performance, by up to 76.5%, 65.6%, 77.2%, and 57.7%, respectively, compared to baseline approaches. Furthermore, our case studies show the effectiveness of MoMENt in providing meaningful and fine-grained interpretations of user-system relations over time, e.g., how user choices influence their future preferences in the recommendation domain.
{"title":"MoMENt: Marked Point Processes with Memory-Enhanced Neural Networks for User Activity Modeling","authors":"Sherry Sahebi, Mengfan Yao, Siqian Zhao, Reza Feyzi Behnagh","doi":"10.1145/3649504","DOIUrl":"https://doi.org/10.1145/3649504","url":null,"abstract":"<p>Marked temporal point process models (MTPPs) aim to model event sequences and event markers (associated features) in continuous time. These models have been applied to various application domains where capturing event dynamics in continuous time is beneficial, such as education systems, social networks, and recommender systems. However, current MTPPs suffer from two major limitations, i.e., inefficient representation of event dynamic’s influence on marker distribution and losing fine-grained representation of historical marker distributions in the modeling. Motivated by these limitations, we propose a novel model called <underline>M</underline>arked P<underline>o</underline>int Processes with <underline>M</underline>emory-<underline>E</underline>nhanced <underline>N</underline>eural Ne<underline>t</underline>works (MoMENt) that can capture the bidirectional interrelations between markers and event dynamics while providing fine-grained marker representations. Specifically, MoMENt is constructed of two concurrent networks: Recurrent Activity Updater (RAU) to capture model event dynamics and Memory-Enhanced Marker Updater (MEMU) to represent markers. Both RAU and MEMU components are designed to update each other at every step to model the bidirectional influence of markers and event dynamics. To obtain a fine-grained representation of maker distributions, MEMU is devised with external memories that model detailed marker-level features with latent component vectors. Our extensive experiments on six real-world user interaction datasets demonstrate that MoMENt can accurately represent users’ activity dynamics, boosting time, type, and marker predictions, as well as recommendation performance up to (76.5% ), (65.6% ), (77.2% ), and (57.7% ), respectively, compared to baseline approaches. Furthermore, our case studies show the effectiveness of MoMENt in providing meaningful and fine-grained interpretations of user-system relations over time, e.g., how user choices influence their future preferences in the recommendation domain.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"18 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140007970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Node classification predicts the class label of a node by analyzing its properties and interactions in a network. We note that many existing solutions for graph-based node classification consider only node connectivity, not a node's local topology structure. However, nodes residing in different parts of a real-world network may share similar local topology structures. For example, local topology structures in a payment network may reveal sellers' business roles (e.g., supplier or retailer). To model both connectivity and local topology structure for better node classification performance, we present DP-GCN, a dual-path graph convolution network. DP-GCN consists of three main modules: (i) a C-GCN module to capture the connectivity relationships between nodes, (ii) a T-GCN module to capture the topology structure similarity among nodes, and (iii) a multi-head self-attention module to align both properties. We evaluate DP-GCN on seven benchmark datasets against diverse baselines to demonstrate its effectiveness. We also provide a case study of running DP-GCN on three large-scale payment networks from PayPal, a leading payment service provider, for risky seller detection. Experimental results show DP-GCN's effectiveness and practicability in large-scale settings. PayPal's internal testing also shows DP-GCN's effectiveness in defending against real risks in transaction networks.
{"title":"DP-GCN: Node Classification by Connectivity and Local Topology Structure on Real-World Network","authors":"Zhe Chen, Aixin Sun","doi":"10.1145/3649460","DOIUrl":"https://doi.org/10.1145/3649460","url":null,"abstract":"<p>Node classification is to predict the class label of a node by analyzing its properties and interactions in a network. We note that many existing solutions for graph-based node classification only consider node connectivity but not node’s local topology structure. However, nodes residing in different parts of a real-world network may share similar local topology structures. For example, local topology structures in a payment network may reveal sellers’ business roles (<i>e.g.,</i> supplier or retailer). To model both connectivity and local topology structure for better node classification performance, we present DP-GCN, a dual-path graph convolution network. DP-GCN consists of three main modules: (i) a C-GCN module to capture the connectivity relationships between nodes, (ii) a T-GCN module to capture the topology structure similarity among nodes, and (iii) a multi-head self-attention module to align both properties. We evaluate DP-GCN on seven benchmark datasets against diverse baselines to demonstrate its effectiveness. We also provide a case study of running DP-GCN on three large-scale payment networks from PayPal, a leading payment service provider, for risky seller detection. Experimental results show DP-GCN’s effectiveness and practicability in large-scale settings. PayPal’s internal testing also show DP-GCN’s effectiveness in defending real risks from transaction networks.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"6 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139988036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a comprehensive and practical guide for practitioners and end-users working with Large Language Models (LLMs) in their downstream natural language processing (NLP) tasks. We provide discussions and insights into the usage of LLMs from the perspectives of models, data, and downstream tasks. First, we offer an introduction and brief summary of current language models. Then, we discuss the influence of pre-training data, training data, and test data. Most importantly, we provide a detailed discussion of the use and non-use cases of large language models for various NLP tasks, such as knowledge-intensive tasks, traditional natural language understanding tasks, generation tasks, emergent abilities, and considerations for specific tasks. We present various use and non-use cases to illustrate the practical applications and limitations of LLMs in real-world scenarios. We also examine the importance of data and the specific challenges associated with each NLP task. Furthermore, we explore the impact of spurious biases on LLMs and delve into other essential considerations, such as efficiency, cost, and latency, to ensure a comprehensive understanding of deploying LLMs in practice. This guide aims to provide researchers and practitioners with valuable insights and best practices for working with LLMs, enabling their successful implementation in a wide range of NLP tasks. A curated and regularly updated list of practical guide resources for LLMs is available at https://github.com/Mooler0410/LLMsPracticalGuide. An editable and regularly updated LLM evolutionary tree can be found at llmtree.ai.
{"title":"Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond","authors":"Jingfeng Yang, Hongye Jin, Ruixiang Tang, Xiaotian Han, Qizhang Feng, Haoming Jiang, Shaochen Zhong, Bing Yin, Xia Hu","doi":"10.1145/3649506","DOIUrl":"https://doi.org/10.1145/3649506","url":null,"abstract":"<p>This paper presents a comprehensive and practical guide for practitioners and end-users working with Large Language Models (LLMs) in their downstream natural language processing (NLP) tasks. We provide discussions and insights into the usage of LLMs from the perspectives of models, data, and downstream tasks. Firstly, we offer an introduction and brief summary of current language models. Then, we discuss the influence of pre-training data, training data, and test data. Most importantly, we provide a detailed discussion about the use and non-use cases of large language models for various natural language processing tasks, such as knowledge-intensive tasks, traditional natural language understanding tasks, generation tasks, emergent abilities, and considerations for specific tasks. We present various use cases and non-use cases to illustrate the practical applications and limitations of LLMs in real-world scenarios. We also try to understand the importance of data and the specific challenges associated with each NLP task. Furthermore, we explore the impact of spurious biases on LLMs and delve into other essential considerations, such as efficiency, cost, and latency, to ensure a comprehensive understanding of deploying LLMs in practice. This comprehensive guide aims to provide researchers and practitioners with valuable insights and best practices for working with LLMs, thereby enabling the successful implementation of these models in a wide range of NLP tasks. A curated list of practical guide resources of LLMs, regularly updated, can be found at https://github.com/Mooler0410/LLMsPracticalGuide. An LLMs evolutionary tree, editable yet regularly updated, can be found at llmtree.ai.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"9 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139987833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph neural networks (GNNs) have shown great potential in representation learning for various graph tasks. However, distribution shift between the training and test sets poses a challenge to the generalization of GNNs. To address this challenge, we propose HomoTTT, a fully test-time training (FTTT) framework for GNNs that enhances the model's generalization capability on node classification tasks. Specifically, HomoTTT designs a homophily-based, parameter-free graph contrastive learning task with adaptive augmentation to guide the model's adaptation during test-time training, allowing the model to adapt to the specific target data. In the inference stage, HomoTTT integrates the original GNN model and the adapted model after TTT using a homophily-based model selection method, which prevents the potential performance degradation caused by unconstrained model adaptation. Extensive experimental results on six benchmark datasets demonstrate the effectiveness of our proposed framework. Additionally, an exploratory study further validates the rationality of the homophily-based graph contrastive learning task with adaptive augmentation and the homophily-based model selection designed in HomoTTT.
{"title":"A Fully Test-Time Training Framework for Semi-Supervised Node Classification on Out-of-Distribution Graphs","authors":"Jiaxin Zhang, Yiqi Wang, Xihong Yang, En Zhu","doi":"10.1145/3649507","DOIUrl":"https://doi.org/10.1145/3649507","url":null,"abstract":"<p>Graph neural networks (GNNs) have shown great potential in representation learning for various graph tasks. However, the distribution shift between the training and test sets poses a challenge to the efficiency of GNNs. To address this challenge, <span>HomoTTT</span> propose a fully test-time training (FTTT) framework for GNNs to enhance the model’s generalization capabilities for node classification tasks. Specifically, our proposed <span>HomoTTT</span> designs a homophily-based and parameter-free graph contrastive learning task with adaptive augmentation to guide the model’s adaptation during the test time training, allowing the model to adapt for specific target data. In the inference stage, <span>HomoTTT</span> proposes to integrate the original GNN model and the adapted model after TTT using a homophily-based model selection method, which prevents potential performance degradation caused by unconstrained model adaptation. Extensive experimental results on six benchmark datasets demonstrate the effectiveness of our proposed framework. Additionally, the exploratory study further validates the rationality of the homophily-based graph contrastive learning task with adaptive augmentation and the homophily-based model selection designed in <span>HomoTTT</span>.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"282 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139981301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph Neural Networks (GNNs) have become increasingly important due to their representational power and state-of-the-art predictive performance on many fundamental learning tasks. Despite this success, GNNs suffer from fairness issues that arise from the underlying graph data and the fundamental aggregation mechanism at the heart of the large class of GNN models. In this article, we examine and categorize fairness techniques for improving the fairness of GNNs. We categorize these techniques by whether they focus on improving fairness in the pre-processing, in-processing (during training), or post-processing phase. Furthermore, we discuss how such techniques can be used together when appropriate, and highlight their advantages and underlying intuition. We also introduce an intuitive taxonomy of fairness evaluation metrics, including graph-level, neighborhood-level, embedding-level, and prediction-level fairness metrics. In addition, we succinctly summarize graph datasets that are useful for benchmarking the fairness of GNN models. Finally, we highlight key open problems and challenges that remain to be addressed.
{"title":"Fairness-Aware Graph Neural Networks: A Survey","authors":"April Chen, Ryan A. Rossi, Namyong Park, Puja Trivedi, Yu Wang, Tong Yu, Sungchul Kim, Franck Dernoncourt, Nesreen K. Ahmed","doi":"10.1145/3649142","DOIUrl":"https://doi.org/10.1145/3649142","url":null,"abstract":"<p>Graph Neural Networks (GNNs) have become increasingly important due to their representational power and state-of-the-art predictive performance on many fundamental learning tasks. Despite this success, GNNs suffer from fairness issues that arise as a result of the underlying graph data and the fundamental aggregation mechanism that lies at the heart of the large class of GNN models. In this article, we examine and categorize fairness techniques for improving the fairness of GNNs. We categorize these techniques by whether they focus on improving fairness in the pre-processing, in-processing (during training), or post-processing phases. Furthermore, we discuss how such techniques can be used together whenever appropriate, and highlight the advantages and intuition as well. We also introduce an intuitive taxonomy for fairness evaluation metrics including graph-level fairness, neighborhood-level fairness, embedding-level fairness, and prediction-level fairness metrics. In addition, graph datasets that are useful for benchmarking the fairness of GNN models are summarized succinctly. Finally, we highlight key open problems and challenges that remain to be addressed.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"15 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139953733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In federated learning (FL), malicious clients can manipulate the predictions of the trained model through backdoor attacks, posing a significant threat to the security of FL systems. Existing research primarily focuses on backdoor attacks and defenses within the generic federated learning scenario, where all clients collaborate to train a single global model. A recent study by Qin et al. [24] marks the first exploration of backdoor attacks within the personalized federated learning (pFL) scenario, where each client constructs a personalized model based on its local data. Notably, that study demonstrates that pFL methods with parameter decoupling can significantly enhance robustness against backdoor attacks. In this paper, however, we show that pFL methods with parameter decoupling are still vulnerable to backdoor attacks. Their resistance is attributed to the heterogeneous classifiers between malicious clients and their benign counterparts. We analyze two direct causes of this classifier heterogeneity: (1) data heterogeneity inherently exists among clients, and (2) poisoning by malicious clients further exacerbates it. To address these issues, we propose a two-pronged attack method, BapFL, which comprises two simple yet effective strategies: (1) poisoning only the feature encoder while keeping the classifier fixed, and (2) diversifying the classifier through noise injection to simulate the classifiers of benign clients. Extensive experiments on three benchmark datasets under varying conditions demonstrate the effectiveness of our proposed attack. Additionally, we evaluate six widely used defense methods and find that BapFL still poses a significant threat even in the presence of the best of them, Multi-Krum. We hope to inspire further research on attack and defense strategies in pFL scenarios. The code is available at: https://github.com/BapFL/code.
{"title":"BapFL : You can Backdoor Personalized Federated Learning","authors":"Tiandi Ye, Cen Chen, Yinggui Wang, Xiang Li, Ming Gao","doi":"10.1145/3649316","DOIUrl":"https://doi.org/10.1145/3649316","url":null,"abstract":"<p>In federated learning (FL), malicious clients could manipulate the predictions of the trained model through backdoor attacks, posing a significant threat to the security of FL systems. Existing research primarily focuses on backdoor attacks and defenses within the generic federated learning scenario, where all clients collaborate to train a single global model. A recent study conducted by Qin et al. [24] marks the initial exploration of backdoor attacks within the personalized federated learning (pFL) scenario, where each client constructs a personalized model based on its local data. Notably, the study demonstrates that pFL methods with <i>parameter decoupling</i> can significantly enhance robustness against backdoor attacks. However, in this paper, we whistleblow that pFL methods with parameter decoupling are still vulnerable to backdoor attacks. The resistance of pFL methods with parameter decoupling is attributed to the heterogeneous classifiers between malicious clients and benign counterparts. We analyze two direct causes of the heterogeneous classifiers: (1) data heterogeneity inherently exists among clients and (2) poisoning by malicious clients further exacerbates the data heterogeneity. To address these issues, we propose a two-pronged attack method, BapFL , which comprises two simple yet effective strategies: (1) poisoning only the feature encoder while keeping the classifier fixed and (2) diversifying the classifier through noise introduction to simulate that of the benign clients. Extensive experiments on three benchmark datasets under varying conditions demonstrate the effectiveness of our proposed attack. Additionally, we evaluate the effectiveness of six widely used defense methods and find that BapFL still poses a significant threat even in the presence of the best defense, Multi-Krum. We hope to inspire further research on attack and defense strategies in pFL scenarios. The code is available at: https://github.com/BapFL/code.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"126 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139953732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Forecasting citations of scientific patents and publications is crucial for understanding the evolution and development of technological domains and for foresight into emerging technologies. By construing citations as a time series, the task can be cast into the domain of temporal point processes. Most existing work on forecasting with temporal point processes, both conventional and neural network-based, performs only single-step forecasting. In citation forecasting, however, the more salient goal is n-step forecasting: predicting the arrival of the next n citations. In this paper, we propose Dynamic Multi-Context Attention Networks (DMA-Nets), a novel deep learning sequence-to-sequence (Seq2Seq) model with a hierarchical dynamic attention mechanism for long-term citation forecasting. Extensive experiments on two real-world datasets demonstrate that the proposed model learns better representations of conditional dependencies over historical sequences than state-of-the-art counterparts, and thus achieves strong performance on citation prediction.
{"title":"Citation Forecasting with Multi-Context Attention-Aided Dependency Modeling","authors":"Taoran Ji, Nathan Self, Kaiqun Fu, Zhiqian Chen, Naren Ramakrishnan, Chang-Tien Lu","doi":"10.1145/3649140","DOIUrl":"https://doi.org/10.1145/3649140","url":null,"abstract":"<p>Forecasting citations of scientific patents and publications is a crucial task for understanding the evolution and development of technological domains and for foresight into emerging technologies. By construing citations as a time series, the task can be cast into the domain of temporal point processes. Most existing work on forecasting with temporal point processes, both conventional and neural network-based, only performs single-step forecasting. In citation forecasting, however, the more salient goal is <i>n</i>-step forecasting: predicting the arrival of the next <i>n</i> citations. In this paper, we propose Dynamic Multi-Context Attention Networks (DMA-Nets), a novel deep learning sequence-to-sequence (Seq2Seq) model with a novel hierarchical dynamic attention mechanism for long-term citation forecasting. Extensive experiments on two real-world datasets demonstrate that the proposed model learns better representations of conditional dependencies over historical sequences compared to state-of-the-art counterparts and thus achieves significant performance for citation predictions.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"14 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139953632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph self-supervised representation learning has gained considerable attention and demonstrated remarkable efficacy in extracting meaningful representations from graphs, particularly in the absence of labeled data. Two representative families of methods in this domain are graph auto-encoding and graph contrastive learning. However, the former primarily focuses on global structures, potentially overlooking fine-grained information during reconstruction, while the latter emphasizes node similarity across correlated views in the embedding space, potentially neglecting the inherent global graph information in the original input space. Moreover, handling incomplete graphs in real-world scenarios, where original features are unavailable for certain nodes, poses challenges for both types of methods. To alleviate these limitations, we integrate masked graph auto-encoding and prototype-aware graph contrastive learning into a unified model for learning node representations in graphs. In our method, we begin by masking a portion of node features and employ a specific decoding strategy to reconstruct the masked information. This process recovers graphs at a global or macro level and makes incomplete graphs easy to handle. Moreover, we treat the masked graph and the original one as a pair of contrasting views, enforcing alignment and uniformity between their corresponding node representations at a local or micro level. Lastly, to capture cluster structures at a meso level and learn more discriminative representations, we introduce a prototype-aware clustering consistency loss that is jointly optimized with the above two complementary objectives. Extensive experiments on several datasets demonstrate that the proposed method achieves significantly better or competitive performance on downstream tasks, especially graph clustering, compared with state-of-the-art methods, showcasing its superiority in enhancing graph representation learning.
{"title":"ProtoMGAE: Prototype-aware Masked Graph Auto-Encoder for Graph Representation Learning","authors":"Yimei Zheng, Caiyan Jia","doi":"10.1145/3649143","DOIUrl":"https://doi.org/10.1145/3649143","url":null,"abstract":"<p>Graph self-supervised representation learning has gained considerable attention and demonstrated remarkable efficacy in extracting meaningful representations from graphs, particularly in the absence of labeled data. Two representative methods in this domain are graph auto-encoding and graph contrastive learning. However, the former methods primarily focus on global structures, potentially overlooking some fine-grained information during reconstruction. The latter methods emphasize node similarity across correlated views in the embedding space, potentially neglecting the inherent global graph information in the original input space. Moreover, handling incomplete graphs in real-world scenarios, where original features are unavailable for certain nodes, poses challenges for both types of methods. To alleviate these limitations, we integrate masked graph auto-encoding and prototype-aware graph contrastive learning into a unified model to learn node representations in graphs. In our method, we begin by masking a portion of node features and utilize a specific decoding strategy to reconstruct the masked information. This process facilitates the recovery of graphs from a global or macro level and enables handling incomplete graphs easily. Moreover, we treat the masked graph and the original one as a pair of contrasting views, enforcing the alignment and uniformity between their corresponding node representations at a local or micro level. Lastly, to capture cluster structures from a meso level and learn more discriminative representations, we introduce a prototype-aware clustering consistency loss that is jointly optimized with the above two complementary objectives. Extensive experiments conducted on several datasets demonstrate that the proposed method achieves significantly better or competitive performance on downstream tasks, especially for graph clustering, compared with the state-of-the-art methods, showcasing its superiority in enhancing graph representation learning.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"48 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139922112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The fairness-aware online learning framework has emerged as a potent tool in the context of continuous lifelong learning. In this scenario, the learner's objective is to progressively acquire new tasks as they arrive over time, while also guaranteeing statistical parity among protected sub-populations, such as those defined by race and gender, on the newly introduced tasks. A significant limitation of current approaches lies in their heavy reliance on the i.i.d. (independent and identically distributed) assumption on the data, leading to a static regret analysis of the framework. It is crucial to note, however, that achieving low static regret does not necessarily translate to strong performance in dynamic environments characterized by tasks sampled from diverse distributions. In this paper, to tackle the fairness-aware online learning challenge in evolving settings, we introduce a new regret measure, FairSAR, by incorporating long-term fairness constraints into a strongly adapted loss regret framework. Moreover, to determine an optimal model parameter at each time step, we introduce an adaptive fairness-aware online meta-learning algorithm, FairSAOML, which can adjust to dynamic environments by effectively balancing bias control and model accuracy. The problem is framed as a bi-level convex-concave optimization over the model's primal and dual parameters, which pertain to its accuracy and fairness, respectively. Theoretical analysis yields sub-linear upper bounds on both the loss regret and the cumulative violation of the fairness constraints. Our experimental evaluation on various real-world datasets in dynamic environments demonstrates that FairSAOML consistently outperforms alternative approaches rooted in the most advanced prior online learning methods.
{"title":"Dynamic Environment Responsive Online Meta-Learning with Fairness Awareness","authors":"Chen Zhao, Feng Mi, Xintao Wu, Kai Jiang, Latifur Khan, Feng Chen","doi":"10.1145/3648684","DOIUrl":"https://doi.org/10.1145/3648684","url":null,"abstract":"<p>The fairness-aware online learning framework has emerged as a potent tool within the context of continuous lifelong learning. In this scenario, the learner’s objective is to progressively acquire new tasks as they arrive over time, while also guaranteeing statistical parity among various protected sub-populations, such as race and gender, when it comes to the newly introduced tasks. A significant limitation of current approaches lies in their heavy reliance on the <i>i.i.d</i> (independent and identically distributed) assumption concerning data, leading to a static regret analysis of the framework. Nevertheless, it’s crucial to note that achieving low static regret does not necessarily translate to strong performance in dynamic environments characterized by tasks sampled from diverse distributions. In this paper, to tackle the fairness-aware online learning challenge in evolving settings, we introduce a unique regret measure, FairSAR, by incorporating long-term fairness constraints into a strongly adapted loss regret framework. Moreover, to determine an optimal model parameter at each time step, we introduce an innovative adaptive fairness-aware online meta-learning algorithm, referred to as FairSAOML. This algorithm possesses the ability to adjust to dynamic environments by effectively managing bias control and model accuracy. The problem is framed as a bi-level convex-concave optimization, considering both the model’s primal and dual parameters, which pertain to its accuracy and fairness attributes, respectively. Theoretical analysis yields sub-linear upper bounds for both loss regret and the cumulative violation of fairness constraints. Our experimental evaluation on various real-world datasets in dynamic environments demonstrates that our proposed FairSAOML algorithm consistently outperforms alternative approaches rooted in the most advanced prior online learning methods.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"11 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139922326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}