Recent studies point to a paradox in graph convolutional networks (GCNs): shallow architectures limit the ability to learn from high-order neighbors, while deep architectures suffer from over-smoothing or over-squashing. To retain the simplicity of shallow architectures while overcoming their limited neighborhood reach, we introduce a biaffine technique that improves the expressiveness of shallow GCNs. The core design of our method is to learn direct dependencies between nodes and their long-distance neighbors, so that a single hop of message passing can capture rich information for node representation. In addition, we propose a multi-view contrastive learning method to exploit the representations learned from long-distance dependencies. Extensive experiments on nine graph benchmark datasets suggest that shallow biaffine graph convolutional networks (BAGCN) significantly outperform state-of-the-art GCNs (with deep or shallow architectures) on semi-supervised node classification. We further verify the effectiveness of the biaffine design in node representation learning and the consistency of performance across different training-set sizes.
{"title":"Building Shortcuts between Distant Nodes with Biaffine Mapping for Graph Convolutional Networks","authors":"Acong Zhang, Jincheng Huang, Ping Li, Kai Zhang","doi":"10.1145/3650113","DOIUrl":"https://doi.org/10.1145/3650113","journal":{"name":"ACM Transactions on Knowledge Discovery from Data"},"publicationDate":"2024-03-01","publicationTypes":"Journal Article"}
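To make the biaffine idea concrete, the sketch below scores a pair of node feature vectors with a generic biaffine form, s(x_i, x_j) = x_i^T U x_j + w^T [x_i; x_j] + b, and turns the scores into row-normalized weights so that a single aggregation step can reach any node, adjacent or not. This is an illustration under assumptions, not the BAGCN implementation: the parameters `U`, `w`, `b`, the softmax normalization, and the toy values are all hypothetical.

```python
import math

def biaffine_score(xi, xj, U, w, b):
    """Generic biaffine score: xi^T U xj + w^T [xi; xj] + b."""
    bilinear = sum(xi[p] * U[p][q] * xj[q]
                   for p in range(len(xi)) for q in range(len(xj)))
    linear = sum(wk * v for wk, v in zip(w, xi + xj))  # xi + xj concatenates the lists
    return bilinear + linear + b

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def shortcut_weights(X, U, w, b):
    """Row-normalized dependency weights between every pair of nodes."""
    return [softmax([biaffine_score(xi, xj, U, w, b) for xj in X]) for xi in X]

def aggregate(X, A):
    """One aggregation step with dense weights A, i.e. H = A X."""
    dim = len(X[0])
    return [[sum(A[i][j] * X[j][d] for j in range(len(X))) for d in range(dim)]
            for i in range(len(X))]

# Toy data (hypothetical): three nodes with 2-dimensional features.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]   # identity, so the bilinear term is a dot product
w = [0.0, 0.0, 0.0, 0.0]       # linear term switched off for readability
b = 0.0

A = shortcut_weights(X, U, w, b)
H = aggregate(X, A)
```

In a trained model `U`, `w`, and `b` would be learned; fixing them here only keeps the arithmetic easy to follow.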
Marked temporal point process models (MTPPs) aim to model event sequences and event markers (associated features) in continuous time. These models have been applied in domains where capturing event dynamics in continuous time is beneficial, such as education systems, social networks, and recommender systems. However, current MTPPs suffer from two major limitations: they represent the influence of event dynamics on the marker distribution inefficiently, and they lose the fine-grained representation of historical marker distributions. Motivated by these limitations, we propose a novel model called Marked Point Processes with Memory-Enhanced Neural Networks (MoMENt) that captures the bidirectional interrelations between markers and event dynamics while providing fine-grained marker representations. Specifically, MoMENt consists of two concurrent networks: a Recurrent Activity Updater (RAU) to model event dynamics and a Memory-Enhanced Marker Updater (MEMU) to represent markers. The RAU and MEMU components update each other at every step to model the bidirectional influence of markers and event dynamics. To obtain a fine-grained representation of marker distributions, MEMU is equipped with external memories that model detailed marker-level features with latent component vectors. Our extensive experiments on six real-world user interaction datasets demonstrate that MoMENt can accurately represent users’ activity dynamics, improving time, type, and marker prediction as well as recommendation performance by up to 76.5%, 65.6%, 77.2%, and 57.7%, respectively, compared to baseline approaches. Furthermore, our case studies show the effectiveness of MoMENt in providing meaningful and fine-grained interpretations of user-system relations over time, e.g., how user choices influence their future preferences in the recommendation domain.
{"title":"MoMENt: Marked Point Processes with Memory-Enhanced Neural Networks for User Activity Modeling","authors":"Sherry Sahebi, Mengfan Yao, Siqian Zhao, Reza Feyzi Behnagh","doi":"10.1145/3649504","DOIUrl":"https://doi.org/10.1145/3649504","journal":{"name":"ACM Transactions on Knowledge Discovery from Data"},"publicationDate":"2024-02-29","publicationTypes":"Journal Article"}
Node classification predicts the class label of a node by analyzing its properties and interactions in a network. We note that many existing solutions for graph-based node classification consider only node connectivity and ignore the nodes’ local topology structure. However, nodes residing in different parts of a real-world network may share similar local topology structures. For example, local topology structures in a payment network may reveal sellers’ business roles (e.g., supplier or retailer). To model both connectivity and local topology structure for better node classification performance, we present DP-GCN, a dual-path graph convolution network. DP-GCN consists of three main modules: (i) a C-GCN module to capture the connectivity relationships between nodes, (ii) a T-GCN module to capture the topology structure similarity among nodes, and (iii) a multi-head self-attention module to align both properties. We evaluate DP-GCN on seven benchmark datasets against diverse baselines to demonstrate its effectiveness. We also provide a case study of running DP-GCN on three large-scale payment networks from PayPal, a leading payment service provider, for risky seller detection. Experimental results show DP-GCN’s effectiveness and practicability in large-scale settings. PayPal’s internal testing also shows DP-GCN’s effectiveness in defending against real risks in transaction networks.
{"title":"DP-GCN: Node Classification by Connectivity and Local Topology Structure on Real-World Network","authors":"Zhe Chen, Aixin Sun","doi":"10.1145/3649460","DOIUrl":"https://doi.org/10.1145/3649460","journal":{"name":"ACM Transactions on Knowledge Discovery from Data"},"publicationDate":"2024-02-28","publicationTypes":"Journal Article"}
This paper presents a comprehensive and practical guide for practitioners and end-users working with Large Language Models (LLMs) in their downstream natural language processing (NLP) tasks. We provide discussions and insights into the usage of LLMs from the perspectives of models, data, and downstream tasks. First, we offer an introduction and brief summary of current language models. Then, we discuss the influence of pre-training data, training data, and test data. Most importantly, we provide a detailed discussion of the use and non-use cases of large language models for various natural language processing tasks, such as knowledge-intensive tasks, traditional natural language understanding tasks, generation tasks, emergent abilities, and considerations for specific tasks. We present various use cases and non-use cases to illustrate the practical applications and limitations of LLMs in real-world scenarios. We also try to understand the importance of data and the specific challenges associated with each NLP task. Furthermore, we explore the impact of spurious biases on LLMs and delve into other essential considerations, such as efficiency, cost, and latency, to ensure a comprehensive understanding of deploying LLMs in practice. This guide aims to provide researchers and practitioners with valuable insights and best practices for working with LLMs, thereby enabling the successful implementation of these models in a wide range of NLP tasks. A curated, regularly updated list of practical guide resources for LLMs can be found at https://github.com/Mooler0410/LLMsPracticalGuide. An evolutionary tree of LLMs, editable and regularly updated, can be found at llmtree.ai.
{"title":"Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond","authors":"Jingfeng Yang, Hongye Jin, Ruixiang Tang, Xiaotian Han, Qizhang Feng, Haoming Jiang, Shaochen Zhong, Bing Yin, Xia Hu","doi":"10.1145/3649506","DOIUrl":"https://doi.org/10.1145/3649506","journal":{"name":"ACM Transactions on Knowledge Discovery from Data"},"publicationDate":"2024-02-28","publicationTypes":"Journal Article"}
Graph neural networks (GNNs) have shown great potential in representation learning for various graph tasks. However, the distribution shift between the training and test sets poses a challenge to the efficiency of GNNs. To address this challenge, we propose HomoTTT, a fully test-time training (FTTT) framework for GNNs that enhances the model’s generalization capabilities for node classification tasks. Specifically, HomoTTT designs a homophily-based, parameter-free graph contrastive learning task with adaptive augmentation to guide the model’s adaptation during test-time training, allowing the model to adapt to the specific target data. In the inference stage, HomoTTT integrates the original GNN model and the model adapted by TTT using a homophily-based model selection method, which prevents potential performance degradation caused by unconstrained model adaptation. Extensive experimental results on six benchmark datasets demonstrate the effectiveness of our proposed framework. Additionally, an exploratory study further validates the rationality of the homophily-based graph contrastive learning task with adaptive augmentation and the homophily-based model selection designed in HomoTTT.
{"title":"A Fully Test-Time Training Framework for Semi-Supervised Node Classification on Out-of-Distribution Graphs","authors":"Jiaxin Zhang, Yiqi Wang, Xihong Yang, En Zhu","doi":"10.1145/3649507","DOIUrl":"https://doi.org/10.1145/3649507","journal":{"name":"ACM Transactions on Knowledge Discovery from Data"},"publicationDate":"2024-02-26","publicationTypes":"Journal Article"}
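The homophily-based model selection step can be pictured with a small sketch. It computes the edge homophily ratio (the fraction of edges whose two endpoints receive the same predicted label) for the predictions of the original model and of the adapted model, then keeps whichever scores higher. The abstract does not spell out HomoTTT's actual selection criterion, so this rule, the function names, and the toy graph are assumptions made purely for illustration.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share the same label."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

def select_predictions(edges, original_pred, adapted_pred):
    """Keep the prediction set with higher edge homophily (ties keep the original)."""
    h_orig = edge_homophily(edges, original_pred)
    h_adapt = edge_homophily(edges, adapted_pred)
    if h_adapt > h_orig:
        return "adapted", adapted_pred
    return "original", original_pred

# Toy path graph 0 - 1 - 2 - 3 with hypothetical predicted labels.
edges = [(0, 1), (1, 2), (2, 3)]
original_pred = [0, 1, 0, 1]   # no edge joins two nodes with the same label
adapted_pred = [0, 0, 1, 1]    # two of the three edges join same-label nodes

chosen_name, chosen_pred = select_predictions(edges, original_pred, adapted_pred)
```

Falling back to the original model on ties reflects the stated goal of avoiding degradation from unconstrained adaptation; a real system might instead compare against a threshold or use soft predictions.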
April Chen, Ryan A. Rossi, Namyong Park, Puja Trivedi, Yu Wang, Tong Yu, Sungchul Kim, Franck Dernoncourt, Nesreen K. Ahmed
Graph Neural Networks (GNNs) have become increasingly important due to their representational power and state-of-the-art predictive performance on many fundamental learning tasks. Despite this success, GNNs suffer from fairness issues that arise as a result of the underlying graph data and the fundamental aggregation mechanism that lies at the heart of the large class of GNN models. In this article, we examine and categorize fairness techniques for improving the fairness of GNNs. We categorize these techniques by whether they focus on improving fairness in the pre-processing, in-processing (during training), or post-processing phases. Furthermore, we discuss how such techniques can be used together whenever appropriate, and highlight the advantages and intuition as well. We also introduce an intuitive taxonomy for fairness evaluation metrics including graph-level fairness, neighborhood-level fairness, embedding-level fairness, and prediction-level fairness metrics. In addition, graph datasets that are useful for benchmarking the fairness of GNN models are summarized succinctly. Finally, we highlight key open problems and challenges that remain to be addressed.
{"title":"Fairness-Aware Graph Neural Networks: A Survey","authors":"April Chen, Ryan A. Rossi, Namyong Park, Puja Trivedi, Yu Wang, Tong Yu, Sungchul Kim, Franck Dernoncourt, Nesreen K. Ahmed","doi":"10.1145/3649142","DOIUrl":"https://doi.org/10.1145/3649142","journal":{"name":"ACM Transactions on Knowledge Discovery from Data"},"publicationDate":"2024-02-24","publicationTypes":"Journal Article"}
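As one concrete instance of a prediction-level fairness metric, the sketch below computes the statistical parity difference: the absolute gap in positive-prediction rates between two groups defined by a binary sensitive attribute. This metric is common in the fairness literature, but the abstract does not say which metrics the survey covers or how it defines them, so the function name and toy data here are illustrative assumptions.

```python
def statistical_parity_difference(preds, sensitive):
    """|P(y_hat = 1 | s = 0) - P(y_hat = 1 | s = 1)| for binary predictions and groups."""
    group0 = [p for p, s in zip(preds, sensitive) if s == 0]
    group1 = [p for p, s in zip(preds, sensitive) if s == 1]
    if not group0 or not group1:
        raise ValueError("both sensitive groups must be non-empty")
    rate0 = sum(group0) / len(group0)
    rate1 = sum(group1) / len(group1)
    return abs(rate0 - rate1)

# Hypothetical node-level predictions and sensitive attribute values.
preds = [1, 1, 0, 1, 0, 0, 0, 1]
sensitive = [0, 0, 0, 0, 1, 1, 1, 1]

spd = statistical_parity_difference(preds, sensitive)
```

A value of 0 means both groups receive positive predictions at the same rate; larger values indicate a larger disparity.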
Tiandi Ye, Cen Chen, Yinggui Wang, Xiang Li, Ming Gao
In federated learning (FL), malicious clients could manipulate the predictions of the trained model through backdoor attacks, posing a significant threat to the security of FL systems. Existing research primarily focuses on backdoor attacks and defenses within the generic federated learning scenario, where all clients collaborate to train a single global model. A recent study conducted by Qin et al. [24] marks the initial exploration of backdoor attacks within the personalized federated learning (pFL) scenario, where each client constructs a personalized model based on its local data. Notably, the study demonstrates that pFL methods with parameter decoupling can significantly enhance robustness against backdoor attacks. However, in this paper, we reveal that pFL methods with parameter decoupling are still vulnerable to backdoor attacks. The resistance of pFL methods with parameter decoupling is attributed to the heterogeneous classifiers between malicious clients and their benign counterparts. We analyze two direct causes of the heterogeneous classifiers: (1) data heterogeneity inherently exists among clients and (2) poisoning by malicious clients further exacerbates the data heterogeneity. To address these issues, we propose a two-pronged attack method, BapFL, which comprises two simple yet effective strategies: (1) poisoning only the feature encoder while keeping the classifier fixed and (2) diversifying the classifier through noise introduction to simulate that of the benign clients. Extensive experiments on three benchmark datasets under varying conditions demonstrate the effectiveness of our proposed attack. Additionally, we evaluate the effectiveness of six widely used defense methods and find that BapFL still poses a significant threat even in the presence of the best defense, Multi-Krum. We hope to inspire further research on attack and defense strategies in pFL scenarios. The code is available at: https://github.com/BapFL/code.
{"title":"BapFL: You can Backdoor Personalized Federated Learning","authors":"Tiandi Ye, Cen Chen, Yinggui Wang, Xiang Li, Ming Gao","doi":"10.1145/3649316","DOIUrl":"https://doi.org/10.1145/3649316","journal":{"name":"ACM Transactions on Knowledge Discovery from Data"},"publicationDate":"2024-02-23","publicationTypes":"Journal Article"}
Taoran Ji, Nathan Self, Kaiqun Fu, Zhiqian Chen, Naren Ramakrishnan, Chang-Tien Lu
Forecasting citations of scientific patents and publications is a crucial task for understanding the evolution and development of technological domains and for foresight into emerging technologies. By construing citations as a time series, the task can be cast into the domain of temporal point processes. Most existing work on forecasting with temporal point processes, both conventional and neural network-based, performs only single-step forecasting. In citation forecasting, however, the more salient goal is n-step forecasting: predicting the arrival of the next n citations. In this paper, we propose Dynamic Multi-Context Attention Networks (DMA-Nets), a deep learning sequence-to-sequence (Seq2Seq) model with a novel hierarchical dynamic attention mechanism for long-term citation forecasting. Extensive experiments on two real-world datasets demonstrate that the proposed model learns better representations of conditional dependencies over historical sequences than state-of-the-art counterparts and thus achieves significant performance gains in citation prediction.
{"title":"Citation Forecasting with Multi-Context Attention-Aided Dependency Modeling","authors":"Taoran Ji, Nathan Self, Kaiqun Fu, Zhiqian Chen, Naren Ramakrishnan, Chang-Tien Lu","doi":"10.1145/3649140","DOIUrl":"https://doi.org/10.1145/3649140","journal":{"name":"ACM Transactions on Knowledge Discovery from Data"},"publicationDate":"2024-02-23","publicationTypes":"Journal Article"}
Graph self-supervised representation learning has gained considerable attention and demonstrated remarkable efficacy in extracting meaningful representations from graphs, particularly in the absence of labeled data. Two representative methods in this domain are graph auto-encoding and graph contrastive learning. However, the former methods primarily focus on global structures, potentially overlooking some fine-grained information during reconstruction. The latter methods emphasize node similarity across correlated views in the embedding space, potentially neglecting the inherent global graph information in the original input space. Moreover, handling incomplete graphs in real-world scenarios, where original features are unavailable for certain nodes, poses challenges for both types of methods. To alleviate these limitations, we integrate masked graph auto-encoding and prototype-aware graph contrastive learning into a unified model to learn node representations in graphs. In our method, we begin by masking a portion of node features and utilize a specific decoding strategy to reconstruct the masked information. This process facilitates the recovery of graphs from a global or macro level and enables handling incomplete graphs easily. Moreover, we treat the masked graph and the original one as a pair of contrasting views, enforcing the alignment and uniformity between their corresponding node representations at a local or micro level. Lastly, to capture cluster structures from a meso level and learn more discriminative representations, we introduce a prototype-aware clustering consistency loss that is jointly optimized with the above two complementary objectives. 
Extensive experiments conducted on several datasets demonstrate that the proposed method achieves significantly better or competitive performance on downstream tasks, especially for graph clustering, compared with the state-of-the-art methods, showcasing its superiority in enhancing graph representation learning.
{"title":"ProtoMGAE: Prototype-aware Masked Graph Auto-Encoder for Graph Representation Learning","authors":"Yimei Zheng, Caiyan Jia","doi":"10.1145/3649143","DOIUrl":"https://doi.org/10.1145/3649143","url":null,"abstract":"<p>Graph self-supervised representation learning has gained considerable attention and demonstrated remarkable efficacy in extracting meaningful representations from graphs, particularly in the absence of labeled data. Two representative methods in this domain are graph auto-encoding and graph contrastive learning. However, the former methods primarily focus on global structures, potentially overlooking some fine-grained information during reconstruction. The latter methods emphasize node similarity across correlated views in the embedding space, potentially neglecting the inherent global graph information in the original input space. Moreover, handling incomplete graphs in real-world scenarios, where original features are unavailable for certain nodes, poses challenges for both types of methods. To alleviate these limitations, we integrate masked graph auto-encoding and prototype-aware graph contrastive learning into a unified model to learn node representations in graphs. In our method, we begin by masking a portion of node features and utilize a specific decoding strategy to reconstruct the masked information. This process facilitates the recovery of graphs from a global or macro level and enables handling incomplete graphs easily. Moreover, we treat the masked graph and the original one as a pair of contrasting views, enforcing the alignment and uniformity between their corresponding node representations at a local or micro level. Lastly, to capture cluster structures from a meso level and learn more discriminative representations, we introduce a prototype-aware clustering consistency loss that is jointly optimized with the above two complementary objectives. 
Extensive experiments conducted on several datasets demonstrate that the proposed method achieves significantly better or competitive performance on downstream tasks, especially for graph clustering, compared with the state-of-the-art methods, showcasing its superiority in enhancing graph representation learning.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"48 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139922112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Origin-destination (OD) flow captures population mobility between every pair of regions in a city, which is of great value in urban planning and transportation management. Nevertheless, collecting OD flow data is extremely difficult because of privacy issues and collection costs. Significant efforts have been made to generate OD flow from urban regional features such as demographics and land use, since the spatial heterogeneity of urban function is the primary cause that drives people to move from one place to another. On the other hand, people travel through various routes between origins and destinations, which affects urban traffic, e.g., road travel speed and time. These effects of OD flows reveal the fine-grained spatiotemporal patterns of population mobility. Few works have explored the effectiveness of incorporating urban traffic information into OD generation. To bridge this gap, we propose to generate real-world daily temporal OD flows enhanced by urban traffic information in this paper. Our model consists of two modules: Urban2OD and OD2Traffic. In the Urban2OD module, we devise a spatiotemporal graph neural network to model the complex dependencies between daily temporal OD flows and regional features. In the OD2Traffic module, we introduce an attention-based neural network to predict urban traffic based on the OD flow from the Urban2OD module. Then, by utilizing gradient backpropagation, these two modules are able to enhance each other to generate high-quality OD flow data. Extensive experiments conducted on real-world datasets demonstrate the superiority of our proposed model over state-of-the-art methods.
{"title":"Learning to Generate Temporal Origin-destination Flow Based on Urban Regional Features and Traffic Information","authors":"Can Rong, Zhicheng Liu, Jingtao Ding, Yong Li","doi":"10.1145/3649141","DOIUrl":"https://doi.org/10.1145/3649141","url":null,"abstract":"<p>Origin-destination (OD) flow contains population mobility information between every two regions in the city, which is of great value in urban planning and transportation management. Nevertheless, the collection of OD flow data is extremely difficult due to the hindrance of privacy issues and collection costs. Significant efforts have been made to generate OD flow based on urban regional features, e.g. demographics, land use, etc. since spatial heterogeneity of urban function is the primary cause that drives people to move from one place to another. On the other hand, people travel through various routes between OD, which will have effects on urban traffics, e.g. road travel speed and time. These effects of OD flows reveal the fine-grained spatiotemporal patterns of population mobility. Few works have explored the effectiveness of incorporating urban traffic information into OD generation. To bridge this gap, we propose to generate real-world daily temporal OD flows enhanced by urban traffic information in this paper. Our model consists of two modules: <i>Urban2OD</i> and <i>OD2Traffic</i>. In the <i>Urban2OD</i> module, we devise a spatiotemporal graph neural network to model the complex dependencies between daily temporal OD flows and regional features. In the <i>OD2Traffic</i> module, we introduce an attention-based neural network to predict urban traffics based on OD flow from <i>Urban2OD</i> module. Then, by utilizing gradient backpropagation, these two modules are able to enhance each other to generate high-quality OD flow data. 
Extensive experiments conducted on real-world datasets demonstrate the superiority of our proposed model over the state-of-the-art.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"12 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139922000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}