MTNet: A Multi-Task Learning Framework That Integrates Intra-Task and Task-Specific Dependencies for Traffic Forecasting
Authors: Shaokun Zhang, Rui Wang, Hongjun Tang, Kaizhong Zuo, Peng Jiang, Peng Hu, Wenjie Li, Biao Jie, Peize Zhao
Pub Date: 2025-11-27. DOI: 10.1109/TKDE.2025.3638147. IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1206-1220.
Abstract: Traffic prediction is essential for modern transportation systems, enhancing traffic management and urban planning. Accurate predictions of traffic flow and speed are crucial for understanding road usage, mitigating congestion, and providing real-time traffic monitoring and dynamic route guidance, thus improving road safety and infrastructure efficiency. Traditional research has often predicted traffic flow or speed independently, which raises resource consumption because separate models are needed. Few studies have explored the simultaneous prediction of both metrics, and recent attempts fail to account for spatial correlations, resulting in suboptimal performance. To address these challenges, we propose MTNet, a multi-task learning framework for joint traffic flow and speed prediction. MTNet employs a Transformer-like encoder-decoder architecture to process and enhance feature representations, capturing complex spatio-temporal correlations. Specifically, MTNet extracts intra-task dependencies using a cross-task interaction module and models task-specific spatio-temporal dependencies using spatial- and temporal-aware modules with cascaded residual structures. Additionally, spatio-temporal positional encoding is integrated to increase awareness of long-term and long-distance dependencies. Extensive experiments on three diverse traffic datasets (Manchester, PeMSD4, and PeMSD8) demonstrate that MTNet significantly outperforms state-of-the-art methods in both traffic flow and speed prediction, achieving substantial improvements in prediction accuracy and efficiency while striking a favorable balance between performance and computational resource usage.
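The abstract does not specify the form of the spatio-temporal positional encoding. A minimal sketch, assuming the standard sinusoidal scheme applied separately to the node index (spatial half) and the time step (temporal half) and concatenated; the function names and the half-and-half split are illustrative, not MTNet's actual design:

```python
import math

def sincos_encoding(index, d_model):
    """Standard sinusoidal encoding (Vaswani et al.) for a single index."""
    enc = []
    for i in range(0, d_model, 2):
        angle = index / (10000 ** (i / d_model))
        enc.append(math.sin(angle))
        enc.append(math.cos(angle))
    return enc[:d_model]

def st_positional_encoding(num_nodes, num_steps, d_model):
    """One vector per (node, time step): the first half encodes the node
    index (spatial), the second half the time step (temporal), so that
    e.g. (node=1, t=2) and (node=2, t=1) receive distinct encodings."""
    half = d_model // 2
    table = {}
    for n in range(num_nodes):
        spatial = sincos_encoding(n, half)
        for t in range(num_steps):
            table[(n, t)] = spatial + sincos_encoding(t, half)
    return table

pe = st_positional_encoding(num_nodes=3, num_steps=4, d_model=8)
```

Concatenation rather than summation is used here so that swapping node and time indices cannot collide.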
WizardEvent: Empowering Event Reasoning by Hybrid Event-Aware Data Synthesizing
Authors: Zhengwei Tao, Xiancai Chen, Zhi Jin, Xiaoying Bai, Haiyan Zhao, Wenpeng Hu, Chongyang Tao, Shuai Ma
Pub Date: 2025-11-27. DOI: 10.1109/TKDE.2025.3634839. IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1412-1426.
Abstract: Event reasoning is the task of reasoning over events and the relations between them. These capabilities are crucial and fundamental to a wide range of applications. Large language models (LLMs) have made advances in event reasoning owing to their extensive training, yet the LLMs commonly used today still do not consistently handle event reasoning as proficiently as humans. This discrepancy arises from the lack of explicit modeling of events and their relations, insufficient knowledge of event relations, and the imbalanced way in which the LLMs' different reasoning paradigms are trained. In this paper, we propose WizardEvent, which synthesizes data from an unlabeled corpus using the proposed hybrid event-aware instruction tuning. Specifically, we first represent events and their relations in a novel structure and extract this knowledge from raw text. Second, we introduce hybrid event reasoning paradigms with four reasoning formats. Lastly, we wrap the constructed event-relational knowledge with these paradigms to create the instruction-tuning dataset. Fine-tuning a model on this enriched dataset significantly improves its event reasoning. The performance of WizardEvent is rigorously evaluated through extensive experiments; the results demonstrate that WizardEvent substantially outperforms baselines, indicating the effectiveness of our approach.
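The "wrapping" step above can be pictured as turning each extracted event-relation triple into an instruction-tuning record. A toy sketch with two made-up templates; the paper itself uses four hybrid reasoning formats whose exact wording is not given here:

```python
def to_instruction_example(head, relation, tail, fmt="relation_qa"):
    """Wrap one event-relation triple as an instruction-tuning record.
    Both templates below are illustrative assumptions, not the paper's."""
    if fmt == "relation_qa":
        # ask the model to name the relation between two known events
        prompt = f"What is the relation between the events '{head}' and '{tail}'?"
        answer = relation
    else:
        # "next_event": ask the model to predict the tail event
        prompt = f"Event: {head}. Relation: {relation}. Which event follows?"
        answer = tail
    return {"instruction": prompt, "output": answer}

ex = to_instruction_example("heavy rain", "causes", "flooding")
```

Mixing several such formats over the same triples is what gives the tuning set its "hybrid" character.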
PEGS: A Graph Synthesis Approach Based on Local Differential Privacy Preference
Authors: Lihe Hou, Weiwei Ni, Nan Fu, Dongyue Zhang, Ruyu Zhang
Pub Date: 2025-11-26. DOI: 10.1109/TKDE.2025.3637324. IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1236-1248.
Abstract: Large-scale social networks can be modeled as decentralized graphs, where each node holds a part of the overall network. Local differential privacy (LDP) has been widely adopted in decentralized graph analysis to ensure privacy for individual nodes. However, existing LDP-based methods often fail to accommodate personalized privacy requirements due to their uniform encoding and equal perturbation mechanisms. To address this issue, we propose PEGS, a novel privacy-preserving decentralized graph synthesis approach that significantly improves utility while respecting user-specific privacy preferences. Specifically, we introduce interactive local differential privacy (iLDP), a new edge-level definition of LDP that relaxes the constraints of node-independent perturbation, thereby enabling the fulfillment of individual privacy needs. Furthermore, we develop a decentralized graph perturbation framework offering three levels of privacy settings. To optimize the balance between information preservation and privacy, we design encoding and perturbation mechanisms leveraging information entropy tailored to different privacy levels. Extensive experimental evaluations and rigorous theoretical analysis demonstrate that our method produces high-quality synthetic graphs while adhering to iLDP guarantees.
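For intuition about edge-level LDP with per-user privacy preferences, here is textbook randomized response on an adjacency row, where each user perturbs with their own epsilon. This is the classical mechanism, not PEGS's entropy-based encoding or its iLDP relaxation:

```python
import math
import random

def perturb_adj_bits(bits, epsilon, seed=0):
    """Edge-level randomized response: keep each adjacency bit with
    probability e^eps / (1 + e^eps) and flip it otherwise, which gives
    an eps-edge-LDP guarantee for this user's row. A per-user epsilon
    realizes a (coarse) personalized privacy preference."""
    rng = random.Random(seed)
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return [b if rng.random() < p_keep else 1 - b for b in bits]

row = [1, 0, 0, 1, 0, 0, 0, 1]
noisy_strict = perturb_adj_bits(row, epsilon=0.5)  # strong privacy: very noisy
noisy_loose = perturb_adj_bits(row, epsilon=8.0)   # weak privacy: near-exact
```

Small epsilon drives the keep probability toward 1/2 (bits become nearly random), which is exactly the utility loss that motivates smarter, level-aware encodings.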
Exploring Context-Free Opinion Grammar for Aspect-Based Sentiment Analysis
Authors: Xiaoyi Bao, Jinghang Gu, Zhongqing Wang, Xiaotong Jiang, Chu-Ren Huang
Pub Date: 2025-11-14. DOI: 10.1109/TKDE.2025.3632628. IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1070-1083.
Abstract: Utilizing pre-trained generative models for sentiment element extraction has recently yielded significant gains on aspect-based sentiment analysis benchmarks. Nonetheless, these models have two significant drawbacks: 1) high computational cost in both inference time and hardware requirements; and 2) a lack of explicit modeling, as they encode the connections between sentiment elements in fragile natural-language or notational target sequences. To overcome these challenges, we present a novel opinion tree parsing model designed to swiftly parse sentiment elements from an opinion tree. This approach not only accelerates the process but also explicitly reveals a more comprehensive and fully articulated aspect-level sentiment structure. Our method begins by introducing a context-free opinion grammar to standardize the opinion tree structure. Subsequently, we leverage a neural chart-based opinion tree parser to thoroughly explore the interconnections among sentiment elements and parse them into a structured opinion tree. Extensive experiments underscore the effectiveness of the proposed model and the capability of the opinion tree parser, particularly when coupled with the introduced context-free opinion grammar. Crucially, the results confirm the superior speed of our model compared to the SOTA baselines.
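To make "context-free opinion grammar" concrete, here is a toy grammar in the usual CFG shape (nonterminal to list of right-hand sides) plus a derivability check. The symbol names are illustrative assumptions, not the paper's actual production set:

```python
# Toy context-free "opinion grammar": uppercase symbols are
# nonterminals, lowercase symbols are terminals.
GRAMMAR = {
    "QUAD": [["ASPECT", "OPINION"]],
    "ASPECT": [["category", "term"]],
    "OPINION": [["polarity", "expression"]],
}

def derivable(symbol, grammar):
    """A symbol is derivable if it is a terminal, or if some production
    rewrites it entirely into derivable symbols."""
    if symbol.islower():
        return True
    return any(all(derivable(s, grammar) for s in rhs)
               for rhs in grammar.get(symbol, []))
```

A chart parser over such a grammar scores spans against productions instead of free-form decoding, which is what makes the output structure non-fragile by construction.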
Generalizing Graph Transformers Across Diverse Graphs and Tasks via Pre-Training
Authors: Yufei He, Zhenyu Hou, Yukuo Cen, Jun Hu, Feng He, Xu Cheng, Jie Tang, Bryan Hooi
Pub Date: 2025-11-13. DOI: 10.1109/TKDE.2025.3632394. IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1114-1128.
Abstract: Graph pre-training has concentrated on graph-level tasks involving small graphs (e.g., molecular graphs) or on learning node representations for a fixed graph. Extending graph pre-trained models to web-scale graphs with billions of nodes in industrial scenarios, while avoiding negative transfer across graphs or tasks, remains a challenge. We aim to develop a general graph pre-trained model with inductive ability that can make predictions for unseen new nodes and even new graphs. In this work, we introduce a scalable transformer-based graph pre-training framework called PGT (Pre-trained Graph Transformer). Based on the masked autoencoder architecture, we design two pre-training tasks: one for reconstructing node features and the other for reconstructing local structures. Unlike the original autoencoder architecture, where the pre-trained decoder is discarded, we propose a novel strategy that utilizes the decoder for feature augmentation. Our framework, tested on the publicly available ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges, achieves state-of-the-art performance, showcasing scalability and efficiency. We have deployed the framework on Tencent's online game data, confirming its capability to pre-train on real-world graphs with over 540 million nodes and 12 billion edges and to generalize effectively across diverse static and dynamic downstream tasks.
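The feature-reconstruction pre-training task starts by corrupting the input. A minimal sketch of the masking step, assuming zeros stand in for a learnable [MASK] embedding and a 50% mask rate (both assumptions; the abstract fixes neither):

```python
import random

def mask_node_features(features, mask_rate=0.5, seed=0):
    """Corrupt input for masked-feature pre-training: a random subset of
    nodes has its feature vector replaced by a mask token (zeros here);
    the model is then trained to reconstruct the original vectors at
    exactly those masked positions."""
    rng = random.Random(seed)
    n = len(features)
    masked = set(rng.sample(range(n), int(n * mask_rate)))
    corrupted = [[0.0] * len(f) if i in masked else list(f)
                 for i, f in enumerate(features)]
    return corrupted, sorted(masked)

feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
corrupted, masked_idx = mask_node_features(feats)
```

The reconstruction loss is computed only over `masked_idx`, which is what makes the task self-supervised rather than an autoencoding identity map.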
FreqEvo: Enhancing Time Series Forecasting With Multi-Level Frequency Domain Feature Extraction
Authors: Guohong Wang, Xianhan Tan, Zengming Lin, Binli Luo, Shangjian Zhong, Kele Xu
Pub Date: 2025-11-13. DOI: 10.1109/TKDE.2025.3632365. IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1099-1113.
Abstract: Time series forecasting faces significant challenges due to non-stationary components that obscure underlying patterns. While Transformer-based models are effective at capturing stationary components, they struggle with non-stationary dynamics and multivariate dependencies. In this paper, we propose FreqEvo, a lightweight Frequency Domain Feature Enhancement module for time series forecasting. FreqEvo progressively filters frequency components from high to low amplitude, ensuring the preservation of informative features while reducing noise. By integrating recursive Fourier-based residual modeling and cross-domain attention, FreqEvo effectively refines low-amplitude frequency features and stabilizes the embeddings, outperforming traditional low-pass filtering and random frequency selection methods in capturing both short-term and long-term dependencies. Experimental results on benchmark datasets demonstrate that FreqEvo outperforms state-of-the-art (SOTA) models and serves as a plug-and-play module to enhance existing Long-Term Sequence Forecasting (LSTF) models.
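Amplitude-ordered filtering differs from the low-pass filtering the abstract contrasts it with: components are ranked by magnitude, not by position on the frequency axis. A minimal single-level sketch (FreqEvo's actual module is progressive and learned; this only illustrates the ordering idea):

```python
import numpy as np

def topk_amplitude_filter(x, k):
    """Keep the k spectral components with the largest magnitude,
    wherever they sit on the frequency axis, and zero out the rest;
    a low-pass filter would instead keep the k lowest frequencies
    regardless of amplitude."""
    spec = np.fft.rfft(x)
    keep = np.argsort(np.abs(spec))[-k:]  # indices of the top-k amplitudes
    filtered = np.zeros_like(spec)
    filtered[keep] = spec[keep]
    return np.fft.irfft(filtered, n=len(x))

t = np.arange(64)
clean = np.sin(2 * np.pi * t / 8)  # one dominant tone
x = clean + 0.1 * np.random.default_rng(0).standard_normal(64)
x_hat = topk_amplitude_filter(x, k=2)
```

Because the dominant tone carries almost all the amplitude, the filtered signal is much closer to the clean one than the noisy input is.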
Sentiment Variation-Aware Sentiment Spike Explanation During COVID-19 Epidemic
Authors: Yawen Li, Xiaobao Wang, Bin Wen, Di Jin, Junping Du
Pub Date: 2025-11-13. DOI: 10.1109/TKDE.2025.3631909. IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1306-1318.
Abstract: The COVID-19 pandemic not only triggered a global health crisis but also amplified public panic through the rapid spread of misinformation. Understanding public sentiment and identifying the causes of sudden sentiment spikes is therefore critical for ensuring accurate information dissemination and guiding effective policymaking. However, mining such causes from social media remains challenging. Tweets collected during sentiment spike periods are often short, noisy, and dominated by repetitive background topics, making it difficult for existing topic models to separate emerging issues from long-standing discussions. To address these challenges, we propose the Sentiment Variation-aware Emerging Topics Mining Model (SVETM), a probabilistic graphical framework that leverages user sentiment variation between adjacent time windows as a guiding signal to distinguish emerging topics from background content. We further reformulate inference as a maximum a posteriori (MAP) problem and develop an efficient variational inference algorithm for scalable learning. Extensive experiments on a large-scale COVID-19 Twitter dataset demonstrate that SVETM outperforms strong baselines in terms of topic coherence, interpretability, and its ability to uncover the underlying causes of sentiment spikes.
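The guiding signal SVETM uses, sentiment variation between adjacent time windows, reduces to a simple difference of window means. A sketch with made-up scores in [-1, 1] (the scoring scale is an assumption; the abstract does not fix one):

```python
def sentiment_variation(window_scores):
    """Mean sentiment per time window, then the variation (difference)
    between adjacent windows; a large jump or drop marks a spike whose
    causes the emerging-topic model is asked to explain."""
    means = [sum(w) / len(w) for w in window_scores]
    return [b - a for a, b in zip(means, means[1:])]

windows = [
    [0.2, 0.1, 0.0],     # mildly positive
    [0.1, 0.2, 0.0],     # stable
    [-0.8, -0.7, -0.9],  # sudden negative spike
]
deltas = sentiment_variation(windows)
spike_at = max(range(len(deltas)), key=lambda i: abs(deltas[i]))
```

In SVETM this scalar signal conditions the topic model, so words tied to the spike window are pulled into emerging topics rather than absorbed by background ones.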
Restricted Black-Box Attack on Graphs Beyond Homophily
Authors: Runlin Lei, Haipeng Ding, Zhewei Wei
Pub Date: 2025-11-12. DOI: 10.1109/TKDE.2025.3632233. IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1292-1305.
Abstract: Graph Neural Networks (GNNs) have become widely popular across various applications, with their vulnerability to adversarial attacks being a key concern. Among the different types of graph attacks, Restricted Black-box Attacks (RBAs) impose the strictest constraints, as attackers have access only to node features and graph structure. Existing RBAs rely on homophily assumptions or shift-based losses as objectives for conducting structural perturbations, but we demonstrate that all of these approaches fail on heterophilic graphs. To address this challenge, we introduce node-wise distance metrics as the objective, to fundamentally quantify the quality of the graph structure after perturbation. Our theoretical results show that the proposed objective allows RBAs to effectively handle graphs beyond homophily. Leveraging this objective, we propose HetAttack, a scalable method that significantly reduces the distinguishability of nodes on the victim graph. Experiments on both synthetic and real-world graphs confirm the efficacy of HetAttack across varying levels of homophily, achieving performance comparable to split-unknown white-box attacks without prior knowledge of labels or the target model.
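Why homophily-based attack objectives break on heterophilic graphs is easiest to see through the edge homophily ratio. A minimal sketch (the labeled toy graph is illustrative; note an attacker under the RBA threat model would not actually see labels):

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints share a label. Near 0 the graph
    is heterophilic, and an attack objective that assumes connected
    nodes should agree is optimizing the wrong quantity."""
    same = sum(labels[u] == labels[v] for u, v in edges)
    return same / len(edges)

labels = {0: "a", 1: "b", 2: "a", 3: "b"}
cross_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # every edge crosses labels
same_edges = [(0, 2), (1, 3)]                   # every edge stays within a label
```

HetAttack's node-wise distance objective sidesteps this ratio entirely, which is why it works at both extremes.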
Pub Date : 2025-11-11DOI: 10.1109/TKDE.2025.3631376
He Zhang;Shuang Wang;Long Chen;Xiaoping Li;Qing Gao;Quan Z. Sheng
In the era of Big Data and generative artificial intelligence (AI), discovering the truth about various objects from different sources has become a pressing topic. Existing studies primarily focus on dependent sources with conflicting information, where sources may copy information from each other. However, real-world scenarios are often more complex, with dynamic dependence relationships among sources over time. This complexity makes it much more difficult to discover the truth. One of the key challenges centers on measuring the dynamic dependence among sources. To address this challenge, we have developed three models: $Depen_{S}imple$, $Depen_{C}omplex$, and $Depen_{D}ynamic$. These models are based on the Hidden Markov Model (HMM) and are designed to handle different types of dependencies, namely simple source dependence, complex source dependence, and dynamic source dependence. Based on the constructed models, we propose a generic framework for discovering the latent truth which are evaluated by three HMM-based methods. We conduct extensive experiments on three real-world datasets to evaluate the performance of the proposed methods, and the results demonstrate that all three methods achieve high accuracy over the state-of-the-art methods.
{"title":"Reliable Truth Discovery for Dynamic and Dependent Sources","authors":"He Zhang;Shuang Wang;Long Chen;Xiaoping Li;Qing Gao;Quan Z. Sheng","doi":"10.1109/TKDE.2025.3631376","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3631376","url":null,"abstract":"In the era of Big Data and generative artificial intelligence (AI), discovering the truth about various objects from different sources has become a pressing topic. Existing studies primarily focus on dependent sources with conflicting information, where sources may copy information from each other. However, real-world scenarios are often more complex, with dynamic dependence relationships among sources over time. This complexity makes it much more difficult to discover the truth. One of the key challenges centers on measuring the dynamic dependence among sources. To address this challenge, we have developed three models: <inline-formula><tex-math>$Depen_{S}imple$</tex-math></inline-formula>, <inline-formula><tex-math>$Depen_{C}omplex$</tex-math></inline-formula>, and <inline-formula><tex-math>$Depen_{D}ynamic$</tex-math></inline-formula>. These models are based on the Hidden Markov Model (HMM) and are designed to handle different types of dependencies, namely <i>simple source dependence</i>, <i>complex source dependence</i>, and <i>dynamic source dependence</i>. Based on the constructed models, we propose a generic framework for discovering the latent truth which are evaluated by three HMM-based methods. 
We conduct extensive experiments on three real-world datasets to evaluate the performance of the proposed methods, and the results demonstrate that all three methods achieve higher accuracy than state-of-the-art methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 1","pages":"546-558"},"PeriodicalIF":10.4,"publicationDate":"2025-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145705908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
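The abstract above models source dependence with a Hidden Markov Model. As a minimal illustrative sketch (not the paper's $Depen$ models: the two latent states, all probabilities, and the observation encoding below are invented assumptions), the forward algorithm can score a sequence of one source's reports under a latent copying/independent state that drifts over time:

```python
# Toy HMM, loosely in the spirit of dynamic source dependence: the hidden
# state is whether a source currently copies another source, and that state
# can change between time steps. Every number here is illustrative only.

def forward(obs, init, trans, emit):
    """Return P(observations) for a discrete HMM via the forward algorithm."""
    n = len(init)
    # alpha[s] = P(obs so far, current hidden state = s)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[sp] * trans[sp][s] for sp in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return sum(alpha)

# Hidden state: 0 = source reports independently, 1 = source copies another.
init = [0.8, 0.2]
trans = [[0.9, 0.1],   # dependence may switch over time (dynamic dependence)
         [0.3, 0.7]]
# Observation: 0 = claim agrees with the suspected parent source, 1 = conflicts.
emit = [[0.7, 0.3],
        [0.95, 0.05]]  # a copier almost always agrees with its source

likelihood = forward([0, 0, 1, 0], init, trans, emit)
print(round(likelihood, 4))
```

Under such a model, a high likelihood for an "always agrees" sequence relative to the independent-state baseline would be evidence of copying; the transition matrix is what lets the dependence itself vary across time steps.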
Multi-hop Knowledge Graph Reasoning (KGR) seeks to identify accurate answers within Knowledge Graphs (KGs) via multi-step reasoning, predominantly utilizing reinforcement learning (RL) to enhance the efficiency of the reasoning process. Unlike traditional Knowledge Graph Embedding (KGE) methods, RL-based approaches offer superior interpretability. However, these methods often underperform due to two critical limitations: (1) their over-reliance on Horn rules for reasoning paths, which restricts their expressive power; and (2) inadequate utilization of reasoning states during the process. To address these issues, we propose a novel RL-based framework, RAR, which shifts focus from individual paths to subgraph structures for more robust predictions. RAR frames the retrieval of reasoning subgraphs from the KG as a Markov Decision Process (MDP) and incorporates a subgraph retriever. To efficiently explore the extensive subgraph space, we integrate multi-agent RL to enhance the retriever’s capabilities. Additionally, RAR features an advanced analyst module that meticulously examines reasoning states. These modules function iteratively: the retriever expands the subgraph, followed by the analyst module’s in-depth analysis. The insights gained are then used to inform subsequent retrieval steps. Ultimately, the predicted scores from both modules are synthesized to produce more precise posterior scores. Experimental results across multiple datasets demonstrate RAR’s efficacy, showcasing a notable improvement over existing state-of-the-art RL-based KGR methods.
{"title":"Subgraph-Centric Multi-Agent Reinforcement Learning for Multi-Hop Knowledge Graph Reasoning","authors":"Tao He;Zerui Chen;Lizi Liao;Yixin Cao;Yuanxing Liu;Wei Tang;Xun Mao;Kai Lv;Ming Liu;Bing Qin","doi":"10.1109/TKDE.2025.3631495","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3631495","url":null,"abstract":"Multi-hop Knowledge Graph Reasoning (KGR) seeks to identify accurate answers within Knowledge Graphs (KGs) via multi-step reasoning, predominantly utilizing reinforcement learning (RL) to enhance the efficiency of the reasoning process. Unlike traditional Knowledge Graph Embedding (KGE) methods, RL-based approaches offer superior interpretability. However, these methods often underperform due to two critical limitations: (1) their over-reliance on Horn rules for reasoning paths, which restricts their expressive power; and (2) inadequate utilization of reasoning states during the process. To address these issues, we propose a novel RL-based framework, RAR, which shifts focus from individual paths to subgraph structures for more robust predictions. RAR frames the retrieval of reasoning subgraphs from the KG as a Markov Decision Process (MDP) and incorporates a subgraph retriever. To efficiently explore the extensive subgraph space, we integrate multi-agent RL to enhance the retriever’s capabilities. Additionally, RAR features an advanced analyst module that meticulously examines reasoning states. These modules function iteratively: the retriever expands the subgraph, followed by the analyst module’s in-depth analysis. The insights gained are then used to inform subsequent retrieval steps. Ultimately, the predicted scores from both modules are synthesized to produce more precise posterior scores. 
Experimental results across multiple datasets demonstrate RAR’s efficacy, showcasing a notable improvement over existing state-of-the-art RL-based KGR methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 2","pages":"1319-1333"},"PeriodicalIF":10.4,"publicationDate":"2025-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
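The RAR abstract frames subgraph retrieval as a Markov Decision Process. A toy greedy sketch of that idea (purely hypothetical: the tiny KG, the relation scores, and the single-agent greedy policy are assumptions, not RAR's multi-agent RL retriever or its analyst module):

```python
# Illustrative MDP-style subgraph retrieval: state = the subgraph built so
# far; action = adding one frontier edge; the policy here is a greedy stand-in
# for a learned retriever. The KG and scores below are made up for the demo.

KG = [  # (head, relation, tail)
    ("alice", "born_in", "paris"),
    ("paris", "capital_of", "france"),
    ("alice", "works_at", "acme"),
    ("acme", "based_in", "london"),
]

def expand(subgraph_nodes, score):
    """One step: pick the highest-scoring edge leaving the current subgraph."""
    frontier = [e for e in KG
                if e[0] in subgraph_nodes and e[2] not in subgraph_nodes]
    if not frontier:
        return None
    return max(frontier, key=score)

def retrieve(query_entity, steps, score):
    """Grow a reasoning subgraph from the query entity for `steps` actions."""
    nodes, edges = {query_entity}, []
    for _ in range(steps):
        edge = expand(nodes, score)
        if edge is None:
            break
        edges.append(edge)
        nodes.add(edge[2])
    return edges

# Hypothetical per-relation scores, as if answering "what country is alice from?"
relation_score = {"born_in": 1.0, "capital_of": 0.9, "works_at": 0.5, "based_in": 0.4}
subgraph = retrieve("alice", steps=2, score=lambda e: relation_score[e[1]])
print(subgraph)
```

The retrieved edge set, rather than any single path, is what a subgraph-centric method would hand to a scoring module; in RAR that role is played by the analyst, whose feedback steers the next expansion step.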