Pub Date: 2025-12-09 | DOI: 10.1109/TKDE.2025.3622154
Authors: Yuanyuan Yao;Yuhan Shi;Lu Chen;Ziquan Fang;Yunjun Gao;Leong Hou U;Yushuai Li;Tianyi Li
Multivariate time series (MTS) anomaly detection identifies abnormal patterns in data where each timestamp contains multiple variables. Existing MTS anomaly detection methods fall into three categories: reconstruction-based, prediction-based, and classifier-based methods. However, these methods face three key challenges: (1) unsupervised learning methods, such as reconstruction-based and prediction-based methods, rely on error thresholds, which can lead to inaccuracies; (2) semi-supervised methods mainly model normal data and often underuse anomaly labels, limiting the detection of subtle anomalies; (3) supervised learning methods, such as classifier-based approaches, often fail to capture local relationships, incur high computational costs, and are constrained by the scarcity of labeled data. To address these limitations, we propose Moon, a supervised modality conversion-based multivariate time series anomaly detection framework. Moon enhances the efficiency and accuracy of anomaly detection while providing detailed anomaly analysis reports. First, Moon introduces a novel multivariate Markov Transition Field (MV-MTF) technique to convert numeric time series data into image representations, capturing relationships across variables and timestamps. Since numeric data retains unique patterns that cannot be fully captured by image conversion alone, Moon employs a Multimodal-CNN to integrate numeric and image data through a feature fusion model with parameter sharing, enhancing training efficiency. Finally, a SHAP-based anomaly explainer identifies the key variables contributing to each anomaly, improving interpretability. Extensive experiments on six real-world MTS datasets demonstrate that Moon outperforms six state-of-the-art methods by up to 93% in efficiency, 4% in accuracy, and 10.8% in interpretation performance.
Title: Moon: A Modality Conversion-Based Efficient Multivariate Time Series Anomaly Detection
IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 1, pp. 457-474.
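As background for the MV-MTF conversion described above, here is a minimal sketch of the standard univariate Markov Transition Field that MV-MTF extends across variables. The bin count and quantile binning are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def mtf(series, n_bins=4):
    """Map a 1-D series to a Markov Transition Field image (univariate sketch)."""
    x = np.asarray(series, dtype=float)
    # 1. Quantile-bin each value into one of n_bins states.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    states = np.digitize(x, edges)            # state index per timestamp
    # 2. First-order Markov transition matrix W over the states.
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(states[:-1], states[1:]):
        W[a, b] += 1
    W = W / np.maximum(W.sum(axis=1, keepdims=True), 1)  # row-normalize
    # 3. MTF image: cell (i, j) holds the transition probability between
    #    the states occupied at timestamps i and j.
    return W[np.ix_(states, states)]

img = mtf(np.sin(np.linspace(0, 6, 50)), n_bins=4)
print(img.shape)  # (50, 50)
```

The resulting image encodes temporal transition structure in a form a CNN can consume, which is the premise behind Moon's Multimodal-CNN fusion.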
Pub Date: 2025-12-08 | DOI: 10.1109/TKDE.2025.3641213
Authors: Xinyu Zhang;Zuohan Wu;Chen Jason Zhang;Libin Zheng;Peng Cheng;Jian Yin;Cyrus Shahabi
Traffic Signal Control plays a vital role in modern traffic management. However, most existing methods focus exclusively on vehicle flow, neglecting the critical role of pedestrians and leading to suboptimal performance at intersections with mixed vehicle-pedestrian traffic. Pedestrian behavior presents unique challenges due to its irregularity and flexibility, such as non-lane-based movements and uncertain crossing directions, which cannot be modeled by existing methods. To address this limitation, we propose VPLight, a comprehensive framework designed to manage both Vehicle and Pedestrian dynamics in traffic signal control. Specifically, we first design the Pedestrian Feature Extractor to capture the spatiotemporal dynamics of pedestrian movement, offering a robust representation of their irregular patterns. Subsequently, to coordinate traffic signal control at multiple intersections, we develop a novel communication approach called V-Comm to enable effective integration among intersections. Extensive experiments show that VPLight outperforms state-of-the-art baselines by significant margins (up to +44.04%). Our results demonstrate that VPLight can remarkably address the challenges of mixed vehicle-pedestrian traffic control and enhance overall traffic flow efficiency across the road network.
Title: VPLight: A Reinforcement Learning Approach for Traffic Signal Control With Pedestrian Dynamics
IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 3, pp. 2079-2093.
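To make the reinforcement-learning framing concrete, the sketch below trains a toy tabular Q-learning agent that picks one of two signal phases from a coarse queue-level state. The state space, dynamics, and reward are invented for illustration; VPLight's actual design (Pedestrian Feature Extractor states, V-Comm coordination, deep networks) is far richer:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2           # coarse queue levels x two phases
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.1    # learning rate, discount, exploration

def step(s, a):
    # Hypothetical dynamics: serving the "right" queue for this state
    # yields reward 1; the next state is random.
    reward = 1.0 if a == s % 2 else 0.0
    return rng.integers(n_states), reward

s = 0
for _ in range(2000):
    # Epsilon-greedy action selection over the current Q-table.
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
    s2, r = step(s, a)
    # Standard Q-learning temporal-difference update.
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2
```

After training, the greedy policy argmax(Q[s]) recovers the rewarded phase in each toy state, which is the basic mechanism any RL-based signal controller builds on.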
Individual fairness (IF) in graph neural networks (GNNs), which emphasizes that similar individuals should receive similar outcomes from GNNs, has become a critical issue. Despite its importance, research in this area has left two questions largely unexplored: (1) a clear understanding of what induces individual unfairness in GNNs, and (2) a comprehensive consideration of how similar individuals are identified. To bridge these gaps, we conduct a preliminary analysis to explore the underlying causes of individual unfairness and observe correlations between IF and similarity consistency, a concept introduced to evaluate the discrepancy between identifying similar individuals based on graph structure versus node features. Inspired by our observations, we introduce two metrics to assess individual similarity from two distinct perspectives: topology fusion and feature fusion. Building upon these metrics, we propose Similarity-aware GNNs for Individual Fairness, named SaGIF. The key insight behind SaGIF is the integration of individual similarities by independently learning similarity representations, leading to an improvement of IF in GNNs. Our experiments on several real-world datasets validate the effectiveness of our proposed metrics and SaGIF. Specifically, SaGIF consistently outperforms state-of-the-art IF methods while maintaining utility performance.
Title: SaGIF: Improving Individual Fairness in Graph Neural Networks via Similarity Encoding
Authors: Yuchang Zhu;Jintang Li;Huizhe Zhang;Liang Chen;Zibin Zheng
Pub Date: 2025-12-05 | DOI: 10.1109/TKDE.2025.3640731
IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 3, pp. 1946-1957.
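One minimal reading of "similarity consistency" can be sketched as follows: compare each node's top-k most similar peers computed from graph structure (Jaccard similarity of neighborhoods) with those computed from node features (cosine similarity), and report the average top-k overlap. The choice of k, the two similarity functions, and the scoring are assumptions for illustration, not SaGIF's exact metric:

```python
import numpy as np

def topk(sim, k):
    np.fill_diagonal(sim, -np.inf)            # exclude self-similarity
    return np.argsort(-sim, axis=1)[:, :k]    # indices of k most similar

def similarity_consistency(A, X, k=2):
    # Structure view: Jaccard similarity of neighborhoods.
    inter = A @ A.T                           # common-neighbor counts
    deg = A.sum(axis=1)
    union = deg[:, None] + deg[None, :] - inter
    s_struct = inter / np.maximum(union, 1)
    # Feature view: cosine similarity of node features.
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-9)
    s_feat = Xn @ Xn.T
    t1, t2 = topk(s_struct.astype(float), k), topk(s_feat, k)
    overlap = [len(set(a) & set(b)) / k for a, b in zip(t1, t2)]
    return float(np.mean(overlap))            # 1.0 = views fully agree

A = np.array([[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]], dtype=float)
X = np.array([[1,0],[1,0.1],[0.9,0.2],[0,1]])
c = similarity_consistency(A, X, k=2)
print(c)
```

A low score signals exactly the structure-versus-feature discrepancy that the paper correlates with individual unfairness.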
Pub Date: 2025-12-03 | DOI: 10.1109/TKDE.2025.3639413
Authors: Tianyue Ren;Zhibang Yang;Yan Ding;Xu Zhou;Kenli Li;Yunjun Gao;Keqin Li
Spatial crowdsourcing (SC) has recently become increasingly popular. As a critical issue in SC, task assignment currently faces challenges due to the imbalanced spatiotemporal distribution of tasks. Hence, many studies and applications focusing on cross-platform task allocation in SC have emerged. Existing work primarily focuses on maximizing the total revenue of the inner platform in cross-platform task assignment. In this work, we formulate an SC problem called Cross Dynamic Task Assignment (CDTA) to maximize the overall utility and propose improved solutions aimed at creating a win-win situation for the inner platform, task requesters, and outer workers. We first design a hybrid batch processing framework and a novel cross-platform incentive mechanism. Then, to allocate tasks to both inner and outer workers, we present a KM-based algorithm that computes the exact assignment in each batch and a highly efficient density-aware greedy algorithm. To maximize the revenue of the inner platform and outer workers simultaneously, we model the competition among outer workers as a potential game, which is shown to have at least one pure Nash equilibrium, and develop a game-theoretic method. Additionally, a simulated annealing-based improved algorithm is proposed to avoid falling into local optima. Finally, since random thresholds lead to unstable results when picking tasks that are preferentially assigned to inner workers, we devise an adaptive threshold selection algorithm based on multi-armed bandits to further improve the overall utility. Extensive experiments demonstrate the effectiveness and efficiency of our proposed algorithms on both real and synthetic datasets.
Title: Win-Win Approaches for Cross Dynamic Task Assignment in Spatial Crowdsourcing
IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1395-1411.
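For intuition about the KM-based batch step, the sketch below finds the max-utility worker-task matching by brute force on a toy 3x3 utility matrix (the values are invented). A real Kuhn-Munkres implementation reaches the same optimum in O(n^3) rather than O(n!):

```python
from itertools import permutations

util = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]   # util[i][j]: utility of assigning worker i to task j

def best_assignment(util):
    n = len(util)
    best, best_perm = -1, None
    for perm in permutations(range(n)):      # worker i -> task perm[i]
        total = sum(util[i][perm[i]] for i in range(n))
        if total > best:
            best, best_perm = total, perm
    return best, best_perm

total, match = best_assignment(util)
print(total, match)  # 11 (0, 2, 1)
```

Each batch in the paper's framework solves exactly this kind of bipartite matching, with the greedy density-aware variant trading optimality for speed.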
Pub Date: 2025-12-03 | DOI: 10.1109/TKDE.2025.3639418
Authors: Shidan Ma;Yan Ding;Xu Zhou;Peng Peng;Youhuan Li;Zhibang Yang;Kenli Li
Graph pattern queries (GPQ) over RDF graphs extend basic graph patterns to support variable-length paths (VLP), thereby enabling complex knowledge retrieval and navigation. Generally, variable-length paths describe the reachability between two vertices via a given property within a specified range. With the increasing scale of RDF graphs, it is necessary to design a partitioning method that enables efficient distributed queries. Although many partitioning strategies have been proposed for large RDF graphs, most existing methods result in numerous inter-partition joins when processing GPQs, which impacts query performance. In this paper, we formulate a new partitioning problem, MaxLocJoin, which aims to minimize inter-partition joins during distributed GPQ processing. For MaxLocJoin, we propose a partitioning framework (PIP) based on property-induced subgraphs, which consist of edges with a specific set of properties. The framework first finds a locally joinable property set using a cost-driven algorithm, LJPS, where the cost depends on the sizes of weakly connected components within its property-induced subgraphs. Subsequently, the graph is partitioned according to the weakly connected components. The framework achieves two key objectives: first, it enables complete local processing of all variable-length path queries (eliminating inter-partition joins); second, it minimizes the number of inter-partition joins required for traditional graph pattern queries. Moreover, we identify two types of independently executable queries (IEQ): the locally joinable IEQ and the single-property IEQ. After that, a query decomposition algorithm is designed to transform every GPQ into one of these types for independent execution in distributed environments. In experiments, we implement two prototype systems based on Jena and Virtuoso, and evaluate them over both real and synthetic RDF graphs.
The results show that MaxLocJoin achieves performance improvements from 2.8x to 10.7x over existing methods.
Title: Property-Induced Partitioning for Graph Pattern Queries on Distributed RDF Systems
IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1249-1263.
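The two building blocks of PIP, filtering a property-induced subgraph and computing its weakly connected components, can be sketched with a plain union-find. The triples and the property set below are toy examples, not from the paper:

```python
# Toy RDF triples: (subject, property, object).
triples = [("a", "knows", "b"), ("b", "knows", "c"),
           ("c", "likes", "d"), ("e", "knows", "f")]
props = {"knows"}                      # candidate property set

parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(x, y):
    parent[find(x)] = find(y)

# Property-induced subgraph: keep only edges whose property is in the
# set, then union their endpoints.
for s, p, o in triples:
    if p in props:
        union(s, o)

# Weakly connected components = groups sharing a union-find root.
components = {}
for v in parent:
    components.setdefault(find(v), []).append(v)
print(sorted(len(c) for c in components.values()))  # [2, 3]
```

Each component then becomes a partition candidate, so any path query that only uses properties from the set never crosses partitions.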
Truth discovery has emerged as an effective tool to mitigate data inconsistency in crowdsensing by prioritizing data from high-quality responders. While local differential privacy (LDP) has emerged as a crucial privacy-preserving paradigm, existing studies under LDP rarely explore a worker’s participation in specific tasks for sparse scenarios, which may also reveal sensitive information such as individual preferences and behaviors. Existing LDP mechanisms, when applied to truth discovery in sparse settings, may create undesirable dense distributions, provide insufficient privacy protection, and introduce excessive noise, compromising the efficacy of subsequent non-private truth discovery. Additionally, the interplay between noise injection and truth discovery remains insufficiently explored in the current literature. To address these issues, we propose a lOcally differentially private truth diSCovery approach for spArse cRowdsensing, namely OSCAR. The main idea is to use advanced optimization techniques to reconstruct the sparse data distribution and re-formalize truth discovery by considering the statistical characteristics of injected Laplacian noise while protecting the privacy of both the tasks being completed and the corresponding sensory data. Specifically, to address the data density concerns while alleviating noise, we design a randomized response-based Bernoulli matrix factorization method BerRR. To recover the sparse structures from densified, perturbed data, we formalize a 0-1 integer programming problem and develop a sparse recovery solving method SpaIE based on implicit enumeration. We further devise a Laplacian-sensitive truth discovery method LapCRH that leverages maximum likelihood estimation to re-formalize truth discovery by measuring differences between noisy values and truths based on the statistical characteristic of Laplacian noise. 
Our comprehensive theoretical analysis establishes OSCAR’s privacy guarantees, utility bounds, and computational complexity. Experimental results show that OSCAR surpasses state-of-the-art methods by at least 30% in accuracy.
Title: Locally Differentially Private Truth Discovery for Sparse Crowdsensing
Authors: Pengfei Zhang;Zhikun Zhang;Yang Cao;Xiang Cheng;Youwen Zhu;Zhiquan Liu;Ji Zhang
Pub Date: 2025-12-01 | DOI: 10.1109/TKDE.2025.3639070
IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1189-1205.
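Two LDP primitives the approach builds on can be shown in their textbook form (the paper's BerRR and LapCRH mechanisms are more elaborate): randomized response for a worker's task-participation bit, and the Laplace mechanism for the numeric sensory value. The epsilon value and debiasing estimator below are the standard ones, used here purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
eps = 1.0

def randomized_response(bit, eps):
    # Report the true bit with probability e^eps / (1 + e^eps).
    p = np.exp(eps) / (1 + np.exp(eps))
    return bit if rng.random() < p else 1 - bit

def rr_unbiased_mean(reports, eps):
    # Invert the perturbation: E[report] = m*(2p-1) + (1-p).
    p = np.exp(eps) / (1 + np.exp(eps))
    return (np.mean(reports) + p - 1) / (2 * p - 1)

def laplace_mechanism(value, sensitivity, eps):
    # Add Laplace noise scaled to sensitivity / epsilon.
    return value + rng.laplace(scale=sensitivity / eps)

bits = np.ones(50000, dtype=int)              # everyone participated
reports = [randomized_response(b, eps) for b in bits]
print(round(rr_unbiased_mean(reports, eps), 2))   # close to 1.0
```

The paper's Laplacian-sensitive truth discovery (LapCRH) goes a step further by folding the known noise distribution into the likelihood instead of simply debiasing aggregates.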
With the proliferation of GPS-equipped edge devices, vast amounts of trajectory data are generated and accumulated across various domains, driving numerous urban applications. However, due to the limited data acquisition capabilities of edge devices, many trajectories are recorded at low sampling rates, reducing the effectiveness of these applications. To address this issue, we aim to recover high-sample-rate trajectories from low-sample-rate ones, enhancing the usability of trajectory data. Recent approaches to trajectory recovery often assume centralized data storage, which can lead to catastrophic forgetting, where previously learned knowledge is entirely forgotten when new data arrives. This not only poses privacy risks but also degrades performance in decentralized settings where data streams into the system incrementally. To enable decentralized training and streaming trajectory recovery, we propose a Lightweight incremental framework for federated Trajectory Recovery, called LightTR+, which is based on a client-server architecture. Given the limited processing capabilities of edge devices, LightTR+ includes a lightweight local trajectory embedding module that enhances computational efficiency without compromising feature extraction capabilities. To mitigate catastrophic forgetting, we propose an intra-domain knowledge distillation module. Additionally, LightTR+ features a meta-knowledge enhanced local-global training scheme, which reduces communication costs between the server and clients, further improving efficiency. Extensive experiments offer insight into the effectiveness and efficiency of LightTR+.
Title: LightTR+: A Lightweight Incremental Framework for Federated Trajectory Recovery
Authors: Hao Miao;Ziqiao Liu;Yan Zhao;Chenxi Liu;Chenjuan Guo;Bin Yang;Kai Zheng;Huan Li;Christian S. Jensen
Pub Date: 2025-12-01 | DOI: 10.1109/TKDE.2025.3638888
IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1174-1188.
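Knowledge distillation against forgetting, in its generic form, has the student match the teacher's softened predictions via a temperature-scaled KL divergence. The temperature and logits below are illustrative assumptions; LightTR+'s intra-domain distillation loss may differ in detail:

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T        # temperature-softened logits
    e = np.exp(z - z.max())                   # subtract max for stability
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    p = softmax(teacher_logits, T)            # soft teacher targets
    q = softmax(student_logits, T)
    # T^2-scaled KL(p || q), the usual distillation objective.
    return float(T * T * np.sum(p * np.log(p / q)))

loss_same = distill_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
loss_diff = distill_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
print(loss_same, loss_diff)   # 0.0 vs a positive value
```

In an incremental setting, the teacher is a frozen snapshot of the previous model, so minimizing this loss anchors the updated student to knowledge learned from earlier data streams.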
Pub Date: 2025-12-01 | DOI: 10.1109/TKDE.2025.3639074
Authors: Xiang Wu;Rong-Hua Li;Zhaoxin Fan;Kai Chen;Yujin Gao;Hongchao Qin;Guoren Wang
Temporal interactions form the crux of numerous real-world scenarios, thus necessitating effective modeling in temporal graph representation learning. Despite extensive research within this domain, we identify a significant oversight in current methodologies: the temporal-spatial dynamics in graphs, encompassing both structural and temporal coherence, remain largely unaddressed. In an effort to bridge this research gap, we present a novel framework termed Graph Representation learning enhanced by Periodic and Community Interactions (GRPCI). GRPCI consists of two primary mechanisms devised explicitly to tackle the aforementioned challenge. Firstly, to utilize latent temporal dynamics, we propose a novel periodicity-based neighborhood aggregation mechanism that underscores neighbors engaged in a periodic interaction pattern. This mechanism seamlessly integrates the element of periodicity into the model. Secondly, to exploit structural dynamics, we design a novel contrastive-based local community representation learning mechanism. This mechanism features a heuristic dynamic contrastive pair sampling strategy aimed at enhancing the modeling of the latent distribution of local communities within the graphs. Through the incorporation of these two mechanisms, GRPCI markedly augments the performance of graph networks. Empirical evaluations, conducted via a temporal link prediction task across five real-life datasets, attest to the superior performance of GRPCI in comparison to existing state-of-the-art methodologies. The results of this study validate the efficacy of GRPCI, thereby establishing a new benchmark for future research in the field of temporal graph representation learning. Our findings underscore the importance of considering both temporal and structural consistency in temporal graph learning, and advocate for further exploration of this paradigm.
{"title":"GRPCI: Harnessing Temporal-Spatial Dynamics for Graph Representation Learning","authors":"Xiang Wu;Rong-Hua Li;Zhaoxin Fan;Kai Chen;Yujin Gao;Hongchao Qin;Guoren Wang","doi":"10.1109/TKDE.2025.3639074","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3639074","url":null,"abstract":"Temporal interactions form the crux of numerous real-world scenarios, thus necessitating effective modeling in temporal graph representation learning. Despite extensive research within this domain, we identify a significant oversight in current methodologies: the temporal-spatial dynamics in graphs, encompassing both structural and temporal coherence, remain largely unaddressed. In an effort to bridge this research gap, we present a novel framework termed Graph Representation learning enhanced by Periodic and Community Interactions (GRPCI). GRPCI consists of two primary mechanisms devised explicitly to tackle the aforementioned challenge. Firstly, to utilize latent temporal dynamics, we propose a novel periodicity-based neighborhood aggregation mechanism that underscores neighbors engaged in a periodic interaction pattern. This mechanism seamlessly integrates the element of periodicity into the model. Secondly, to exploit structural dynamics, we design a novel contrastive-based local community representation learning mechanism. This mechanism features a heuristic dynamic contrastive pair sampling strategy aimed at enhancing the modeling of the latent distribution of local communities within the graphs. Through the incorporation of these two mechanisms, GRPCI markedly augments the performance of graph networks. Empirical evaluations, conducted via a temporal link prediction task across five real-life datasets, attest to the superior performance of GRPCI in comparison to existing state-of-the-art methodologies. The results of this study validate the efficacy of GRPCI, thereby establishing a new benchmark for future research in the field of temporal graph representation learning. 
Our findings underscore the importance of considering both temporal and structural consistency in temporal graph learning, and advocate for further exploration of this paradigm.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 2","pages":"1144-1158"},"PeriodicalIF":10.4,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
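The periodicity-based neighborhood aggregation described above can be sketched as follows. This is a minimal illustration, not GRPCI's actual formulation: the scoring rule (coefficient of variation of inter-event gaps) and the softmax weighting are assumptions chosen for simplicity.

```python
import numpy as np

def periodicity_score(timestamps):
    """Score how regular a neighbor's interaction times are.

    Perfectly periodic interactions (equal gaps) score 1.0; irregular
    ones tend toward 0.  This rule is illustrative, not GRPCI's.
    """
    ts = np.sort(np.asarray(timestamps, dtype=float))
    if len(ts) < 3:
        return 0.0  # too few events to judge periodicity
    gaps = np.diff(ts)
    cv = gaps.std() / (gaps.mean() + 1e-9)  # coefficient of variation
    return float(1.0 / (1.0 + cv))

def aggregate(neighbor_embs, neighbor_timestamps):
    """Aggregate neighbor embeddings, up-weighting periodic neighbors."""
    scores = np.array([periodicity_score(t) for t in neighbor_timestamps])
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over scores
    return weights @ np.stack(neighbor_embs)

# Neighbor A interacts every 7 time units (periodic); B is irregular.
emb_a, emb_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
ts_a = [0, 7, 14, 21, 28]
ts_b = [0, 1, 15, 16, 40]
agg = aggregate([emb_a, emb_b], [ts_a, ts_b])
# The periodic neighbor A receives the larger aggregation weight.
```

In the toy example the regularly interacting neighbor dominates the aggregated representation, which is the qualitative effect the paper's mechanism aims for.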
Pub Date : 2025-12-01 DOI: 10.1109/TKDE.2025.3638864
Yunxiao Zhao;Zhiqiang Wang;Xingtong Yu;Xiaoli Li;Jiye Liang;Ru Li
Rationalization, a data-centric framework, aims to build self-explanatory models to explain the prediction outcome by generating a subset of human-intelligible pieces of the input data. It involves a cooperative game model where a generator generates the most human-intelligible parts of the input (i.e., rationales), followed by a predictor that makes predictions based on these generated rationales. Conventional rationalization methods typically impose constraints via regularization terms to calibrate or penalize undesired generation. However, these methods suffer from a problem called mode collapse, in which the predictor produces correct predictions yet the generator consistently outputs rationales with collapsed patterns. Moreover, existing studies are typically designed separately for specific collapsed patterns, lacking a unified consideration. In this paper, we systematically revisit cooperative rationalization from a novel game-theoretic perspective and identify the fundamental cause of this problem: the generator no longer tends to explore new strategies to uncover informative rationales, ultimately leading the system to converge to a suboptimal game equilibrium (correct predictions versus collapsed rationales). To solve this problem, we then propose a novel approach, Game-theoretic Policy Optimization oriented RATionalization (PoRat), which progressively introduces policy interventions to address the game equilibrium in the cooperative game process, thereby guiding the model toward a more optimal solution state. We theoretically analyse the cause of such a suboptimal equilibrium and prove the feasibility of the proposed method. Furthermore, we validate our method on nine widely used real-world datasets and two synthetic settings, where PoRat achieves up to 8.1% performance improvements over existing state-of-the-art methods.
{"title":"Learnable Game-Theoretic Policy Optimization for Data-Centric Self-Explanation Rationalization","authors":"Yunxiao Zhao;Zhiqiang Wang;Xingtong Yu;Xiaoli Li;Jiye Liang;Ru Li","doi":"10.1109/TKDE.2025.3638864","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3638864","url":null,"abstract":"Rationalization, a data-centric framework, aims to build self-explanatory models to explain the prediction outcome by generating a subset of human-intelligible pieces of the input data. It involves a cooperative game model where a generator generates the most human-intelligible parts of the input (i.e., rationales), followed by a predictor that makes predictions based on these generated rationales. Conventional rationalization methods typically impose constraints via regularization terms to calibrate or penalize undesired generation. However, these methods suffer from a problem called mode collapse, in which the predictor produces correct predictions yet the generator consistently outputs rationales with collapsed patterns. Moreover, existing studies are typically designed separately for specific collapsed patterns, lacking a unified consideration. In this paper, we systematically revisit cooperative rationalization from a novel game-theoretic perspective and identify the fundamental cause of this problem: the generator no longer tends to explore new strategies to uncover informative rationales, ultimately leading the system to converge to a suboptimal game equilibrium (correct predictions <italic>versus</i> collapsed rationales). To solve this problem, we then propose a novel approach, Game-theoretic <bold>P</b>olicy <bold>O</b>ptimization oriented <bold>RAT</b>ionalization (<sc>PoRat</small>), which progressively introduces policy interventions to address the game equilibrium in the cooperative game process, thereby guiding the model toward a more optimal solution state. 
We theoretically analyse the cause of such a suboptimal equilibrium and prove the feasibility of the proposed method. Furthermore, we validate our method on nine widely used real-world datasets and two synthetic settings, where <sc>PoRat</small> achieves up to 8.1% performance improvements over existing state-of-the-art methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 2","pages":"1159-1173"},"PeriodicalIF":10.4,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
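The generator/predictor cooperative game can be sketched as a select-then-predict pipeline. The hard top-k selection, token scores, and linear predictor below are illustrative assumptions standing in for the learned components, not PoRat's actual architecture.

```python
import numpy as np

def generate_rationale(token_scores, k):
    """Generator: pick the k highest-scoring tokens as the rationale.

    Hard top-k stands in for the learned binary mask a rationalization
    generator produces (illustrative assumption).
    """
    return np.argsort(token_scores)[-k:]

def predict(token_embs, rationale_idx, w):
    """Predictor: classify from the mean embedding of the selected tokens."""
    z = token_embs[rationale_idx].mean(axis=0)
    return int(z @ w > 0)

# Toy input: 6 tokens with 2-d embeddings; token 2 carries the label
# signal, all other tokens are zero vectors (kept deterministic).
token_embs = np.zeros((6, 2))
token_embs[2] = [3.0, 0.0]
scores = np.array([0.1, 0.2, 0.9, 0.3, 0.1, 0.2])  # generator's token scores
w = np.array([1.0, 0.0])                           # predictor weights

rationale = generate_rationale(scores, k=2)  # selects tokens 2 and 3
label = predict(token_embs, rationale, w)
```

In this picture, mode collapse corresponds to the scores degenerating so the same uninformative subset is always selected while the predictor still fits the labels; the policy interventions the paper proposes are aimed at pushing the generator out of that suboptimal equilibrium.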
Pub Date : 2025-11-28 DOI: 10.1109/TKDE.2025.3638465
Fan Li;Xiaoyang Wang;Dawei Cheng;Wenjie Zhang;Chen Chen;Ying Zhang;Xuemin Lin
With growing demands for data privacy and model robustness, graph unlearning (GU), which erases the influence of specific data on trained GNN models, has gained significant attention. However, existing exact unlearning methods suffer from either low efficiency or poor model performance. While more utility-preserving and efficient, current approximate methods require access to the forget set during unlearning, which makes them inapplicable in immediate deletion scenarios, thereby undermining privacy. Additionally, these approximate methods, which attempt to directly perturb model parameters, still raise significant concerns regarding unlearning power in empirical studies. To fill the gap, we propose Transferable Condensation Graph Unlearning (TCGU), a data-centric solution to graph unlearning. Specifically, we first develop a two-level alignment strategy to pre-condense the original graph into a compact yet utility-preserving dataset for subsequent unlearning tasks. Upon receiving an unlearning request, we fine-tune the pre-condensed data with a low-rank plugin, to directly align its distribution with the remaining graph, thus efficiently revoking the information of deleted data without accessing them. A novel similarity distribution matching approach and a discrimination regularizer are proposed to effectively transfer condensed data and preserve its utility in GNN training, respectively. Finally, we retrain the GNN on the transferred condensed data. Extensive experiments on 7 benchmark datasets demonstrate that TCGU can achieve superior performance in terms of model utility, unlearning efficiency, and unlearning efficacy compared to existing GU methods. To the best of our knowledge, this is the first study to explore graph unlearning with immediate data removal using a data-centric approximate method.
{"title":"TCGU: Data-Centric Graph Unlearning Based on Transferable Condensation","authors":"Fan Li;Xiaoyang Wang;Dawei Cheng;Wenjie Zhang;Chen Chen;Ying Zhang;Xuemin Lin","doi":"10.1109/TKDE.2025.3638465","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3638465","url":null,"abstract":"With growing demands for data privacy and model robustness, graph unlearning (GU), which erases the influence of specific data on trained GNN models, has gained significant attention. However, existing exact unlearning methods suffer from either low efficiency or poor model performance. While more utility-preserving and efficient, current approximate methods require access to the forget set during unlearning, which makes them inapplicable in immediate deletion scenarios, thereby undermining privacy. Additionally, these approximate methods, which attempt to directly perturb model parameters, still raise significant concerns regarding unlearning power in empirical studies. To fill the gap, we propose Transferable Condensation Graph Unlearning (TCGU), a data-centric solution to graph unlearning. Specifically, we first develop a two-level alignment strategy to pre-condense the original graph into a compact yet utility-preserving dataset for subsequent unlearning tasks. Upon receiving an unlearning request, we fine-tune the pre-condensed data with a low-rank plugin, to directly align its distribution with the remaining graph, thus efficiently revoking the information of deleted data without accessing them. A novel similarity distribution matching approach and a discrimination regularizer are proposed to effectively transfer condensed data and preserve its utility in GNN training, respectively. Finally, we retrain the GNN on the transferred condensed data. Extensive experiments on 7 benchmark datasets demonstrate that TCGU can achieve superior performance in terms of model utility, unlearning efficiency, and unlearning efficacy compared to existing GU methods. 
To the best of our knowledge, this is the first study to explore graph unlearning with immediate data removal using a data-centric approximate method.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 2","pages":"1334-1348"},"PeriodicalIF":10.4,"publicationDate":"2025-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
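The alignment step, in which the pre-condensed data is fine-tuned to match the remaining graph's distribution, can be sketched with simple first-moment matching. This is a deliberate simplification: TCGU uses a similarity distribution matching approach and a low-rank plugin, whereas the loss, learning rate, and shapes below are assumptions for illustration.

```python
import numpy as np

def moment_matching_loss(condensed, remaining):
    """Squared distance between feature means of the two sets.

    A stand-in for TCGU's similarity distribution matching
    (illustrative simplification, not the paper's objective).
    """
    diff = condensed.mean(axis=0) - remaining.mean(axis=0)
    return float(diff @ diff)

def align(condensed, remaining, lr=0.5, steps=50):
    """Nudge condensed features toward the remaining graph's distribution.

    Gradient of the mean-matching loss w.r.t. each condensed row is
    2 * (mean(condensed) - mean(remaining)) / n_condensed.
    """
    condensed = condensed.copy()
    n = len(condensed)
    for _ in range(steps):
        grad_row = 2.0 * (condensed.mean(axis=0) - remaining.mean(axis=0)) / n
        condensed -= lr * grad_row  # same gradient broadcast to every row
    return condensed

rng = np.random.default_rng(1)
remaining = rng.normal(loc=2.0, size=(200, 4))  # features after deletion
condensed = rng.normal(loc=0.0, size=(10, 4))   # pre-condensed synthetic nodes

aligned = align(condensed, remaining)
before = moment_matching_loss(condensed, remaining)
after = moment_matching_loss(aligned, remaining)
# The loss shrinks as the condensed set re-aligns with the remaining graph,
# without ever touching the deleted data.
```

The key property mirrored here is that alignment uses only the condensed set and the remaining graph, which is what lets the paper's method revoke deleted data without accessing it.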