
Latest Publications: IEEE Transactions on Knowledge and Data Engineering

Moon: A Modality Conversion-Based Efficient Multivariate Time Series Anomaly Detection
IF 10.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-09 | DOI: 10.1109/TKDE.2025.3622154
Yuanyuan Yao;Yuhan Shi;Lu Chen;Ziquan Fang;Yunjun Gao;Leong Hou U;Yushuai Li;Tianyi Li
Multivariate time series (MTS) anomaly detection identifies abnormal patterns where each timestamp contains multiple variables. Existing MTS anomaly detection methods fall into three categories: reconstruction-based, prediction-based, and classifier-based methods. However, these methods face three key challenges: (1) Unsupervised learning methods, such as reconstruction-based and prediction-based methods, rely on error thresholds, which can lead to inaccuracies; (2) Semi-supervised methods mainly model normal data and often underuse anomaly labels, limiting detection of subtle anomalies; (3) Supervised learning methods, such as classifier-based approaches, often fail to capture local relationships, incur high computational costs, and are constrained by the scarcity of labeled data. To address these limitations, we propose Moon, a supervised modality conversion-based multivariate time series anomaly detection framework. Moon enhances the efficiency and accuracy of anomaly detection while providing detailed anomaly analysis reports. First, Moon introduces a novel multivariate Markov Transition Field (MV-MTF) technique to convert numeric time series data into image representations, capturing relationships across variables and timestamps. Since numeric data retains unique patterns that cannot be fully captured by image conversion alone, Moon employs a Multimodal-CNN to integrate numeric and image data through a feature fusion model with parameter sharing, enhancing training efficiency. Finally, a SHAP-based anomaly explainer identifies key variables contributing to anomalies, improving interpretability. Extensive experiments on six real-world MTS datasets demonstrate that Moon outperforms six state-of-the-art methods by up to 93% in efficiency, 4% in accuracy, and 10.8% in interpretation performance.
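The MV-MTF step builds on the classic univariate Markov Transition Field. The sketch below is a rough illustration of that underlying idea, not the authors' MV-MTF implementation (the multivariate extension is the paper's contribution); the quantile binning and the bin count are assumptions.

```python
import numpy as np

def markov_transition_field(x: np.ndarray, n_bins: int = 8) -> np.ndarray:
    """Univariate Markov Transition Field: quantile-bin the series,
    estimate the first-order bin transition matrix, then expand it so
    that M[i, j] = P(bin(x_j) | bin(x_i))."""
    # Assign each point to a quantile bin 0..n_bins-1.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(x, edges)

    # Empirical transition matrix W[a, b] = P(next bin = b | bin = a).
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(bins[:-1], bins[1:]):
        W[a, b] += 1
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1)

    # MTF "image": transition probability between the bins of each pair.
    return W[np.ix_(bins, bins)]

# Toy usage: a noisy sine of length 128 becomes a 128 x 128 image.
t = np.linspace(0, 4 * np.pi, 128)
img = markov_transition_field(np.sin(t) + 0.1 * np.random.randn(128))
print(img.shape)  # (128, 128)
```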
Citations: 0
Win-Win Approaches for Cross Dynamic Task Assignment in Spatial Crowdsourcing
IF 10.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-03 | DOI: 10.1109/TKDE.2025.3639413
Tianyue Ren;Zhibang Yang;Yan Ding;Xu Zhou;Kenli Li;Yunjun Gao;Keqin Li
Spatial crowdsourcing (SC) has become increasingly popular in recent years. As a critical issue in SC, task assignment currently faces challenges due to the imbalanced spatiotemporal distribution of tasks. Hence, many related studies and applications focusing on cross-platform task allocation in SC have emerged. Existing work primarily focuses on maximizing the total revenue of the inner platform in cross task assignment. In this work, we formulate an SC problem called Cross Dynamic Task Assignment (CDTA) to maximize the overall utility and propose improved solutions aimed at creating a win-win situation for the inner platform, task requesters, and outer workers. We first design a hybrid batch processing framework and a novel cross-platform incentive mechanism. Then, to allocate tasks to both inner and outer workers, we present a KM-based algorithm that obtains the exact assignment result in each batch and a highly efficient density-aware greedy algorithm. To maximize the revenue of the inner platform and outer workers simultaneously, we model the competition among outer workers as a potential game that is shown to have at least one pure Nash equilibrium and develop a game-theoretic method. Additionally, a simulated annealing-based improved algorithm is proposed to avoid falling into local optima. Last but not least, since random thresholds lead to unstable results when picking tasks that are preferentially assigned to inner workers, we devise an adaptive threshold selection algorithm based on multi-armed bandits to further improve the overall utility. Extensive experiments demonstrate the effectiveness and efficiency of our proposed algorithms on both real and synthetic datasets.
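For readers unfamiliar with KM-based assignment: the Kuhn-Munkres (Hungarian) algorithm solves maximum-weight bipartite matching exactly, which is the per-batch building block the abstract refers to. A minimal sketch using SciPy follows; the utility matrix, its values, and the infeasibility penalty are invented for illustration and are not from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical per-batch utility matrix: utility[i, j] is the payoff of
# assigning task i to worker j; a large negative value marks an
# infeasible task-worker pair (e.g., outside the worker's range).
rng = np.random.default_rng(0)
utility = rng.uniform(0.0, 10.0, size=(5, 7))
utility[1, 3] = -1e9  # infeasible pair

# Kuhn-Munkres (Hungarian) gives the exact max-utility matching.
tasks, workers = linear_sum_assignment(utility, maximize=True)
for t, w in zip(tasks, workers):
    if utility[t, w] > 0:
        print(f"task {t} -> worker {w} (utility {utility[t, w]:.2f})")
```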
Citations: 0
Property-Induced Partitioning for Graph Pattern Queries on Distributed RDF Systems
IF 10.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-03 | DOI: 10.1109/TKDE.2025.3639418
Shidan Ma;Yan Ding;Xu Zhou;Peng Peng;Youhuan Li;Zhibang Yang;Kenli Li
Graph pattern queries (GPQ) over RDF graphs extend basic graph patterns to support variable-length paths (VLP), thereby enabling complex knowledge retrieval and navigation. Generally, variable-length paths describe the reachability between two vertices via a given property within a specified range. With the increasing scale of RDF graphs, it is necessary to design a partitioning method that enables efficient distributed queries. Although many partitioning strategies have been proposed for large RDF graphs, most existing methods result in numerous inter-partition joins when processing GPQs, which impacts query performance. In this paper, we formulate a new partitioning problem, MaxLocJoin, which aims to minimize inter-partition joins during distributed GPQ processing. For MaxLocJoin, we propose a partitioning framework (PIP) based on property-induced subgraphs, which consist of edges with a specific set of properties. The framework first finds a locally joinable property set using a cost-driven algorithm, LJPS, where the cost depends on the sizes of weakly connected components within its property-induced subgraphs. Subsequently, the graph is partitioned according to the weakly connected components. The framework achieves two key objectives: first, it enables complete local processing of all variable-length path queries (eliminating inter-partition joins); second, it minimizes the number of inter-partition joins required for traditional graph pattern queries. Moreover, we identify two types of independently executable queries (IEQ): the locally joinable IEQ and the single-property IEQ. A query decomposition algorithm is then designed to transform every GPQ into one of these forms for independent execution in distributed environments. In experiments, we implement two prototype systems based on Jena and Virtuoso and evaluate them over both real and synthetic RDF graphs. The results show that MaxLocJoin achieves performance improvements of 2.8x to 10.7x over existing methods.
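The core primitive, partitioning by the weakly connected components of a property-induced subgraph, can be sketched in a few lines. This is a toy reconstruction under stated assumptions (a plain edge list with one property per edge), not the PIP/LJPS implementation.

```python
import networkx as nx

# Toy RDF-style edge list: (subject, object, property).
edges = [("a", "b", "knows"), ("b", "c", "knows"),
         ("c", "d", "cites"), ("e", "f", "knows")]

def property_induced_partitions(edges, properties):
    """Vertices grouped by the weakly connected components of the
    subgraph induced by the chosen property set."""
    g = nx.DiGraph()
    g.add_edges_from((s, o) for s, o, p in edges if p in properties)
    return list(nx.weakly_connected_components(g))

# Any variable-length path over "knows" stays inside one component,
# so it can be answered without inter-partition joins.
print(property_induced_partitions(edges, {"knows"}))
# e.g. [{'a', 'b', 'c'}, {'e', 'f'}]
```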
Citations: 0
Locally Differentially Private Truth Discovery for Sparse Crowdsensing
IF 10.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-01 | DOI: 10.1109/TKDE.2025.3639070
Pengfei Zhang;Zhikun Zhang;Yang Cao;Xiang Cheng;Youwen Zhu;Zhiquan Liu;Ji Zhang
Truth discovery has emerged as an effective tool to mitigate data inconsistency in crowdsensing by prioritizing data from high-quality responders. While local differential privacy (LDP) has emerged as a crucial privacy-preserving paradigm, existing studies under LDP rarely explore a worker's participation in specific tasks in sparse scenarios, which may also reveal sensitive information such as individual preferences and behaviors. Existing LDP mechanisms, when applied to truth discovery in sparse settings, may create undesirable dense distributions, provide insufficient privacy protection, and introduce excessive noise, compromising the efficacy of subsequent non-private truth discovery. Additionally, the interplay between noise injection and truth discovery remains insufficiently explored in the current literature. To address these issues, we propose a locally differentially private truth discovery approach for sparse crowdsensing, namely OSCAR. The main idea is to use advanced optimization techniques to reconstruct the sparse data distribution and re-formalize truth discovery by considering the statistical characteristics of injected Laplacian noise, while protecting the privacy of both the tasks being completed and the corresponding sensory data. Specifically, to address the data density concerns while alleviating noise, we design a randomized response-based Bernoulli matrix factorization method, BerRR. To recover the sparse structures from densified, perturbed data, we formalize a 0-1 integer programming problem and develop a sparse recovery solving method, SpaIE, based on implicit enumeration. We further devise a Laplacian-sensitive truth discovery method, LapCRH, that leverages maximum likelihood estimation to re-formalize truth discovery by measuring differences between noisy values and truths based on the statistical characteristics of Laplacian noise. Our comprehensive theoretical analysis establishes OSCAR's privacy guarantees, utility bounds, and computational complexity. Experimental results show that OSCAR surpasses state-of-the-art methods by at least 30% in accuracy improvement.
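Two standard LDP primitives the abstract builds on can be stated compactly: the Laplace mechanism for numeric sensory reports and randomized response for a one-bit participation flag. The sketch below shows only these textbook mechanisms, not BerRR, SpaIE, or LapCRH themselves.

```python
import numpy as np

def laplace_perturb(value: float, sensitivity: float, epsilon: float) -> float:
    """Laplace mechanism: epsilon-LDP release of a bounded numeric value."""
    return value + np.random.laplace(scale=sensitivity / epsilon)

def randomized_response(participated: bool, epsilon: float) -> bool:
    """Warner-style randomized response for a one-bit participation flag:
    answer truthfully with probability e^eps / (e^eps + 1)."""
    p_truth = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return participated if np.random.rand() < p_truth else not participated

# Toy usage: a worker privatizes one reading and one participation bit.
print(laplace_perturb(23.5, sensitivity=1.0, epsilon=0.5))
print(randomized_response(True, epsilon=1.0))
```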
Citations: 0
LightTR+: A Lightweight Incremental Framework for Federated Trajectory Recovery
IF 10.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-01 | DOI: 10.1109/TKDE.2025.3638888
Hao Miao;Ziqiao Liu;Yan Zhao;Chenxi Liu;Chenjuan Guo;Bin Yang;Kai Zheng;Huan Li;Christian S. Jensen
With the proliferation of GPS-equipped edge devices, vast amounts of trajectory data are generated and accumulated in various domains, driving numerous urban applications. However, due to the limited data acquisition capabilities of edge devices, many trajectories are recorded at low sampling rates, reducing the effectiveness of these applications. To address this issue, we aim to recover high-sample-rate trajectories from low-sample-rate ones, enhancing the usability of trajectory data. Recent approaches to trajectory recovery often assume centralized data storage and can suffer from catastrophic forgetting, where previously learned knowledge is entirely forgotten when new data arrives. This not only poses privacy risks but also degrades performance in decentralized settings where data streams into the system incrementally. To enable decentralized training and streaming trajectory recovery, we propose a Lightweight incremental framework for federated Trajectory Recovery, called LightTR+, which is based on a client-server architecture. Given the limited processing capabilities of edge devices, LightTR+ includes a lightweight local trajectory embedding module that enhances computational efficiency without compromising feature extraction capabilities. To mitigate catastrophic forgetting, we propose an intra-domain knowledge distillation module. Additionally, LightTR+ features a meta-knowledge enhanced local-global training scheme, which reduces communication costs between the server and clients, further improving efficiency. Extensive experiments offer insight into the effectiveness and efficiency of LightTR+.
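The intra-domain knowledge distillation module presumably relies on the usual temperature-scaled distillation objective; as a hedged stand-in (the paper's exact loss is not given here), a minimal NumPy version:

```python
import numpy as np

def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Numerically stable temperature-scaled softmax."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T: float = 2.0) -> float:
    """Temperature-scaled KL(teacher || student), the standard KD loss;
    a generic stand-in for the paper's intra-domain distillation term."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1)
    return float(kl.mean() * T * T)
```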
Citations: 0
GRPCI: Harnessing Temporal-Spatial Dynamics for Graph Representation Learning
IF 10.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-01 | DOI: 10.1109/TKDE.2025.3639074
Xiang Wu;Rong-Hua Li;Zhaoxin Fan;Kai Chen;Yujin Gao;Hongchao Qin;Guoren Wang
Temporal interactions form the crux of numerous real-world scenarios, thus necessitating effective modeling in temporal graph representation learning. Despite extensive research within this domain, we identify a significant oversight in current methodologies: the temporal-spatial dynamics in graphs, encompassing both structural and temporal coherence, remain largely unaddressed. In an effort to bridge this research gap, we present a novel framework termed Graph Representation learning enhanced by Periodic and Community Interactions (GRPCI). GRPCI consists of two primary mechanisms devised explicitly to tackle the aforementioned challenge. Firstly, to utilize latent temporal dynamics, we propose a novel periodicity-based neighborhood aggregation mechanism that underscores neighbors engaged in a periodic interaction pattern. This mechanism seamlessly integrates the element of periodicity into the model. Secondly, to exploit structural dynamics, we design a novel contrastive-based local community representation learning mechanism. This mechanism features a heuristic dynamic contrastive pair sampling strategy aimed at enhancing the modeling of the latent distribution of local communities within the graphs. Through the incorporation of these two mechanisms, GRPCI markedly augments the performance of graph networks. Empirical evaluations, conducted via a temporal link prediction task across five real-life datasets, attest to the superior performance of GRPCI in comparison to existing state-of-the-art methodologies. The results of this study validate the efficacy of GRPCI, thereby establishing a new benchmark for future research in the field of temporal graph representation learning. Our findings underscore the importance of considering both temporal and structural consistency in temporal graph learning, and advocate for further exploration of this paradigm.
Citations: 0
Learnable Game-Theoretic Policy Optimization for Data-Centric Self-Explanation Rationalization
IF 10.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-01 | DOI: 10.1109/TKDE.2025.3638864
Yunxiao Zhao;Zhiqiang Wang;Xingtong Yu;Xiaoli Li;Jiye Liang;Ru Li
Rationalization, a data-centric framework, aims to build self-explanatory models that explain prediction outcomes by generating a subset of human-intelligible pieces of the input data. It involves a cooperative game in which a generator selects the most human-intelligible parts of the input (i.e., rationales), followed by a predictor that makes predictions based on these generated rationales. Conventional rationalization methods typically impose constraints via regularization terms to calibrate or penalize undesired generation. However, these methods suffer from a problem called mode collapse, in which the predictor produces correct predictions yet the generator consistently outputs rationales with collapsed patterns. Moreover, existing studies are typically designed separately for specific collapsed patterns, lacking a unified treatment. In this paper, we systematically revisit cooperative rationalization from a novel game-theoretic perspective and identify the fundamental cause of this problem: the generator no longer tends to explore new strategies to uncover informative rationales, ultimately leading the system to converge to a suboptimal game equilibrium (correct predictions versus collapsed rationales). To solve this problem, we propose a novel approach, Game-theoretic Policy Optimization oriented RATionalization (PoRat), which progressively introduces policy interventions to address the game equilibrium in the cooperative game process, thereby guiding the model toward a more optimal solution state. We theoretically analyze the cause of such a suboptimal equilibrium and prove the feasibility of the proposed method. Furthermore, we validate our method on nine widely used real-world datasets and two synthetic settings, where PoRat achieves up to 8.1% performance improvements over existing state-of-the-art methods.
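For context on the "regularization terms" that conventional rationalizers use (and that PoRat moves beyond), the standard sparsity and continuity penalties on a soft rationale mask look roughly like this; the target sparsity and continuity weight are illustrative assumptions, not values from the paper.

```python
import numpy as np

def rationale_regularizer(mask: np.ndarray,
                          target_sparsity: float = 0.2,
                          lam_cont: float = 1.0) -> float:
    """Conventional rationale constraints: keep the selected fraction of
    tokens near a target, and prefer contiguous selections."""
    sparsity = abs(float(mask.mean()) - target_sparsity)      # select ~20%
    continuity = float(np.abs(np.diff(mask)).mean())          # avoid gaps
    return sparsity + lam_cont * continuity
```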
Citations: 0
TCGU: Data-Centric Graph Unlearning Based on Transferable Condensation
IF 10.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-28 | DOI: 10.1109/TKDE.2025.3638465
Fan Li;Xiaoyang Wang;Dawei Cheng;Wenjie Zhang;Chen Chen;Ying Zhang;Xuemin Lin
With growing demands for data privacy and model robustness, graph unlearning (GU), which erases the influence of specific data on trained GNN models, has gained significant attention. However, existing exact unlearning methods suffer from either low efficiency or poor model performance. While more utility-preserving and efficient, current approximate methods require access to the forget set during unlearning, which makes them inapplicable in immediate deletion scenarios, thereby undermining privacy. Additionally, these approximate methods, which attempt to directly perturb model parameters, still raise significant concerns regarding unlearning power in empirical studies. To fill the gap, we propose Transferable Condensation Graph Unlearning (TCGU), a data-centric solution to graph unlearning. Specifically, we first develop a two-level alignment strategy to pre-condense the original graph into a compact yet utility-preserving dataset for subsequent unlearning tasks. Upon receiving an unlearning request, we fine-tune the pre-condensed data with a low-rank plugin, to directly align its distribution with the remaining graph, thus efficiently revoking the information of deleted data without accessing them. A novel similarity distribution matching approach and a discrimination regularizer are proposed to effectively transfer condensed data and preserve its utility in GNN training, respectively. Finally, we retrain the GNN on the transferred condensed data. Extensive experiments on 7 benchmark datasets demonstrate that TCGU can achieve superior performance in terms of model utility, unlearning efficiency, and unlearning efficacy compared to existing GU methods. To the best of our knowledge, this is the first study to explore graph unlearning with immediate data removal using a data-centric approximate method.
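The "similarity distribution matching" used to align condensed data with the remaining graph is, in spirit, a distribution-matching loss. As an assumed, generic example (not TCGU's own objective), a biased squared-MMD estimator with an RBF kernel:

```python
import numpy as np

def rbf_mmd2(X: np.ndarray, Y: np.ndarray, gamma: float = 1.0) -> float:
    """Biased squared MMD between embedding sets X (n, d) and Y (m, d):
    small values mean the two sets look alike to an RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return float(k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean())

# Toy usage: condensed embeddings vs. remaining-graph embeddings.
rng = np.random.default_rng(0)
print(rbf_mmd2(rng.normal(size=(32, 16)), rng.normal(size=(64, 16))))
```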
Citations: 0
Graph-Structured Driven Dual Adaptation for Mitigating Popularity Bias
IF 10.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-28 | DOI: 10.1109/TKDE.2025.3638343
Miaomiao Cai;Lei Chen;Yifan Wang;Zhiyong Cheng;Min Zhang;Meng Wang
Popularity bias is a common challenge in recommender systems. It often causes unbalanced item recommendation performance and intensifies the Matthew effect. Due to limited user-item interactions, unpopular items are frequently constrained to the embedding neighborhoods of only a few users, leading to representation collapse and weakening the model's generalization. Although existing supervised alignment and reweighting methods can help mitigate this problem, they still face two major limitations: (1) they overlook the inherent variability among different Graph Convolutional Network (GCN) layers, which can result in negative gains in deeper layers; (2) they rely heavily on fixed hyperparameters to balance popular and unpopular items, limiting adaptability to diverse data distributions and increasing model complexity. To address these challenges, we propose the Graph-Structured Dual Adaptation Framework (GSDA), a dual adaptive framework for mitigating popularity bias in recommendation. Our theoretical analysis shows that supervised alignment in GCNs is hindered by the over-smoothing effect, where the distinction between popular and unpopular items diminishes as layers deepen, reducing the effectiveness of alignment at deeper levels. To overcome this limitation, GSDA integrates a hierarchical adaptive alignment mechanism that counteracts entropy decay across layers together with a distribution-aware contrastive weighting strategy based on the Gini coefficient, enabling the model to adapt its debiasing strength dynamically without relying on fixed hyperparameters. Extensive experiments on three benchmark datasets demonstrate that GSDA effectively alleviates popularity bias while consistently outperforming state-of-the-art methods in recommendation performance.
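The Gini coefficient that drives GSDA's distribution-aware weighting is itself a simple statistic; how it is mapped to contrastive weights is the paper's design, so the sketch below shows only the statistic.

```python
import numpy as np

def gini(x) -> float:
    """Gini coefficient of a non-negative popularity distribution:
    0 for perfectly uniform interaction counts, near 1 for extreme skew."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    lorenz = np.cumsum(x) / x.sum()          # cumulative share per item
    return float((n + 1 - 2 * lorenz.sum()) / n)

print(gini([10, 10, 10, 10]))  # 0.0  (no popularity skew)
print(gini([0, 0, 0, 100]))    # 0.75 (heavily skewed)
```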
Citations: 0
Topology-Induced Low-Rank Tensor Representation for Spatio-Temporal Traffic Data Imputation
IF 10.4 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-28 | DOI: 10.1109/TKDE.2025.3638633
Zhi-Long Han;Ting-Zhu Huang;Xi-Le Zhao;Ben-Zheng Li;Meng Ding
Spatio-temporal traffic data imputation is a fundamental component in intelligent transportation systems, which can significantly improve data quality and enhance the accuracy of downstream data mining tasks. Recently, low-rank tensor representation has shown great potential for spatio-temporal traffic data imputation. However, the low-rank assumption focuses on the global structure, neglecting the critical spatial topology and local temporal dependencies inherent in spatio-temporal data. To address these issues, we propose a topology-induced low-rank tensor representation (TILR), which can accurately capture the underlying low-rankness of the spatial multi-scale features induced by topology knowledge. Moreover, to exploit local temporal dependencies, we suggest a learnable convolutional regularization framework, which not only includes some classical convolution-based regularizers but also leads to the discovery of new convolutional regularizers. Equipped with the suggested TILR and convolutional regularizer, we build a unified low-rank tensor model harmonizing spatial topology and temporal dependencies for traffic data imputation, which is expected to deliver promising performance even under extreme and complex missing scenarios. To solve the proposed nonconvex model, we develop an efficient alternating direction method of multipliers (ADMM)-based algorithm and analyze its computational complexity. Extensive experiments demonstrate that the proposed model outperforms state-of-the-art baselines for various missing scenarios. These results reveal the critical synergy between topology-aware low-rank constraint and temporal dynamic modeling for spatio-temporal data imputation.
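A core subproblem in ADMM solvers for low-rank models is the proximal step of the nuclear norm, i.e., singular value thresholding. The sketch below shows that generic matrix-level step under stated assumptions; TILR's tensor formulation, topology-induced terms, and learnable convolutional regularizers are not reproduced here.

```python
import numpy as np

def svt(M: np.ndarray, tau: float) -> np.ndarray:
    """Singular value thresholding: the prox operator of tau * nuclear
    norm, the low-rank update inside many ADMM completion solvers."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt  # shrink singular values

# Toy usage: one low-rank update on a noisy traffic matrix.
rng = np.random.default_rng(0)
M = rng.normal(size=(20, 5)) @ rng.normal(size=(5, 30))  # rank-5 signal
print(np.linalg.matrix_rank(svt(M + 0.01 * rng.normal(size=M.shape), tau=1.0)))
```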
Citations: 0