Pub Date: 2025-10-14, DOI: 10.1109/TKDE.2025.3620577
Songnian Zhang;Hao Yuan;Hui Zhu;Jun Shao;Yandong Zheng;Fengwei Wang
Merging multi-source time series data in cloud servers significantly enhances the effectiveness of analyses. However, privacy concerns are hindering time series analytics in the cloud. In response, numerous secure time series analytics schemes have been designed to address these concerns. Unfortunately, existing schemes suffer from severe performance issues, making them impractical for real-world applications. In this work, we propose novel secure time series analytics schemes that break through the performance bottleneck by substantially improving both communication and computational efficiency without compromising security. To attain this, we chart a new technical roadmap that leverages the idea of a mixed model. Specifically, we design a non-interactive secure Euclidean distance protocol by tailoring homomorphic secret sharing to suit subtractive secret sharing. Additionally, we devise a different approach to securely compute the minimum of three elements, simultaneously reducing computational and communication costs. Moreover, we introduce a rotation concept, design a rotation-based hybrid comparison mode, and finally propose a fast secure top-$k$ protocol that dramatically reduces comparison complexity. With the above secure protocols, we propose a practical secure time series analytics scheme with exceptional performance, as well as a security-enhanced scheme that considers stronger adversaries. Formal security analyses demonstrate that the proposed schemes achieve the desired security requirements, while comprehensive experimental evaluations show that they outperform the state-of-the-art scheme in both computation and communication.
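For intuition about how a secure Euclidean distance can be evaluated over subtractive (additive) secret shares, the following toy sketch simulates two servers that hold shares of the data and query vectors, subtract the shares locally, and square the shared differences. It substitutes classic Beaver triples for the paper's homomorphic-secret-sharing construction, so the modulus, the dealer, and all names here are illustrative rather than the authors' protocol.

```python
import random

P = 2**61 - 1          # prime modulus for the sharing ring (illustrative)

def share(x):
    # Subtractive/additive sharing: x0 + x1 = x (mod P).
    x0 = random.randrange(P)
    return x0, (x - x0) % P

def reconstruct(x0, x1):
    return (x0 + x1) % P

def beaver_triple():
    # Dealer-generated triple (a, b, c) with c = a*b, handed out as shares.
    a, b = random.randrange(P), random.randrange(P)
    return share(a), share(b), share((a * b) % P)

def secure_square(d0, d1):
    # Servers hold shares of d and return shares of d^2.
    (a0, a1), (b0, b1), (c0, c1) = beaver_triple()
    e = reconstruct((d0 - a0) % P, (d1 - a1) % P)   # opened masked values
    f = reconstruct((d0 - b0) % P, (d1 - b1) % P)
    z0 = (c0 + e * b0 + f * a0 + e * f) % P         # constant term added once
    z1 = (c1 + e * b1 + f * a1) % P
    return z0, z1

def secure_sq_distance(x, y):
    # Servers hold shares of x and y; they return shares of ||x - y||^2.
    acc0, acc1 = 0, 0
    for xi, yi in zip(x, y):
        (x0, x1), (y0, y1) = share(xi), share(yi)   # shares would come from data owners
        d0, d1 = (x0 - y0) % P, (x1 - y1) % P       # local subtraction, no interaction
        s0, s1 = secure_square(d0, d1)
        acc0, acc1 = (acc0 + s0) % P, (acc1 + s1) % P
    return acc0, acc1

if __name__ == "__main__":
    x, y = [3, 7, 2], [1, 5, 9]
    s0, s1 = secure_sq_distance(x, y)
    assert reconstruct(s0, s1) == sum((a - b) ** 2 for a, b in zip(x, y))
    print("squared distance reconstructed from shares:", reconstruct(s0, s1))
```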
{"title":"Secure and Practical Time Series Analytics With Mixed Model","authors":"Songnian Zhang;Hao Yuan;Hui Zhu;Jun Shao;Yandong Zheng;Fengwei Wang","doi":"10.1109/TKDE.2025.3620577","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3620577","url":null,"abstract":"Merging multi-source time series data in cloud servers significantly enhances the effectiveness of analyses. However, privacy concerns are hindering time series analytics in the cloud. Responsively, numerous secure time series analytics schemes have been designed to address privacy concerns. Unfortunately, existing schemes suffer from severe performance issues, making them impractical for real-world applications. In this work, we propose novel secure time series analytics schemes that break through the performance bottleneck by substantially improving both communication and computational efficiency without compromising security. To attain this, we open up a new technique roadmap that leverages the idea of mixed model. Specifically, we design a non-interactive secure Euclidean distance protocol by tailoring homomorphic secret sharing to suit subtractive secret sharing. Additionally, we devise a different approach to securely compute the minimum of three elements, simultaneously reducing computational and communication costs. Moreover, we delicately introduce a rotation concept, design a rotation-based hybrid comparison mode, and finally propose our fast secure top-<inline-formula><tex-math>$k$</tex-math></inline-formula> protocol that can dramatically reduce comparison complexity. With the above secure protocols, we propose a practical secure time series analytics scheme with exceptional performance and a security-enhanced scheme that considers stronger adversaries. Formal security analyses demonstrate that our proposed schemes can achieve the desired security requirements, while the comprehensive experimental evaluations illustrate that our schemes outperform the state-of-the-art scheme in both computation and communication.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 1","pages":"588-601"},"PeriodicalIF":10.4,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145705862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph retrieval (GR), a ranking procedure that sorts the graphs in a database by their relevance to a query graph in decreasing order, has wide applications across diverse domains, such as visual object detection and drug discovery. Existing GR approaches usually compare graph pairs at a fine-grained level, producing a quadratic number of pairwise similarity scores. In realistic scenarios, conducting quadratically many fine-grained comparisons is costly, yet coarse-grained comparisons would result in performance loss. Moreover, label scarcity in real-world data brings extra challenges. To tackle these issues, we investigate a more realistic GR problem, namely efficient graph retrieval (EGR). Our key intuition is that, since realistic scenarios contain numerous underutilized unlabeled pairs, leveraging the additional information they provide allows us to achieve a speed-up while simplifying the model without sacrificing performance. Following this intuition, we propose an efficient model called the Dual-Tower Model with Dividing, Contrasting and Alignment (TowerDNA). TowerDNA utilizes a GNN-based dual-tower model as a backbone to quickly compare graph pairs in a coarse-grained manner. In addition, to effectively utilize unlabeled pairs, TowerDNA first identifies confident pairs from the unlabeled pairs to expand the labeled dataset. It then learns from the remaining unconfident pairs via graph contrastive learning with geometric correspondence. To integrate all semantics with reduced bias, TowerDNA generates prototypes using labeled pairs, which are aligned within both confident and unconfident pairs. Extensive experiments on diverse realistic datasets demonstrate that TowerDNA achieves performance comparable to fine-grained methods while providing a 10× speed-up.
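To make the coarse-grained dual-tower idea concrete, here is a minimal sketch, not TowerDNA itself: each graph is embedded once by a shared encoder (a one-layer, mean-pooled message-passing stand-in for the GNN tower), so scoring a query against a database needs only vector comparisons instead of quadratic node-level matching.

```python
import numpy as np

def encode(adj: np.ndarray, feats: np.ndarray, w: np.ndarray) -> np.ndarray:
    # One round of neighbour aggregation followed by mean pooling.
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)
    h = np.maximum((a_hat / deg) @ feats @ w, 0.0)     # normalised propagate + ReLU
    return h.mean(axis=0)                              # graph-level embedding

def score(query_emb: np.ndarray, db_embs: np.ndarray) -> np.ndarray:
    # Cosine similarity of the query tower output against all database towers.
    q = query_emb / np.linalg.norm(query_emb)
    d = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    return d @ q

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(8, 16))                       # shared encoder weights
    graphs = []
    for n in (4, 6, 5):                                # three toy database graphs
        adj = (rng.random((n, n)) < 0.4).astype(float)
        adj = np.triu(adj, 1); adj = adj + adj.T
        graphs.append((adj, rng.normal(size=(n, 8))))
    db = np.stack([encode(a, x, w) for a, x in graphs])
    q = encode(*graphs[1], w)                          # query is a known graph
    print("ranking:", np.argsort(-score(q, db)))       # its own entry ranks first
```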
{"title":"TowerDNA: Fast and Accurate Graph Retrieval With Dividing, Contrasting and Alignment","authors":"Junwei Yang;Yiyang Gu;Yifang Qin;Xiao Luo;Zhiping Xiao;Kangjie Zheng;Wei Ju;Xian-Sheng Hua;Ming Zhang","doi":"10.1109/TKDE.2025.3621493","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3621493","url":null,"abstract":"Graph retrieval (GR), a ranking procedure that aims to sort the graphs in a database by their relevance to a query graph in decreasing order, has wide applications across diverse domains, such as visual object detection and drug discovery. Existing Graph Retrieval (GR) approaches usually compare graph pairs at a detailed level and generate quadratic similarity scores. In realistic scenarios, conducting quadratic fine-grained comparisons is costly. However, coarse-grained comparisons would result in performance loss. Moreover, label scarcity in real-world data brings extra challenges. To tackle these issues, we investigate a more realistic GR problem, namely, efficient graph retrieval (EGR). Our key intuition is that, since there are numerous underutilized unlabeled pairs in realistic scenarios, by leveraging the additional information they provide, we can achieve speed-up while simplifying the model without sacrificing performance. Following our intuition, we propose an efficient model called Dual-<bold>Tower</b> Model with <bold>D</b>ividing, Co<bold>n</b>trasting and <bold>A</b>lignment (TowerDNA). TowerDNA utilizes a GNN-based dual-tower model as a backbone to quickly compare graph pairs in a coarse-grained manner. In addition, to effectively utilize unlabeled pairs, TowerDNA first identifies confident pairs from unlabeled pairs to expand labeled datasets. It then learns from remaining unconfident pairs via graph contrastive learning with geometric correspondence. To integrate all semantics with reduced biases, TowerDNA generates prototypes using labeled pairs, which are aligned within both confident and unconfident pairs. Extensive experiments on diverse realistic datasets demonstrate that TowerDNA achieves comparable performance to fine-grained methods while providing a 10× speed-up.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 2","pages":"1364-1379"},"PeriodicalIF":10.4,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-13, DOI: 10.1109/TKDE.2025.3620605
Yanping Wu;Renjie Sun;Xiaoyang Wang;Ying Zhang;Lu Qin;Wenjie Zhang;Xuemin Lin
A $k$-plex is a subgraph in which each vertex may miss edges to at most $k$ vertices, including itself. The $k$-plex model has many real-world applications, such as social network analysis and product recommendation. Previous studies of $k$-plexes mainly focus on static graphs. In reality, however, relationships between two entities often occur at specific timestamps, which can be modeled as temporal graphs. Directly extending the $k$-plex model may fail to find certain critical groups in temporal graphs that exhibit frequently occurring patterns. To fill this gap, in this paper we develop a novel model, named the $(k,l)$-plex, which is a vertex set that exists in no fewer than $l$ timestamps, at each of which the induced subgraph is a $k$-plex. To identify practical results, we propose and investigate two important problems, i.e., large maximal $(k,l)$-plex (MalKLP) enumeration and maximum $(k,l)$-plex (MaxKLP) identification. For the MalKLP enumeration problem, a reasonable baseline method is first proposed by extending the Bron-Kerbosch (BK) framework. To overcome the limitations of the baseline and scale to large graphs, optimized strategies are developed, including a novel graph reduction approach and search-branch pruning techniques. For the MaxKLP identification task, we first design a baseline method by extending the proposed enumeration framework. Additionally, to accelerate the search, a new search framework with efficient branch pruning rules and a refined graph reduction method is developed. Finally, comprehensive experiments are conducted on 14 real-world datasets to validate the efficiency and effectiveness of the proposed techniques.
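The $(k,l)$-plex definition translates almost directly into code. The sketch below, with illustrative data structures (a temporal graph as per-timestamp edge sets), checks whether a vertex set induces a $k$-plex in at least $l$ snapshots; it is a definition checker, not the paper's enumeration algorithm.

```python
def is_k_plex(vertices, edges, k):
    # edges: set of frozenset({u, v}) pairs for one timestamp.
    # k-plex: every vertex of S has at least |S| - k neighbours inside S
    # (it may "miss" at most k vertices of S, itself included).
    s = set(vertices)
    for v in s:
        deg = sum(1 for u in s if u != v and frozenset((u, v)) in edges)
        if deg < len(s) - k:
            return False
    return True

def is_kl_plex(vertices, snapshots, k, l):
    # snapshots: list of per-timestamp edge sets.
    hits = sum(is_k_plex(vertices, snap, k) for snap in snapshots)
    return hits >= l

if __name__ == "__main__":
    t1 = {frozenset(e) for e in [(1, 2), (2, 3), (1, 3), (3, 4)]}
    t2 = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (1, 4)]}
    t3 = {frozenset(e) for e in [(1, 2)]}
    # {1, 2, 3} induces a 1-plex (a triangle) in t1 and t2 but not in t3,
    # so it is a (1, 2)-plex of this 3-snapshot temporal graph.
    print(is_kl_plex({1, 2, 3}, [t1, t2, t3], k=1, l=2))   # True
```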
{"title":"Efficient $k$k-Plex Mining in Temporal Graphs","authors":"Yanping Wu;Renjie Sun;Xiaoyang Wang;Ying Zhang;Lu Qin;Wenjie Zhang;Xuemin Lin","doi":"10.1109/TKDE.2025.3620605","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3620605","url":null,"abstract":"A <inline-formula><tex-math>$k$</tex-math></inline-formula>-plex is a subgraph in which each vertex can miss edges to at most <inline-formula><tex-math>$k$</tex-math></inline-formula> vertices, including itself. <inline-formula><tex-math>$k$</tex-math></inline-formula>-plex can find many real-world applications such as social network analysis and product recommendation. Previous studies about <inline-formula><tex-math>$k$</tex-math></inline-formula>-plex mainly focus on static graphs. However, in reality, relationships between two entities often occur at some specific timestamps, which can be modeled as temporal graphs. Directly extending the <inline-formula><tex-math>$k$</tex-math></inline-formula>-plex model may fail to find some critical groups in temporal graphs, which exhibit certain frequent occurring patterns. To fill the gap, in this paper, we develop a novel model, named <inline-formula><tex-math>$(k,l)$</tex-math></inline-formula>-plex, which is a vertex set that exists in no less than <inline-formula><tex-math>$l$</tex-math></inline-formula> timestamps, at each of which the subgraph induced is a <inline-formula><tex-math>$k$</tex-math></inline-formula>-plex. To identify practical results, we propose and investigate two important problems, i.e., large maximal <inline-formula><tex-math>$(k,l)$</tex-math></inline-formula>-plex (MalKLP) enumeration and maximum <inline-formula><tex-math>$(k,l)$</tex-math></inline-formula>-plex (MaxKLP) identification. For the MalKLP enumeration problem, a reasonable baseline method is first proposed by extending the Bron-Kerbosch (BK) framework. To overcome the limitations in baseline and scale for large graphs, optimized strategies are developed, including novel graph reduction approach and search branch pruning techniques. For the MaxKLP identification task, we first design a baseline method by extending the proposed enumeration framework. Additionally, to accelerate the search, a new search framework with efficient branch pruning rules and refined graph reduction method is developed. Finally, comprehensive experiments are conducted on 14 real-world datasets to validate the efficiency and effectiveness of the proposed techniques.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 12","pages":"7105-7119"},"PeriodicalIF":10.4,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph Contrastive Learning (GCL) has recently garnered significant attention for enhancing recommender systems. Most existing GCL-based methods perturb the raw data graph to generate views, performing contrastive learning across these views to learn generalizable representations. However, most of these methods rely on data- or model-based augmentation techniques that may disrupt interest consistency. In this paper, we propose a novel interest-aware augmentation approach based on diffusion models to address this issue. Specifically, we leverage a conditional diffusion model to generate interest-consistent views by conditioning on node interaction information, ensuring that the generated views align with the interests of the nodes. Based on this augmentation method, we introduce DiffCL, a graph contrastive learning framework for recommendation. Furthermore, we propose an easy-to-hard generation strategy. By progressively adjusting the starting point of the reverse denoising process, this strategy further enhances effective contrastive learning. We evaluate DiffCL on three public real-world datasets, and results indicate that our method outperforms state-of-the-art techniques, demonstrating its effectiveness.
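As a reminder of the contrastive objective that such view pairs feed into, here is a toy InfoNCE loss in plain numpy; the embeddings are random stand-ins for the diffusion-generated, interest-consistent view and the original view, and nothing here reflects DiffCL's actual architecture.

```python
import numpy as np

def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.2) -> float:
    # z1[i] and z2[i] are embeddings of two views of the same node.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                    # similarity of every view pair
    logits -= logits.max(axis=1, keepdims=True) # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))   # matched pairs sit on the diagonal

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(32, 64))
    aligned = base + 0.05 * rng.normal(size=base.shape)   # consistent views
    shuffled = rng.permutation(aligned)                    # inconsistent views
    print("aligned loss:", info_nce(base, aligned))        # small
    print("shuffled loss:", info_nce(base, shuffled))      # large
```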
{"title":"Interest-Aware Graph Contrastive Learning for Recommendation With Diffusion-Based Augmentation","authors":"Mengyuan Jing;Yanmin Zhu;Zhaobo Wang;Jiadi Yu;Feilong Tang","doi":"10.1109/TKDE.2025.3620600","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3620600","url":null,"abstract":"Graph Contrastive Learning (GCL) has recently garnered significant attention for enhancing recommender systems. Most existing GCL-based methods perturb the raw data graph to generate views, performing contrastive learning across these views to learn generalizable representations. However, most of these methods rely on data- or model-based augmentation techniques that may disrupt interest consistency. In this paper, we propose a novel interest-aware augmentation approach based on diffusion models to address this issue. Specifically, we leverage a conditional diffusion model to generate interest-consistent views by conditioning on node interaction information, ensuring that the generated views align with the interests of the nodes. Based on this augmentation method, we introduce DiffCL, a graph contrastive learning framework for recommendation. Furthermore, we propose an easy-to-hard generation strategy. By progressively adjusting the starting point of the reverse denoising process, this strategy further enhances effective contrastive learning. We evaluate DiffCL on three public real-world datasets, and results indicate that our method outperforms state-of-the-art techniques, demonstrating its effectiveness.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 1","pages":"414-427"},"PeriodicalIF":10.4,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145705885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A precise workload forecaster is the key to effective resource management, system scalability, and overall operational efficiency in cloud environments. However, real-world cloud systems frequently operate in dynamic and unpredictable settings, producing workloads that exhibit significant diversity and fluctuation. To address these problems, we introduce OMCR, a novel online multivariate forecaster for cloud resource management that overcomes the limitations of existing static forecasting methods through online learning. OMCR integrates long-term memory with a rapid response mechanism for short-term changes in cloud systems, while also considering the impact of multivariate relationships on workload prediction. OMCR minimizes its reliance on historical data, thereby reducing training difficulty and maintaining lower prediction loss in the long run. OMCR also offers an adaptive approach to forecasting peak workloads within a given time span, which helps cloud resource management. Experimental results demonstrate the superior performance of our proposed framework compared to state-of-the-art methods in MAE and MSE metrics when forecasting cloud workloads.
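The online-learning loop itself is simple to sketch: predict the next multivariate observation, observe the truth, take a small gradient step, and repeat, so the forecaster keeps adapting to drift without offline retraining. The vector AR(1) model and step size below are placeholders, not OMCR's design.

```python
import numpy as np

class OnlineVAR1:
    def __init__(self, dim: int, lr: float = 1e-2):
        self.A = np.zeros((dim, dim))      # one-step transition matrix
        self.lr = lr

    def predict(self, x_prev: np.ndarray) -> np.ndarray:
        return self.A @ x_prev

    def update(self, x_prev: np.ndarray, x_true: np.ndarray) -> float:
        err = self.predict(x_prev) - x_true
        self.A -= self.lr * np.outer(err, x_prev)   # SGD step on squared error
        return float((err ** 2).mean())             # running MSE signal

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_A = np.array([[0.8, 0.1], [0.0, 0.9]])     # hidden workload dynamics
    model, x = OnlineVAR1(dim=2), np.ones(2)
    for t in range(2000):
        x_next = true_A @ x + 0.1 * rng.normal(size=2)
        mse = model.update(x, x_next)
        x = x_next
    print("final step MSE:", mse)                   # approaches the noise floor
```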
{"title":"OMCR: An Online Multivariate Forecaster for Cloud Resource Management","authors":"Xu Gao;Xiu Tang;Chang Yao;Sai Wu;Gongsheng Yuan;Wenchao Zhou;Feifei Li;Gang Chen","doi":"10.1109/TKDE.2025.3619097","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3619097","url":null,"abstract":"A precise workload forecaster is the key to effective resource management, system scalability, and overall operational efficiency in cloud environments. However, real-world cloud systems frequently operate in dynamic and unpredictable settings, causing workloads that exhibit significant diversity and fluctuations. To address these problems, we introduce OMCR, a novel online multivariate forecaster for cloud resource management, that overcomes the limitations of existing static forecasting methods through online learning. OMCR integrates long-term memory with a rapid response mechanism to short-term changes in cloud systems, while also considering the impact of multivariate relationships on workload prediction. OMCR minimizes its reliance on historical data, thereby reducing training difficulty and maintaining lower prediction loss in the long run. OMCR also offers an adaptive approach to forecasting peak workloads in a certain time span, which helps cloud resource management. Experimental results demonstrate the superior performance of our proposed framework compared to state-of-the-art methods in MAE and MSE metrics when forecasting cloud workloads.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 1","pages":"532-545"},"PeriodicalIF":10.4,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145705895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, the medical industry has been generating large amounts of data, and how to securely store and reliably share these medical data has become a hot research topic. Cloud storage technology can be applied to the medical industry to accommodate the rapid growth of medical data. However, cloud-based data storage and sharing systems face a series of security issues: the integrity of outsourced medical data must be guaranteed, and malicious access between different medical institutions may leak users’ privacy. This article proposes a system that simultaneously addresses integrity auditing of medical data and secure data sharing between different medical institutions under a terminal-edge-cloud framework. Specifically, patients/doctors are treated as terminal users, medical institutions are viewed as edge nodes, and medical clouds form the central storage layer. In the data auditing process, a third-party auditor can verify the integrity of data stored in the medical cloud. Moreover, different medical institutions use private-set-intersection technology to share the electronic medical data of common users, while the data of users outside the intersection set need not be shared. Finally, security and performance analyses show that the proposed system is provably secure and achieves high computational and communication efficiency.
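For readers unfamiliar with private set intersection, the toy sketch below shows the classic Diffie-Hellman-style construction in which two institutions blind hashed patient identifiers with secret exponents and compare only doubly blinded values; the prime, the hashing, and the identifiers are illustrative assumptions, and this is a stand-in for, not a description of, the paper's scheme.

```python
import hashlib
import secrets

P = 2**127 - 1                       # toy prime modulus (use a proper group in practice)

def h(item: str) -> int:
    # Hash an identifier into the multiplicative group mod P (toy hashing).
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

def blind(items, exp):
    return {pow(h(x), exp, P) for x in items}

def reblind(values, exp):
    return {pow(v, exp, P) for v in values}

if __name__ == "__main__":
    hospital_a = {"patient-001", "patient-007", "patient-042"}   # hypothetical IDs
    hospital_b = {"patient-007", "patient-042", "patient-999"}
    a_key, b_key = secrets.randbelow(P - 2) + 1, secrets.randbelow(P - 2) + 1

    # A sends H(x)^a; B returns (H(x)^a)^b and also sends its own H(y)^b.
    a_once = blind(hospital_a, a_key)
    a_twice = reblind(a_once, b_key)
    b_once = blind(hospital_b, b_key)

    # A finishes B's values with its own key; commutativity of exponentiation
    # makes the doubly blinded sets comparable without revealing raw IDs.
    b_twice = reblind(b_once, a_key)
    print("patients in common:", len(a_twice & b_twice))   # 2
```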
{"title":"Private-Set-Intersection-Based Medical Data Sharing Scheme With Integrity Auditing for IoMT Cloud Storage Systems","authors":"Zekun Li;Jinyong Chang;Bei Liang;Kaijing Ling;Yifan Dong;Yanyan Ji;Maozhi Xu","doi":"10.1109/TKDE.2025.3619426","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3619426","url":null,"abstract":"In recent years, the medical industry is generating a large amount of data. How to securely store and reliably share these medical data has been a hot research topic. Cloud storage technology can be applied to the medical industry to adapt to the rapid growth of medical data. However, cloud-based data storage and sharing systems face a series of security issues: whether the integrity of outsourced medical data can be guaranteed, and malicious access between different medical institutions may leak user’s privacy. This article proposes a system that simultaneously solves the integrity auditing of medical data and securely data sharing between different medical institutions under the terminal-edge-cloud framework. Specifically, patients/doctors are treated as terminal users, medical institutions are viewed as edge nodes, and medical clouds form the central storage layer. In the process of data auditing, third-party auditor can achieve integrity auditing of medical cloud storage data. Moreover, different medical institutions use private-set-intersection technology to share the common user’s electronic medical data, while for other users not in intersection set, their data does not need to be shared. Finally, security and performance analyses show that our proposed system is provable secure and has high computational and communication efficiency.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 12","pages":"7402-7413"},"PeriodicalIF":10.4,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph-level anomaly detection (GLAD) aims to distinguish anomalous graphs that exhibit significant deviations from others. The graph-graph relationship, which reveals the deviation and similarity between graphs, offers global, graph-level insights for highlighting how anomalies diverge from normal graph patterns. Thus, understanding graph-graph relationships is critical to boosting models on GLAD tasks. However, existing deep GLAD algorithms heavily rely on Graph Neural Networks that primarily focus on analyzing individual graphs. These methods overlook the significance of graph-graph relationships in telling anomalies apart from normal graphs. In this paper, we propose a novel model for Graph-level Anomaly Detection using the Transformer technique, namely GADTrans. Specifically, GADTrans builds the transformer upon crucial subgraphs mined by a parametrized extractor to model precise graph-graph relationships. The learned graph-graph relationships help distinguish normal from anomalous graphs. In addition, a specific loss is introduced to guide GADTrans in highlighting the deviation between anomalous and normal graphs while underlining the similarities among normal graphs. GADTrans achieves model interpretability by delivering human-interpretable results, namely the learned graph-graph relationships and the crucial subgraphs. Extensive experiments on six real-world datasets verify the effectiveness and superiority of GADTrans for GLAD tasks.
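To illustrate how a transformer can expose an explicit graph-graph relationship, the sketch below runs a single self-attention head over a batch of graph-level embeddings and returns the attention matrix; GADTrans builds its transformer over mined crucial subgraphs with its own loss, so this stripped-down numpy version only shows where such a relationship matrix comes from.

```python
import numpy as np

def self_attention(graph_embs: np.ndarray, wq, wk, wv):
    # graph_embs: (num_graphs, d) matrix, one row per graph.
    q, k, v = graph_embs @ wq, graph_embs @ wk, graph_embs @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)          # stable softmax
    attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return attn @ v, attn          # updated embeddings, graph-graph matrix

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 16
    normal = rng.normal(size=(7, d))                      # typical graphs
    anomaly = normal.mean(0) + 4.0 * rng.normal(size=(1, d))
    embs = np.vstack([normal, anomaly])
    wq, wk, wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
    _, attn = self_attention(embs, wq, wk, wv)
    # Row i tells how strongly graph i relates to every other graph; an
    # anomalous graph tends to attend and be attended to differently.
    print(np.round(attn, 2))
```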
{"title":"Learning From Graph-Graph Relationship: A New Perspective on Graph-Level Anomaly Detection","authors":"Zhenyu Yang;Ge Zhang;Jia Wu;Jian Yang;Hao Peng;Pietro Lió","doi":"10.1109/TKDE.2025.3618929","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3618929","url":null,"abstract":"Graph-level anomaly detection (GLAD) aims to distinguish anomalous graphs that exhibit significant deviations from others. The graph-graph relationship, revealing the deviation and similarity between graphs, offers global insights into the entire graph level for highlighting the anomalies’ divergence from normal graph patterns. Thus, understanding graph-graph relationships is critical to boosting models on GLAD tasks. However, existing deep GLAD algorithms heavily rely on Graph Neural Networks that primarily focus on analyzing individual graphs. These methods overlook the significance of graph-graph relationships in telling anomalies from normal graphs. In this paper, we propose a novel model for Graph-level Anomaly Detection using the Transformer technique, namely GADTrans. Specifically, GADTrans builds the transformer upon crucial subgraphs mined by a parametrized extractor, for modeling precise graph-graph relationships. The learned graph-graph relationships put effort into distinguishing normal and anomalous graphs. In addition, a specific loss is introduced to guide GADTrans in highlighting the deviation between anomalous and normal graphs while underlining the similarities among normal graphs. GADTrans achieves model interpretability by delivering human-interpretable results, which are learned graph-graph relationships and crucial subgraphs. Extensive experiments on six real-world datasets verify the effectiveness and superiority of GADTrans for GLAD tasks.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 1","pages":"428-441"},"PeriodicalIF":10.4,"publicationDate":"2025-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145705928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-03, DOI: 10.1109/TKDE.2025.3617894
Hao Wu;Qu Wang;Xin Luo;Zidong Wang
A nonstandard tensor is frequently adopted to model a large-scale complex dynamic network. A Tensor Representation Learning (TRL) model enables extracting valuable knowledge from a dynamic network by learning a low-dimensional representation of a target nonstandard tensor. Nevertheless, the representation learning ability of existing TRL models is limited for a nonstandard tensor due to their inability to accurately represent its specific nature, i.e., mode imbalance, high dimensionality, and incompleteness. To address this issue, this study proposes a Mode-Aware Tucker Network-based Tensor Representation Learning (MTN-TRL) model with three-fold ideas: a) designing a mode-aware Tucker network to accurately represent the imbalanced modes of a nonstandard tensor, b) building a highly efficient MTN-based TRL model that fuses both a data-density-oriented modeling principle and an adaptive parameter learning scheme, and c) theoretically proving the MTN-TRL model’s convergence. Extensive experiments on eight nonstandard tensors generated from real-world dynamic networks demonstrate that MTN-TRL significantly outperforms state-of-the-art models in terms of representation accuracy.
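As background, a Tucker-style representation approximates a third-order tensor as a small core multiplied by one factor matrix per mode, and for an incomplete tensor the fit is measured only on observed entries. The sketch below shows that reconstruction with mode-specific (imbalanced) ranks as an illustrative assumption; the optimization itself, where MTN-TRL's contribution lies, is omitted.

```python
import numpy as np

def tucker_reconstruct(core, u1, u2, u3):
    # X_hat[i,j,k] = sum_{a,b,c} core[a,b,c] * u1[i,a] * u2[j,b] * u3[k,c]
    return np.einsum("abc,ia,jb,kc->ijk", core, u1, u2, u3)

def masked_rmse(x, x_hat, mask):
    # Error over observed entries only (mask == 1 where the entry is known).
    diff = (x - x_hat) * mask
    return np.sqrt((diff ** 2).sum() / mask.sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ranks, shape = (3, 2, 4), (30, 8, 50)     # imbalanced modes, imbalanced ranks
    core = rng.normal(size=ranks)
    u1, u2, u3 = (rng.normal(size=(n, r)) for n, r in zip(shape, ranks))
    x = tucker_reconstruct(core, u1, u2, u3)  # ground-truth low-rank tensor
    mask = (rng.random(shape) < 0.1).astype(float)   # ~10% of entries observed
    print("observed-entry RMSE:", masked_rmse(x, x, mask))   # 0 for a perfect fit
```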
{"title":"Learning Accurate Representation to Nonstandard Tensors via a Mode-Aware Tucker Network","authors":"Hao Wu;Qu Wang;Xin Luo;Zidong Wang","doi":"10.1109/TKDE.2025.3617894","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3617894","url":null,"abstract":"A nonstandard tensor is frequently adopted to model a large-sale complex dynamic network. A Tensor Representation Learning (TRL) model enables extracting valuable knowledge form a dynamic network via learning low-dimensional representation of a target nonstandard tensor. Nevertheless, the representation learning ability of existing TRL models are limited for a nonstandard tensor due to its inability to accurately represent the specific nature of the nonstandard tensor, i.e., mode imbalance, high-dimension, and incompleteness. To address this issue, this study innovatively proposes a Mode-Aware Tucker Network-based Tensor Representation Learning (MTN-TRL) model with three-fold ideas: a) designing a mode-aware Tucker network to accurately represent the imbalanced mode of a nonstandard tensor, b) building an MTN-based high-efficient TRL model that fuses both data density-oriented modeling principle and adaptive parameters learning scheme, and c) theoretically proving the MTN-TRL model’s convergence. Extensive experiments on eight nonstandard tensors generating from real-world dynamic networks demonstrate that MTN-TRL significantly outperforms state-of-the-art models in terms of representation accuracy.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 12","pages":"7272-7285"},"PeriodicalIF":10.4,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-03, DOI: 10.1109/TKDE.2025.3617461
Zhouyang Liu;Yixin Chen;Ning Liu;Jiezhong He;Dongsheng Li
Graph similarity is critical in graph-related tasks such as graph retrieval, where metrics like maximum common subgraph (MCS) and graph edit distance (GED) are commonly used. However, exact computations of these metrics are known to be NP-Hard. Recent neural network-based approaches approximate the similarity score in embedding spaces to alleviate the computational burden, but they either involve expensive pairwise node comparisons or fail to effectively utilize structural and scale information of graphs. To tackle these issues, we propose a novel geometric-based graph embedding method called Graph2Region (G2R). G2R represents nodes as closed regions and recovers their adjacency patterns within graphs in the embedding space. By incorporating the node features and adjacency patterns of graphs, G2R summarizes graph regions, i.e., graph embeddings, where the shape captures the underlying graph structures and the volume reflects the graph size. Consequently, the overlap between graph regions can serve as an approximation of MCS, signifying similar node regions and adjacency patterns. We further analyze the relationship between MCS and GED and propose using disjoint parts as a proxy for GED similarity. This analysis enables concurrent computation of MCS and GED, incorporating local and global structural information. Experimental evaluation highlights G2R’s competitive performance in graph similarity computation. It achieves up to a 60.0% relative accuracy improvement over state-of-the-art methods in MCS similarity learning, while maintaining efficiency in both training and inference. Moreover, G2R showcases remarkable capability in predicting both MCS and GED similarities simultaneously, providing a holistic assessment of graph similarity.
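The geometric intuition can be sketched with axis-aligned boxes: if each graph were summarized as a box, the overlap volume would act as the MCS-style similarity signal and the non-overlapping volume as the GED-style dissimilarity signal. G2R's learned regions need not be boxes, so the following conveys the intuition only.

```python
import numpy as np

def overlap_volume(lo1, hi1, lo2, hi2):
    # Volume of the intersection of two axis-aligned boxes (0 if disjoint).
    edges = np.minimum(hi1, hi2) - np.maximum(lo1, lo2)
    return float(np.prod(np.clip(edges, 0.0, None)))

def volume(lo, hi):
    return float(np.prod(hi - lo))

if __name__ == "__main__":
    # Two "graph regions" in a 3-d embedding space (illustrative coordinates).
    lo_a, hi_a = np.array([0.0, 0.0, 0.0]), np.array([2.0, 2.0, 2.0])
    lo_b, hi_b = np.array([1.0, 1.0, 0.0]), np.array([3.0, 3.0, 2.0])
    inter = overlap_volume(lo_a, hi_a, lo_b, hi_b)        # MCS-style signal
    disjoint = volume(lo_a, hi_a) + volume(lo_b, hi_b) - 2 * inter
    print("overlap volume:", inter)        # 1 * 1 * 2 = 2.0
    print("disjoint volume:", disjoint)    # GED-style signal: 8 + 8 - 4 = 12.0
```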
{"title":"Graph2Region: Efficient Graph Similarity Learning With Structure and Scale Restoration","authors":"Zhouyang Liu;Yixin Chen;Ning Liu;Jiezhong He;Dongsheng Li","doi":"10.1109/TKDE.2025.3617461","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3617461","url":null,"abstract":"Graph similarity is critical in graph-related tasks such as graph retrieval, where metrics like maximum common subgraph (MCS) and graph edit distance (GED) are commonly used. However, exact computations of these metrics are known to be NP-Hard. Recent neural network-based approaches approximate the similarity score in embedding spaces to alleviate the computational burden, but they either involve expensive pairwise node comparisons or fail to effectively utilize structural and scale information of graphs. To tackle these issues, we propose a novel geometric-based graph embedding method called <sc>Graph2Region</small> (<sc>G2R</small>). <sc>G2R</small> represents nodes as closed regions and recovers their adjacency patterns within graphs in the embedding space. By incorporating the node features and adjacency patterns of graphs, <sc>G2R</small> summarizes graph regions, i.e., graph embeddings, where the shape captures the underlying graph structures and the volume reflects the graph size. Consequently, the overlap between graph regions can serve as an approximation of MCS, signifying similar node regions and adjacency patterns. We further analyze the relationship between MCS and GED and propose using disjoint parts as a proxy for GED similarity. This analysis enables concurrent computation of MCS and GED, incorporating local and global structural information. Experimental evaluation highlights <sc>G2R</small>’s competitive performance in graph similarity computation. It achieves up to a 60.0% relative accuracy improvement over state-of-the-art methods in MCS similarity learning, while maintaining efficiency in both training and inference. Moreover, <sc>G2R</small> showcases remarkable capability in predicting both MCS and GED similarities simultaneously, providing a holistic assessment of graph similarity.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 12","pages":"7213-7225"},"PeriodicalIF":10.4,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data privacy protection legislation around the world has increasingly enforced the “right to be forgotten” regulation, generating a surge of research interest in machine unlearning (MU), which aims to remove the impact of training data from machine learning models upon receiving revocation requests from data owners. There are two major challenges for the performance of MU: execution efficiency and inference interference. The former requires minimizing the computational overhead of each execution of the MU mechanism, while the latter calls for reducing the execution frequency to minimize interference with normal inference services. Most MU studies to date focus on the sample-level unlearning setting, leaving the other, equally important feature-level setting under-explored. Adapting existing techniques to the latter turns out to be non-trivial. The only known feature-level work achieves an approximate unlearning guarantee, but suffers from degraded model accuracy and still leaves the inference interference challenge unsolved. We are therefore motivated to propose FELEMN, the first FEature-Level Exact Machine uNlearning method that overcomes both of the above-mentioned hurdles. For the MU execution efficiency challenge, we explore the impact of different feature partitioning strategies on the preservation of semantic relationships, so as to maintain model accuracy and MU efficiency. For the inference interference challenge, we propose two batching mechanisms that combine as many individual unlearning requests as possible to be processed together, while avoiding the potential privacy issues that come with improperly postponing unlearning requests, grounded in theoretical analysis. Experiments on five real datasets show that FELEMN outperforms up-to-date competitors with up to $3\times$ speedup for each MU execution and a 50% runtime reduction by mitigating inference interference.
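To see why feature partitioning enables exact unlearning, consider the SISA-style sketch below: one sub-model is trained per feature partition, and revoking a feature retrains only the sub-model whose partition contains it. The ensemble, the partitioning, and the logistic-regression sub-models are illustrative assumptions, not FELEMN's actual mechanisms.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class FeaturePartitionedEnsemble:
    def __init__(self, partitions):
        self.partitions = [list(p) for p in partitions]   # lists of feature indices
        self.models = [None] * len(partitions)

    def fit(self, X, y):
        for i, cols in enumerate(self.partitions):
            self.models[i] = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
        return self

    def predict_proba(self, X):
        probs = [m.predict_proba(X[:, cols])
                 for m, cols in zip(self.models, self.partitions)]
        return np.mean(probs, axis=0)        # simple ensemble average

    def unlearn_feature(self, feat, X, y):
        # Exact unlearning: drop the feature and retrain only its partition.
        for i, cols in enumerate(self.partitions):
            if feat in cols:
                cols.remove(feat)
                self.models[i] = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
                return i                      # the only sub-model retrained

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 6))
    y = (X[:, 0] + X[:, 3] > 0).astype(int)
    ens = FeaturePartitionedEnsemble([[0, 1], [2, 3], [4, 5]]).fit(X, y)
    retrained = ens.unlearn_feature(4, X, y)  # touches exactly one sub-model
    acc = ((ens.predict_proba(X)[:, 1] > 0.5) == y).mean()
    print("retrained partition:", retrained)
    print("accuracy after unlearning:", acc)
```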
{"title":"FELEMN: Toward Efficient Feature-Level Machine Unlearning for Exact Privacy Protection","authors":"Zhigang Wang;Yizhen Yu;Mingxin Li;Jian Lou;Ning Wang;Yu Gu;Shen Su;Yuan Liu;Hui Jiang;Zhihong Tian","doi":"10.1109/TKDE.2025.3613659","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3613659","url":null,"abstract":"Data privacy protection legislation around the world has increasingly enforced the “right to be forgotten” regulation, generating a surge in research interest in machine unlearning (MU), which aims to remove the impact of training data from machine learning models upon receiving revocation requests from data owners. There exist two major challenges for the performance of MU: the execution efficiency and the inference interference. The former requires minimizing the computational overhead for each execution of the MU mechanism, while the latter calls for reducing the execution frequency to minimize interference with normal inference services. Nowadays most MU studies focus on the sample-level unlearning setting, leaving the other paramount feature-level setting under-explored. Adapting these existing techniques to the latter turns out to be non-trivial. The only known feature-level work achieves an <i>approximate</i> unlearning guarantee, but suffers from degraded model accuracy and still leaves the inference interference challenge unsolved. We are therefore motivated to propose FELEMN, the first FEature-Level Exact Machine uNlearning method that overcomes both of the above-mentioned hurdles. For the MU execution efficiency challenge, we explore the impact of different feature partitioning strategies on the preservation of semantic relationships for maintaining model accuracy and MU efficiency. For the inference interference challenge, we propose two batching mechanisms to combine as many individual unlearning requests to be processed together as possible, while avoiding potential privacy issues coming with falsely postponing unlearning requests, which is grounded on theoretical analysis. Experiments on five real datasets show that our FELEMN outperforms up-to-date competitors with up to <inline-formula><tex-math>$3times$</tex-math></inline-formula> speedup for each MU execution, and 50% runtime reduction by mitigating inference interference.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 12","pages":"7169-7183"},"PeriodicalIF":10.4,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145456019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}