Pub Date: 2025-10-15, DOI: 10.1109/TKDE.2025.3621843
Yunfan Kang;Yiyang Bian;Qinma Kang;Amr Magdy
Spatial regionalization is the process of grouping a set of spatial areas into spatially contiguous and homogeneous regions. This paper introduces the Incremental Max-P regionalization with statistical constraints (IMS) problem: a regionalization process that supports enriched user-defined constraints based on statistical aggregate functions and supports incremental updates. In addition to enabling richer constraints, it allows users to employ multiple constraints simultaneously, significantly extending the expressiveness and effectiveness of the existing regionalization literature. The IMS problem is NP-hard and significantly enriches existing regionalization problems. Such a major enrichment introduces several challenges in both feasibility and scalability. To address these challenges, we propose the FaCT algorithm, a three-phase heuristic approach that finds a feasible set of spatial regions satisfying the IMS constraints while scaling to larger datasets than the existing literature. FaCT supports local and global incremental updates when attribute values or constraints change. In addition, we incorporate the Iterated Greedy algorithm with FaCT to further improve the solution quality for the IMS problem and the classical max-p regions problem. Our extensive experimental evaluation demonstrates the effectiveness and scalability of our techniques on several real datasets.
{"title":"IMS: Incremental Max-P Regionalization With Statistical Constraints","authors":"Yunfan Kang;Yiyang Bian;Qinma Kang;Amr Magdy","doi":"10.1109/TKDE.2025.3621843","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3621843","url":null,"abstract":"Spatial regionalization is the process of grouping a set of spatial areas into spatially contiguous and homogeneous regions. This paper introduces an <italic>Incremental Max-P regionalization with statistical constraints</i> (IMS) problem; a regionalization process that supports enriched user-defined constraints based on statistical aggregate functions and supports incremental updates. In addition to enabling richer constraints, it allows users to employ multiple constraints simultaneously to significantly push the expressiveness and effectiveness of the existing regionalization literature. The IMS problem is NP-hard and significantly enriches the existing regionalization problems. Such a major enrichment introduces several challenges in both feasibility and scalability. To address these challenges, we propose the <italic>FaCT</i> algorithm, a three-phase heuristic approach that finds a feasible set of spatial regions that satisfy IMS constraints while supporting large datasets compared to the existing literature. <italic>FaCT</i> supports local and global incremental updates when there are changes in attribute values or constraints. In addition, we incorporate the Iterated Greedy algorithm with <italic>FaCT</i> to further improve the solution quality of the IMS problem and the classical max-p regions problem. Our extensive experimental evaluation has demonstrated the effectiveness and scalability of our techniques on several real datasets.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 1","pages":"380-398"},"PeriodicalIF":10.4,"publicationDate":"2025-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145705924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Natural Language Processing (NLP) aims to analyze text or speech using techniques from computer science. It serves applications in domains such as healthcare, commerce, and education. In particular, NLP has been widely applied to the education domain, where its applications have enormous potential to support teaching and learning. In this survey, we review recent advances in NLP with a focus on solving problems relevant to the education domain. We begin by introducing the related background and the real-world scenarios in education to which NLP techniques could contribute. Then, we present a taxonomy of NLP in the education domain and highlight typical NLP applications, including question answering, question construction, automated assessment, and error correction. Next, we illustrate the task definitions, challenges, and corresponding cutting-edge techniques based on the above taxonomy. In particular, LLM-involved methods are discussed due to the wide usage of LLMs in diverse NLP applications. After that, we showcase some off-the-shelf demonstrations in this domain, which are designed for educators or researchers. Finally, we conclude with five promising directions for future research, including generalization over subjects and languages, deployed LLM-based systems for education, adaptive learning for teaching and learning, interpretability for education, and ethical considerations of NLP techniques.
{"title":"Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future Trends","authors":"Yunshi Lan;Xinyuan Li;Hanyue Du;Xuesong Lu;Ming Gao;Weining Qian;Aoying Zhou","doi":"10.1109/TKDE.2025.3621181","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3621181","url":null,"abstract":"Natural Language Processing (NLP) aims to analyze text or speech via techniques in the computer science field. It serves applications in the domains of healthcare, commerce, education, and so on. Particularly, NLP has been widely applied to the education domain and its applications have enormous potential to help teaching and learning. In this survey, we review recent advances in NLP with a focus on solving problems relevant to the education domain. In detail, we begin with introducing the related background and the real-world scenarios in education to which NLP techniques could contribute. Then, we present a taxonomy of NLP in the education domain and highlight typical NLP applications including question answering, question construction, automated assessment, and error correction. Next, we illustrate the task definition, challenges, and corresponding cutting-edge techniques based on the above taxonomy. In particular, LLM-involved methods are included for discussion due to the wide usage of LLMs in diverse NLP applications. After that, we showcase some off-the-shelf demonstrations in this domain, which are designed for educators or researchers. At last, we conclude with five promising directions for future research, including generalization over subjects and languages, deployed LLM-based systems for education, adaptive learning for teaching and learning, interpretability for education, and ethical consideration of NLP techniques.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 1","pages":"659-678"},"PeriodicalIF":10.4,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145705887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-14, DOI: 10.1109/TKDE.2025.3620577
Songnian Zhang;Hao Yuan;Hui Zhu;Jun Shao;Yandong Zheng;Fengwei Wang
Merging multi-source time series data in cloud servers significantly enhances the effectiveness of analyses. However, privacy concerns are hindering time series analytics in the cloud. In response, numerous secure time series analytics schemes have been designed to address these privacy concerns. Unfortunately, existing schemes suffer from severe performance issues, making them impractical for real-world applications. In this work, we propose novel secure time series analytics schemes that break through the performance bottleneck by substantially improving both communication and computational efficiency without compromising security. To attain this, we chart a new technical roadmap that leverages the idea of a mixed model. Specifically, we design a non-interactive secure Euclidean distance protocol by tailoring homomorphic secret sharing to suit subtractive secret sharing. Additionally, we devise a different approach to securely compute the minimum of three elements, simultaneously reducing computational and communication costs. Moreover, we introduce a rotation concept, design a rotation-based hybrid comparison mode, and finally propose a fast secure top-$k$ protocol that dramatically reduces comparison complexity. With the above secure protocols, we propose a practical secure time series analytics scheme with exceptional performance and a security-enhanced scheme that considers stronger adversaries. Formal security analyses demonstrate that our proposed schemes achieve the desired security requirements, while comprehensive experimental evaluations illustrate that our schemes outperform the state-of-the-art scheme in both computation and communication.
{"title":"Secure and Practical Time Series Analytics With Mixed Model","authors":"Songnian Zhang;Hao Yuan;Hui Zhu;Jun Shao;Yandong Zheng;Fengwei Wang","doi":"10.1109/TKDE.2025.3620577","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3620577","url":null,"abstract":"Merging multi-source time series data in cloud servers significantly enhances the effectiveness of analyses. However, privacy concerns are hindering time series analytics in the cloud. Responsively, numerous secure time series analytics schemes have been designed to address privacy concerns. Unfortunately, existing schemes suffer from severe performance issues, making them impractical for real-world applications. In this work, we propose novel secure time series analytics schemes that break through the performance bottleneck by substantially improving both communication and computational efficiency without compromising security. To attain this, we open up a new technique roadmap that leverages the idea of mixed model. Specifically, we design a non-interactive secure Euclidean distance protocol by tailoring homomorphic secret sharing to suit subtractive secret sharing. Additionally, we devise a different approach to securely compute the minimum of three elements, simultaneously reducing computational and communication costs. Moreover, we delicately introduce a rotation concept, design a rotation-based hybrid comparison mode, and finally propose our fast secure top-<inline-formula><tex-math>$k$</tex-math></inline-formula> protocol that can dramatically reduce comparison complexity. With the above secure protocols, we propose a practical secure time series analytics scheme with exceptional performance and a security-enhanced scheme that considers stronger adversaries. Formal security analyses demonstrate that our proposed schemes can achieve the desired security requirements, while the comprehensive experimental evaluations illustrate that our schemes outperform the state-of-the-art scheme in both computation and communication.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 1","pages":"588-601"},"PeriodicalIF":10.4,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145705862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph retrieval (GR), a ranking procedure that aims to sort the graphs in a database by their relevance to a query graph in decreasing order, has wide applications across diverse domains, such as visual object detection and drug discovery. Existing GR approaches usually compare graph pairs at a fine-grained level and generate similarity scores at quadratic cost. In realistic scenarios, conducting such quadratic fine-grained comparisons is costly, yet coarse-grained comparisons result in performance loss. Moreover, label scarcity in real-world data brings extra challenges. To tackle these issues, we investigate a more realistic GR problem, namely efficient graph retrieval (EGR). Our key intuition is that, since there are numerous underutilized unlabeled pairs in realistic scenarios, we can achieve a speed-up while simplifying the model, without sacrificing performance, by leveraging the additional information these pairs provide. Following this intuition, we propose an efficient model called Dual-Tower Model with Dividing, Contrasting and Alignment (TowerDNA). TowerDNA uses a GNN-based dual-tower model as a backbone to quickly compare graph pairs in a coarse-grained manner. In addition, to effectively utilize unlabeled pairs, TowerDNA first identifies confident pairs among them to expand the labeled dataset. It then learns from the remaining unconfident pairs via graph contrastive learning with geometric correspondence. To integrate all semantics with reduced bias, TowerDNA generates prototypes using labeled pairs, which are aligned within both confident and unconfident pairs. Extensive experiments on diverse realistic datasets demonstrate that TowerDNA achieves performance comparable to fine-grained methods while providing a 10× speed-up.
{"title":"TowerDNA: Fast and Accurate Graph Retrieval With Dividing, Contrasting and Alignment","authors":"Junwei Yang;Yiyang Gu;Yifang Qin;Xiao Luo;Zhiping Xiao;Kangjie Zheng;Wei Ju;Xian-Sheng Hua;Ming Zhang","doi":"10.1109/TKDE.2025.3621493","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3621493","url":null,"abstract":"Graph retrieval (GR), a ranking procedure that aims to sort the graphs in a database by their relevance to a query graph in decreasing order, has wide applications across diverse domains, such as visual object detection and drug discovery. Existing Graph Retrieval (GR) approaches usually compare graph pairs at a detailed level and generate quadratic similarity scores. In realistic scenarios, conducting quadratic fine-grained comparisons is costly. However, coarse-grained comparisons would result in performance loss. Moreover, label scarcity in real-world data brings extra challenges. To tackle these issues, we investigate a more realistic GR problem, namely, efficient graph retrieval (EGR). Our key intuition is that, since there are numerous underutilized unlabeled pairs in realistic scenarios, by leveraging the additional information they provide, we can achieve speed-up while simplifying the model without sacrificing performance. Following our intuition, we propose an efficient model called Dual-<bold>Tower</b> Model with <bold>D</b>ividing, Co<bold>n</b>trasting and <bold>A</b>lignment (TowerDNA). TowerDNA utilizes a GNN-based dual-tower model as a backbone to quickly compare graph pairs in a coarse-grained manner. In addition, to effectively utilize unlabeled pairs, TowerDNA first identifies confident pairs from unlabeled pairs to expand labeled datasets. It then learns from remaining unconfident pairs via graph contrastive learning with geometric correspondence. To integrate all semantics with reduced biases, TowerDNA generates prototypes using labeled pairs, which are aligned within both confident and unconfident pairs. Extensive experiments on diverse realistic datasets demonstrate that TowerDNA achieves comparable performance to fine-grained methods while providing a 10× speed-up.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 2","pages":"1364-1379"},"PeriodicalIF":10.4,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-13, DOI: 10.1109/TKDE.2025.3620605
Yanping Wu;Renjie Sun;Xiaoyang Wang;Ying Zhang;Lu Qin;Wenjie Zhang;Xuemin Lin
A $k$-plex is a subgraph in which each vertex can miss edges to at most $k$ vertices, including itself. The $k$-plex model finds many real-world applications, such as social network analysis and product recommendation. Previous studies on $k$-plexes mainly focus on static graphs. However, in reality, relationships between two entities often occur at specific timestamps, which can be modeled as temporal graphs. Directly extending the $k$-plex model may fail to find some critical groups in temporal graphs that exhibit frequently occurring patterns. To fill this gap, in this paper, we develop a novel model, named $(k,l)$-plex, which is a vertex set that exists in no fewer than $l$ timestamps, at each of which the induced subgraph is a $k$-plex. To identify practical results, we propose and investigate two important problems, i.e., large maximal $(k,l)$-plex (MalKLP) enumeration and maximum $(k,l)$-plex (MaxKLP) identification. For the MalKLP enumeration problem, a reasonable baseline method is first proposed by extending the Bron-Kerbosch (BK) framework. To overcome the limitations of the baseline and scale to large graphs, optimized strategies are developed, including a novel graph reduction approach and search-branch pruning techniques. For the MaxKLP identification task, we first design a baseline method by extending the proposed enumeration framework. Additionally, to accelerate the search, a new search framework with efficient branch pruning rules and a refined graph reduction method is developed. Finally, comprehensive experiments are conducted on 14 real-world datasets to validate the efficiency and effectiveness of the proposed techniques.
{"title":"Efficient $k$k-Plex Mining in Temporal Graphs","authors":"Yanping Wu;Renjie Sun;Xiaoyang Wang;Ying Zhang;Lu Qin;Wenjie Zhang;Xuemin Lin","doi":"10.1109/TKDE.2025.3620605","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3620605","url":null,"abstract":"A <inline-formula><tex-math>$k$</tex-math></inline-formula>-plex is a subgraph in which each vertex can miss edges to at most <inline-formula><tex-math>$k$</tex-math></inline-formula> vertices, including itself. <inline-formula><tex-math>$k$</tex-math></inline-formula>-plex can find many real-world applications such as social network analysis and product recommendation. Previous studies about <inline-formula><tex-math>$k$</tex-math></inline-formula>-plex mainly focus on static graphs. However, in reality, relationships between two entities often occur at some specific timestamps, which can be modeled as temporal graphs. Directly extending the <inline-formula><tex-math>$k$</tex-math></inline-formula>-plex model may fail to find some critical groups in temporal graphs, which exhibit certain frequent occurring patterns. To fill the gap, in this paper, we develop a novel model, named <inline-formula><tex-math>$(k,l)$</tex-math></inline-formula>-plex, which is a vertex set that exists in no less than <inline-formula><tex-math>$l$</tex-math></inline-formula> timestamps, at each of which the subgraph induced is a <inline-formula><tex-math>$k$</tex-math></inline-formula>-plex. To identify practical results, we propose and investigate two important problems, i.e., large maximal <inline-formula><tex-math>$(k,l)$</tex-math></inline-formula>-plex (MalKLP) enumeration and maximum <inline-formula><tex-math>$(k,l)$</tex-math></inline-formula>-plex (MaxKLP) identification. For the MalKLP enumeration problem, a reasonable baseline method is first proposed by extending the Bron-Kerbosch (BK) framework. To overcome the limitations in baseline and scale for large graphs, optimized strategies are developed, including novel graph reduction approach and search branch pruning techniques. For the MaxKLP identification task, we first design a baseline method by extending the proposed enumeration framework. Additionally, to accelerate the search, a new search framework with efficient branch pruning rules and refined graph reduction method is developed. Finally, comprehensive experiments are conducted on 14 real-world datasets to validate the efficiency and effectiveness of the proposed techniques.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 12","pages":"7105-7119"},"PeriodicalIF":10.4,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph Contrastive Learning (GCL) has recently garnered significant attention for enhancing recommender systems. Most existing GCL-based methods perturb the raw data graph to generate views, performing contrastive learning across these views to learn generalizable representations. However, most of these methods rely on data- or model-based augmentation techniques that may disrupt interest consistency. In this paper, we propose a novel interest-aware augmentation approach based on diffusion models to address this issue. Specifically, we leverage a conditional diffusion model to generate interest-consistent views by conditioning on node interaction information, ensuring that the generated views align with the interests of the nodes. Based on this augmentation method, we introduce DiffCL, a graph contrastive learning framework for recommendation. Furthermore, we propose an easy-to-hard generation strategy. By progressively adjusting the starting point of the reverse denoising process, this strategy further enhances effective contrastive learning. We evaluate DiffCL on three public real-world datasets, and results indicate that our method outperforms state-of-the-art techniques, demonstrating its effectiveness.
{"title":"Interest-Aware Graph Contrastive Learning for Recommendation With Diffusion-Based Augmentation","authors":"Mengyuan Jing;Yanmin Zhu;Zhaobo Wang;Jiadi Yu;Feilong Tang","doi":"10.1109/TKDE.2025.3620600","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3620600","url":null,"abstract":"Graph Contrastive Learning (GCL) has recently garnered significant attention for enhancing recommender systems. Most existing GCL-based methods perturb the raw data graph to generate views, performing contrastive learning across these views to learn generalizable representations. However, most of these methods rely on data- or model-based augmentation techniques that may disrupt interest consistency. In this paper, we propose a novel interest-aware augmentation approach based on diffusion models to address this issue. Specifically, we leverage a conditional diffusion model to generate interest-consistent views by conditioning on node interaction information, ensuring that the generated views align with the interests of the nodes. Based on this augmentation method, we introduce DiffCL, a graph contrastive learning framework for recommendation. Furthermore, we propose an easy-to-hard generation strategy. By progressively adjusting the starting point of the reverse denoising process, this strategy further enhances effective contrastive learning. We evaluate DiffCL on three public real-world datasets, and results indicate that our method outperforms state-of-the-art techniques, demonstrating its effectiveness.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 1","pages":"414-427"},"PeriodicalIF":10.4,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145705885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A precise workload forecaster is key to effective resource management, system scalability, and overall operational efficiency in cloud environments. However, real-world cloud systems frequently operate in dynamic and unpredictable settings, producing workloads that exhibit significant diversity and fluctuations. To address these problems, we introduce OMCR, a novel online multivariate forecaster for cloud resource management that overcomes the limitations of existing static forecasting methods through online learning. OMCR integrates long-term memory with a mechanism for rapid response to short-term changes in cloud systems, while also considering the impact of multivariate relationships on workload prediction. OMCR minimizes its reliance on historical data, thereby reducing training difficulty and maintaining lower prediction loss in the long run. OMCR also offers an adaptive approach to forecasting peak workloads over a given time span, which helps cloud resource management. Experimental results demonstrate the superior performance of our proposed framework compared to state-of-the-art methods in MAE and MSE metrics when forecasting cloud workloads.
{"title":"OMCR: An Online Multivariate Forecaster for Cloud Resource Management","authors":"Xu Gao;Xiu Tang;Chang Yao;Sai Wu;Gongsheng Yuan;Wenchao Zhou;Feifei Li;Gang Chen","doi":"10.1109/TKDE.2025.3619097","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3619097","url":null,"abstract":"A precise workload forecaster is the key to effective resource management, system scalability, and overall operational efficiency in cloud environments. However, real-world cloud systems frequently operate in dynamic and unpredictable settings, causing workloads that exhibit significant diversity and fluctuations. To address these problems, we introduce OMCR, a novel online multivariate forecaster for cloud resource management, that overcomes the limitations of existing static forecasting methods through online learning. OMCR integrates long-term memory with a rapid response mechanism to short-term changes in cloud systems, while also considering the impact of multivariate relationships on workload prediction. OMCR minimizes its reliance on historical data, thereby reducing training difficulty and maintaining lower prediction loss in the long run. OMCR also offers an adaptive approach to forecasting peak workloads in a certain time span, which helps cloud resource management. Experimental results demonstrate the superior performance of our proposed framework compared to state-of-the-art methods in MAE and MSE metrics when forecasting cloud workloads.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 1","pages":"532-545"},"PeriodicalIF":10.4,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145705895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, the medical industry has been generating a large amount of data, and how to securely store and reliably share these medical data has become a hot research topic. Cloud storage technology can be applied to the medical industry to accommodate the rapid growth of medical data. However, cloud-based data storage and sharing systems face a series of security issues: the integrity of outsourced medical data must be guaranteed, and malicious access between different medical institutions may leak users’ privacy. This article proposes a system that simultaneously supports integrity auditing of medical data and secure data sharing between different medical institutions under a terminal-edge-cloud framework. Specifically, patients/doctors are treated as terminal users, medical institutions are viewed as edge nodes, and medical clouds form the central storage layer. In the auditing process, a third-party auditor can verify the integrity of data stored in the medical cloud. Moreover, different medical institutions use private-set-intersection technology to share the electronic medical data of common users, while the data of users outside the intersection set need not be shared. Finally, security and performance analyses show that our proposed system is provably secure and achieves high computational and communication efficiency.
{"title":"Private-Set-Intersection-Based Medical Data Sharing Scheme With Integrity Auditing for IoMT Cloud Storage Systems","authors":"Zekun Li;Jinyong Chang;Bei Liang;Kaijing Ling;Yifan Dong;Yanyan Ji;Maozhi Xu","doi":"10.1109/TKDE.2025.3619426","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3619426","url":null,"abstract":"In recent years, the medical industry is generating a large amount of data. How to securely store and reliably share these medical data has been a hot research topic. Cloud storage technology can be applied to the medical industry to adapt to the rapid growth of medical data. However, cloud-based data storage and sharing systems face a series of security issues: whether the integrity of outsourced medical data can be guaranteed, and malicious access between different medical institutions may leak user’s privacy. This article proposes a system that simultaneously solves the integrity auditing of medical data and securely data sharing between different medical institutions under the terminal-edge-cloud framework. Specifically, patients/doctors are treated as terminal users, medical institutions are viewed as edge nodes, and medical clouds form the central storage layer. In the process of data auditing, third-party auditor can achieve integrity auditing of medical cloud storage data. Moreover, different medical institutions use private-set-intersection technology to share the common user’s electronic medical data, while for other users not in intersection set, their data does not need to be shared. Finally, security and performance analyses show that our proposed system is provable secure and has high computational and communication efficiency.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 12","pages":"7402-7413"},"PeriodicalIF":10.4,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph-level anomaly detection (GLAD) aims to distinguish anomalous graphs that exhibit significant deviations from others. The graph-graph relationship, revealing the deviation and similarity between graphs, offers global insights at the whole-graph level for highlighting how anomalies diverge from normal graph patterns. Thus, understanding graph-graph relationships is critical to boosting models on GLAD tasks. However, existing deep GLAD algorithms rely heavily on Graph Neural Networks that primarily focus on analyzing individual graphs. These methods overlook the significance of graph-graph relationships in distinguishing anomalies from normal graphs. In this paper, we propose a novel model for Graph-level Anomaly Detection using the Transformer technique, namely GADTrans. Specifically, GADTrans builds the transformer upon crucial subgraphs mined by a parametrized extractor to model precise graph-graph relationships. The learned graph-graph relationships help distinguish normal graphs from anomalous ones. In addition, a specific loss is introduced to guide GADTrans in highlighting the deviation between anomalous and normal graphs while underlining the similarities among normal graphs. GADTrans achieves model interpretability by delivering human-interpretable results, namely the learned graph-graph relationships and crucial subgraphs. Extensive experiments on six real-world datasets verify the effectiveness and superiority of GADTrans for GLAD tasks.
{"title":"Learning From Graph-Graph Relationship: A New Perspective on Graph-Level Anomaly Detection","authors":"Zhenyu Yang;Ge Zhang;Jia Wu;Jian Yang;Hao Peng;Pietro Lió","doi":"10.1109/TKDE.2025.3618929","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3618929","url":null,"abstract":"Graph-level anomaly detection (GLAD) aims to distinguish anomalous graphs that exhibit significant deviations from others. The graph-graph relationship, revealing the deviation and similarity between graphs, offers global insights into the entire graph level for highlighting the anomalies’ divergence from normal graph patterns. Thus, understanding graph-graph relationships is critical to boosting models on GLAD tasks. However, existing deep GLAD algorithms heavily rely on Graph Neural Networks that primarily focus on analyzing individual graphs. These methods overlook the significance of graph-graph relationships in telling anomalies from normal graphs. In this paper, we propose a novel model for Graph-level Anomaly Detection using the Transformer technique, namely GADTrans. Specifically, GADTrans builds the transformer upon crucial subgraphs mined by a parametrized extractor, for modeling precise graph-graph relationships. The learned graph-graph relationships put effort into distinguishing normal and anomalous graphs. In addition, a specific loss is introduced to guide GADTrans in highlighting the deviation between anomalous and normal graphs while underlining the similarities among normal graphs. GADTrans achieves model interpretability by delivering human-interpretable results, which are learned graph-graph relationships and crucial subgraphs. Extensive experiments on six real-world datasets verify the effectiveness and superiority of GADTrans for GLAD tasks.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 1","pages":"428-441"},"PeriodicalIF":10.4,"publicationDate":"2025-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145705928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-03, DOI: 10.1109/TKDE.2025.3617894
Hao Wu;Qu Wang;Xin Luo;Zidong Wang
A nonstandard tensor is frequently adopted to model a large-scale complex dynamic network. A Tensor Representation Learning (TRL) model enables extracting valuable knowledge from a dynamic network by learning a low-dimensional representation of a target nonstandard tensor. Nevertheless, the representation learning ability of existing TRL models is limited for a nonstandard tensor due to their inability to accurately represent its specific nature, i.e., mode imbalance, high dimensionality, and incompleteness. To address this issue, this study proposes a Mode-Aware Tucker Network-based Tensor Representation Learning (MTN-TRL) model with three-fold ideas: a) designing a mode-aware Tucker network to accurately represent the imbalanced modes of a nonstandard tensor, b) building a highly efficient MTN-based TRL model that fuses a data-density-oriented modeling principle with an adaptive parameter learning scheme, and c) theoretically proving the MTN-TRL model’s convergence. Extensive experiments on eight nonstandard tensors generated from real-world dynamic networks demonstrate that MTN-TRL significantly outperforms state-of-the-art models in terms of representation accuracy.
{"title":"Learning Accurate Representation to Nonstandard Tensors via a Mode-Aware Tucker Network","authors":"Hao Wu;Qu Wang;Xin Luo;Zidong Wang","doi":"10.1109/TKDE.2025.3617894","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3617894","url":null,"abstract":"A nonstandard tensor is frequently adopted to model a large-sale complex dynamic network. A Tensor Representation Learning (TRL) model enables extracting valuable knowledge form a dynamic network via learning low-dimensional representation of a target nonstandard tensor. Nevertheless, the representation learning ability of existing TRL models are limited for a nonstandard tensor due to its inability to accurately represent the specific nature of the nonstandard tensor, i.e., mode imbalance, high-dimension, and incompleteness. To address this issue, this study innovatively proposes a Mode-Aware Tucker Network-based Tensor Representation Learning (MTN-TRL) model with three-fold ideas: a) designing a mode-aware Tucker network to accurately represent the imbalanced mode of a nonstandard tensor, b) building an MTN-based high-efficient TRL model that fuses both data density-oriented modeling principle and adaptive parameters learning scheme, and c) theoretically proving the MTN-TRL model’s convergence. Extensive experiments on eight nonstandard tensors generating from real-world dynamic networks demonstrate that MTN-TRL significantly outperforms state-of-the-art models in terms of representation accuracy.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 12","pages":"7272-7285"},"PeriodicalIF":10.4,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}