A great number of graph analysis algorithms involve iterative computations, which dominate the runtime. Accelerating iterative graph computations has become the key to improving the performance of graph algorithms. While numerous studies have focused on reducing the runtime of each iteration to improve efficiency, the optimization of the number of iterations is often overlooked. In this work, we first establish a correlation between vertex processing order and the number of iterations, providing an opportunity to reduce the number of iterations. We propose a metric function to evaluate the effectiveness of vertex processing order in accelerating iterative computations. Leveraging this metric, we propose a novel graph reordering method, GoGraph, which constructs an efficient vertex processing order. Additionally, for evolving graphs, we further propose a metric function designed to evaluate the effectiveness of vertex processing orders in response to graph changes and provide three optional methods for dynamically adjusting the vertex processing order. Our experimental results illustrate that GoGraph surpasses current state-of-the-art reordering algorithms, improving runtime by an average of 1.83× (up to 3.34×). Compared to traditional synchronous computation methods, our approach enhances the speed of iterative computations by up to 4.46×. In dynamic scenarios, incremental GoGraph can reduce end-to-end time by 43% on average (up to 48%).
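The link between vertex processing order and iteration count can be made concrete with a toy proxy metric (an illustration only; GoGraph's actual metric function is defined in the paper and not reproduced here): count the fraction of edges that point "forward" in the processing order, since a forward edge lets a freshly updated value propagate to its target within the same iteration.

```python
def forward_edge_ratio(order, edges):
    """Fraction of edges pointing 'forward' in the processing order.

    Illustrative proxy (not the paper's metric): if u is processed
    before v, edge (u, v) can carry u's fresh value to v within the
    same iteration, so orders with more forward edges tend to need
    fewer iterations to converge.
    """
    pos = {v: i for i, v in enumerate(order)}
    forward = sum(1 for u, v in edges if pos[u] < pos[v])
    return forward / len(edges)

# A 4-vertex chain: a topological order makes every edge forward.
edges = [(0, 1), (1, 2), (2, 3)]
assert forward_edge_ratio([0, 1, 2, 3], edges) == 1.0
assert forward_edge_ratio([3, 2, 1, 0], edges) == 0.0
```

A reordering method can then be viewed as searching for a permutation that (approximately) maximizes such a score, with incremental variants re-evaluating it only around changed edges.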
"GoGraph: Accelerating Graph Processing Through Incremental Reordering," by Yijie Zhou, Shufeng Gong, Feng Yao, Hanzhang Chen, Song Yu, Pengxi Liu, Yanfeng Zhang, Ge Yu, and Jeffrey Xu Yu. IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 1, pp. 366-379. DOI: 10.1109/TKDE.2025.3623928. Published 2025-10-23.
Pub Date: 2025-10-22. DOI: 10.1109/TKDE.2025.3624222
Junyou Zhu;Christian Nauck;Michael Lindner;Langzhou He;Philip S. Yu;Klaus-Robert Müller;Jürgen Kurths;Frank Hellmann
Facing climate change, the transition to renewable energy poses stability challenges for power grids due to their reduced inertia and increased decentralization. Traditional dynamic stability assessments, crucial for safe grid operation with higher renewable shares, are computationally expensive and unsuitable for large-scale real-world grids. Although multiple studies in network science have shown that network measures, which quantify the structural characteristics of networked dynamical systems, have the potential to facilitate basin stability prediction, no studies to date have demonstrated their ability to generalize efficiently to real-world grids. Despite recent breakthroughs in Graph Neural Networks (GNNs), there is still no common understanding of whether network measures can enhance GNNs' capability to predict dynamic stability, or how they might help GNNs generalize to realistic grid topologies. In this paper, we conduct, for the first time, a comprehensive analysis of 48 network measures in GNN-based stability assessments, introducing two strategies for their integration into the GNN framework. We find that prioritizing measures with consistent distributions across different grids as inputs, or treating measures as auxiliary supervised information, improves the model's generalization to realistic grid topologies, even when models are trained on only 20-node synthetic datasets. Our empirical results demonstrate a significant enhancement in model generalizability, increasing the $R^{2}$ performance from 66% to 83%. When evaluating the probabilistic stability indices on the realistic Texan grid model, GNNs reduce the time needed from 28,950 hours (Monte Carlo sampling) to just 0.06 seconds. This study provides fundamental insights into basin stability assessment using GNNs, setting a new benchmark for future research.
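The first integration strategy, network measures as model inputs, can be sketched minimally: compute per-node structural measures and attach them as feature vectors. Degree and local clustering coefficient stand in here for the 48 measures analyzed in the paper.

```python
def node_measures(adj):
    """Per-node degree and local clustering coefficient (two stand-in
    measures; the paper evaluates 48). adj maps node -> set of neighbours."""
    feats = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        # count edges among this node's neighbours
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        clust = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        feats[v] = [float(k), clust]
    return feats

# Triangle 0-1-2 plus a pendant node 3 attached to node 2
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
f = node_measures(adj)
assert f[0] == [2.0, 1.0]   # in a triangle, clustering = 1
assert f[3] == [1.0, 0.0]
```

In the alternative strategy, these same per-node values would serve as auxiliary supervision targets rather than inputs.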
"Network Measure-Enriched GNNs: A New Framework for Power Grid Stability Prediction." IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 1, pp. 518-531.
Pub Date: 2025-10-22. DOI: 10.1109/TKDE.2025.3623941
Wenna Lai;Haoran Xie;Guandong Xu;Qing Li
Implicit sentiment analysis (ISA) presents significant challenges due to the absence of salient cue words. Previous methods have struggled with insufficient data and limited reasoning capabilities to infer underlying opinions. Integrating multi-task learning (MTL) with large language models (LLMs) offers the potential to enable models of varying sizes to reliably perceive and recognize genuine opinions in ISA. However, existing MTL approaches are constrained by two sources of uncertainty: data-level uncertainty, arising from hallucination problems in LLM-generated contextual information, and task-level uncertainty, stemming from the varying capacities of models to process contextual information. To handle these uncertainties, we propose MT-ISA, a novel MTL framework that enhances ISA by leveraging the generation and reasoning capabilities of LLMs through automatic weight learning (AWL). Specifically, MT-ISA constructs auxiliary tasks using generative LLMs to supplement sentiment elements and incorporates automatic MTL to fully exploit auxiliary data. We introduce data-level and task-level AWL, which dynamically identify relationships and prioritize more reliable data and critical tasks, enabling models of varying sizes to adaptively learn fine-grained weights based on their reasoning capabilities. Three strategies are investigated for data-level AWL, which are integrated with homoscedastic uncertainty for task-level AWL. Extensive experiments reveal that models of varying sizes achieve an optimal balance between primary prediction and auxiliary tasks in MT-ISA. This underscores the effectiveness and adaptability of our approach.
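The task-level weighting via homoscedastic uncertainty mentioned above has a standard form (learned log-variances that down-weight uncertain tasks, plus a regularizer that keeps the weights from collapsing); the sketch below shows only that generic form, not MT-ISA's full data-level and task-level AWL.

```python
import math

def homoscedastic_loss(task_losses, log_vars):
    """Combine per-task losses L_i with learnable log-variances s_i:
        L = sum_i exp(-s_i) * L_i + s_i
    exp(-s_i) shrinks a noisy task's contribution; the +s_i term
    penalizes letting s_i grow without bound.
    """
    return sum(math.exp(-s) * L + s for L, s in zip(task_losses, log_vars))

# Equal confidence in both tasks: plain sum of losses.
assert homoscedastic_loss([1.0, 1.0], [0.0, 0.0]) == 2.0
# Raising the auxiliary task's uncertainty shrinks its weighted loss term.
combined = homoscedastic_loss([1.0, 1.0], [0.0, 1.0])
assert abs(combined - (1.0 + math.exp(-1) + 1.0)) < 1e-12
```

In training, the `log_vars` would be parameters updated by gradient descent alongside the model weights, so each task's weight is learned rather than hand-tuned.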
"Multi-Task Learning With LLMs for Implicit Sentiment Analysis: Data-Level and Task-Level Automatic Weight Learning." IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 1, pp. 506-517. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11214474
Learning-to-Rank (LTR) models built on Transformers have been widely adopted to achieve commendable performance in web search. However, these models predominantly emphasize relevance, often overlooking broader aspects of user satisfaction such as quality, authority, and recency, which collectively enhance the overall user experience. Addressing these multifaceted elements is essential for developing more effective and user-centric search engines. Nevertheless, training such comprehensive models remains challenging due to the scarcity of annotated query-webpage pairs relative to the vast number of webpages available online and the billions of daily search queries. Concurrently, industry research communities have released numerous open-source LTR datasets with well-annotated samples, though these datasets feature diverse designs of LTR features and labels across heterogeneous domains. Inspired by recent advancements in pre-training transformers for enhanced performance, this work explores the pre-training of LTR models using both labeled and unlabeled samples. Specifically, we leverage well-annotated samples from heterogeneous open-source LTR datasets to bolster the pre-training process and integrate multifaceted satisfaction features during the fine-tuning stage. In this paper, we propose S$^{3}$PRank: Satisfaction-oriented Learning to Rank with Semi-supervised Pre-training.
Specifically, S$^{3}$PRank employs a three-step approach: (1) it exploits unlabeled/labeled data from the search engine to pre-train a self-attentive encoder via semi-supervised learning; (2) it incorporates multiple open-source heterogeneous LTR datasets to enhance the pre-training of the relevance tower through shared parameters in cross-domain learning; (3) it integrates a satisfaction tower with the pre-trained relevance tower to form a deep two-tower aggregation structure, and fine-tunes the combination of the pre-trained self-attentive encoder and the two-tower structure using search engine data with various learning strategies. To demonstrate the effectiveness of our proposed approach, we conduct extensive offline and online evaluations using real-world web traffic from Baidu Search. Comparisons against a number of advanced baselines confirm the advantages of S$^{3}$PRank in producing high-performance ranking models for web-scale search.
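A toy sketch of the two-tower aggregation idea (linear towers, a hand-picked mixing weight `alpha`, and the feature names are all assumptions, not S$^{3}$PRank's architecture): relevance and satisfaction scores are computed by separate towers and blended before ranking.

```python
def rank_pages(pages, w_rel, w_sat, alpha=0.7):
    """Score = alpha * relevance-tower output + (1 - alpha) * satisfaction-tower
    output, then sort descending. Linear towers are a placeholder for the
    deep towers described above."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scored = [(alpha * dot(p["rel"], w_rel) + (1 - alpha) * dot(p["sat"], w_sat),
               p["id"]) for p in pages]
    return [pid for _, pid in sorted(scored, reverse=True)]

pages = [
    {"id": "a", "rel": [0.9], "sat": [0.1]},  # highly relevant, low satisfaction
    {"id": "b", "rel": [0.5], "sat": [0.9]},  # less relevant, satisfying
]
# Emphasizing relevance ranks "a" first; emphasizing satisfaction flips the order.
assert rank_pages(pages, [1.0], [1.0], alpha=0.7) == ["a", "b"]
assert rank_pages(pages, [1.0], [1.0], alpha=0.3) == ["b", "a"]
```

The point of the two-tower split is that the relevance tower can be pre-trained on heterogeneous open-source LTR data while the satisfaction tower is fit to engine-specific signals.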
"S$^{3}$PRank: Toward Satisfaction-Oriented Learning to Rank With Semi-Supervised Pre-Training," by Yuchen Li, Zhonghao Lyu, Yongqi Zhang, Hao Zhang, Tianhao Peng, Haoyi Xiong, Shuaiqiang Wang, Linghe Kong, Guihai Chen, and Dawei Yin. IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 1, pp. 559-572. DOI: 10.1109/TKDE.2025.3623607. Published 2025-10-20.
Pub Date: 2025-10-17. DOI: 10.1109/TKDE.2025.3622941
Xianshuang Yao;Huiyu Wang
In this paper, considering the memory capability of fractional-order reservoirs and the disturbance immunity of integer-order reservoirs, a serial-parallel fractional-integer-order echo state network (SP-FIO-ESN) model is proposed for time series prediction. First, exploiting the superior adaptive capability of variational mode decomposition (VMD), the input signal is decomposed into multiple input subsequences, thereby extracting the internal features of the signal. Second, the number of serial reservoirs and the number of parallel reservoirs of SP-FIO-ESN are determined using the variational mode decomposition and phase space reconstruction methods. Third, to ensure the stability of SP-FIO-ESN, a sufficient stability criterion is given. Meanwhile, the SP-FIO-ESN reservoir parameters are optimized with the black-winged kite algorithm (BKA). Finally, to verify the effectiveness of the method on different learning tasks, experiments are conducted on numerical simulation datasets and photovoltaic/wind power generation forecasting datasets.
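For readers unfamiliar with echo state networks, a single (integer-order, leaky) reservoir update looks like the following; the fractional-order dynamics and the serial/parallel wiring that define SP-FIO-ESN are beyond this sketch.

```python
import math

def esn_step(x, u, W, W_in, leak=1.0):
    """One echo-state-network reservoir update:
        x' = (1 - leak) * x + leak * tanh(W x + W_in * u)
    x is the reservoir state (list), u a scalar input, W the recurrent
    weight matrix (list of lists), W_in the input weights (list).
    """
    n = len(x)
    pre = [sum(W[i][j] * x[j] for j in range(n)) + W_in[i] * u
           for i in range(n)]
    return [(1 - leak) * x[i] + leak * math.tanh(pre[i]) for i in range(n)]

# Two-neuron reservoir driven by a constant input from the zero state
W = [[0.0, 0.3], [0.3, 0.0]]
W_in = [0.5, -0.5]
x = esn_step([0.0, 0.0], 1.0, W, W_in)
assert abs(x[0] - math.tanh(0.5)) < 1e-12
assert abs(x[1] - math.tanh(-0.5)) < 1e-12
```

Only the linear readout on top of such states is trained in a classical ESN, which is what makes reservoir parameter choices (here delegated to BKA) so influential.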
"Serial-Parallel Fractional-Integer-Order Echo State Network for Time Series Prediction." IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 1, pp. 602-615.
Pub Date: 2025-10-17. DOI: 10.1109/TKDE.2025.3622998
Liang Peng;Yixuan Ye;Cheng Liu;Hangjun Che;Man-Fai Leung;Si Wu;Hau-San Wong
Recently, neighbor-based contrastive learning has been introduced to effectively exploit neighborhood information for clustering. However, these methods rely on the homophily assumption—that connected nodes share similar class labels and should therefore be close in feature space—which fails to account for the varying homophily levels in real-world graphs. As a result, applying contrastive learning to low-homophily graphs may lead to indistinguishable node representations due to unreliable neighborhood information, making it challenging to identify trustworthy neighborhoods with varying homophily levels in graph clustering. To tackle this, we introduce NeuCGC, a novel neighborhood-neutral contrastive graph clustering method that extends traditional contrastive learning by incorporating neutral pairs: node pairs treated as weighted positive pairs rather than strictly positive or negative. These neutral pairs are dynamically adjusted based on the graph’s homophily level, enabling a more flexible and robust learning process. Leveraging neutral pairs in contrastive learning, our method incorporates two key components: 1) an adaptive contrastive neighborhood distribution alignment that adjusts based on the homophily level of the given attribute graph, ensuring effective alignment of neighborhood distributions, and 2) a contrastive neighborhood node feature consistency learning mechanism that leverages reliable neighborhood information from high-confidence graphs to learn robust node representations, mitigating the adverse effects of varying homophily levels and effectively exploiting highly trustworthy neighborhood information.
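The neutral-pair idea can be sketched as a contrastive loss whose positive term carries a weight in [0, 1]; the specific weighting below and its coupling to measured homophily are my simplification, not NeuCGC's exact loss.

```python
import math

def weighted_contrastive_loss(sim_pos, w_pos, sim_negs, tau=0.5):
    """InfoNCE-style loss with a weighted positive term:
        L = -w_pos * log( exp(sim_pos/tau) / (exp(sim_pos/tau) + sum_k exp(sim_neg_k/tau)) )
    w_pos = 1 recovers a strict positive pair; intermediate values model
    neutral pairs whose pull toward the anchor is softened.
    """
    den = math.exp(sim_pos / tau) + sum(math.exp(s / tau) for s in sim_negs)
    return -w_pos * math.log(math.exp(sim_pos / tau) / den)

# A fully down-weighted neighbour contributes no attractive force at all.
assert weighted_contrastive_loss(0.9, 0.0, [0.1]) == 0.0
# Halving the weight halves the pair's loss contribution.
full = weighted_contrastive_loss(0.9, 1.0, [0.1, 0.2])
half = weighted_contrastive_loss(0.9, 0.5, [0.1, 0.2])
assert abs(half - 0.5 * full) < 1e-12
```

A homophily-aware scheme would set `w_pos` per edge, e.g. higher for neighbours the model is confident share a cluster and lower in low-homophily regions.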
"Trustworthy Neighborhoods Mining: Homophily-Aware Neutral Contrastive Learning for Graph Clustering." IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 1, pp. 679-693.
Pub Date: 2025-10-16. DOI: 10.1109/TKDE.2025.3622591
Xu Yang;Hongguang Zhao;Saiyu Qi;Yong Qi
Data has become a critical economic asset in recent years. To enable secure and reliable access to data assets, the combination of symmetric searchable encryption (SSE) and hybrid-storage blockchains (HSB) offers a promising solution: the authenticated data structure (ADS) is stored on-chain and the encrypted data off-chain, enabling efficient and authenticated encrypted queries. However, existing encrypted query schemes in HSB either lack support for conjunctive queries, a commonly used and important query pattern in databases, or exhibit low efficiency on conjunctive queries. vsChain was the first scheme to support secure and authenticated conjunctive queries in HSB, but it suffers from high query and authentication costs. To overcome these limitations, we introduce SeaCQ, a novel scheme for secure and efficient authenticated conjunctive queries. SeaCQ employs a meticulously designed two-stage authenticated query process to achieve optimal query efficiency. It also incorporates a customized double-layer authentication mechanism that efficiently ensures the correctness and completeness of query results while providing error localization. Additionally, we present SeaCQ*, a gas-efficient extension of SeaCQ that utilizes a constant-size on-chain ADS.
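As a minimal stand-in for an on-chain ADS, a Merkle root over the off-chain records lets a client detect tampered query results by recomputing the root; SeaCQ's actual ADS and two-stage authenticated query process are considerably more elaborate.

```python
import hashlib

def merkle_root(leaves):
    """Merkle root over a list of byte strings; a toy ADS. The on-chain
    side stores only this constant-size digest, while the (encrypted)
    records live off-chain."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = [hashlib.sha256(x).digest() for x in leaves]
    while len(level) > 1:
        if len(level) % 2:           # duplicate the last node on odd levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# Tampering with any record changes the root, so results can be verified.
r1 = merkle_root([b"doc1", b"doc2"])
r2 = merkle_root([b"doc1", b"docX"])
assert r1 != r2
assert merkle_root([b"doc1", b"doc2"]) == r1   # deterministic
```

Real schemes additionally return compact membership proofs (sibling hashes along one root-to-leaf path) so a client can verify a single result without re-reading the whole dataset.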
"SeaCQ: Secure and Efficient Authenticated Conjunctive Query in Hybrid-Storage Blockchains." IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 1, pp. 573-587.
Pub Date: 2025-10-15. DOI: 10.1109/TKDE.2025.3621758
Yuan Gao;Yuchen Li;Bingsheng He;Hezhe Qiao;Guoguo Ai;Hui Yan
Graph Knowledge Distillation (GKD) has made remarkable progress in graph representation learning in recent years. Despite its success, GKD typically follows a label-dependent paradigm that relies heavily on large amounts of labeled data. Moreover, we observe that GKD suffers from embedding collapse, as merely maximizing the consistency between the teacher and student is insufficient for heterophilic graphs. To tackle these challenges, we propose a Self-Supervised Distillation framework named SSD. To achieve label independence, the framework is built on contrastive learning. Specifically, we design a Topology Invariance Block (TIB) and a Feature Invariance Block (FIB) to distill semantic invariance from unlabeled data. Each block includes a teacher-student architecture trained with a projection-based contrastive loss. To avoid embedding collapse, the loss attends to two critical aspects: (1) maximizing consistency between the same node's teacher and student representations (positive pairs), and (2) minimizing consistency between negative pairs, which include final teacher-student representation pairs and hidden teacher representation pairs. Under the guidance of self-distillation in each block, TIB captures topology invariance while FIB learns feature invariance. Additionally, cross-distillation is applied between the two blocks, allowing each to gain additional contrastive knowledge from the other, resulting in improved feature representations and enhanced classification performance.
{"title":"SSD: Self-Supervised Distillation for Heterophilic Graph Representation Learning","authors":"Yuan Gao;Yuchen Li;Bingsheng He;Hezhe Qiao;Guoguo Ai;Hui Yan","doi":"10.1109/TKDE.2025.3621758","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3621758","url":null,"abstract":"Graph Knowledge Distillation (GKD) has made remarkable progress in graph representation learning in recent years. Despite this success, GKD typically follows a label-dependent paradigm that relies heavily on large amounts of labeled data. Moreover, we observe that GKD suffers from embedding collapse, since merely maximizing the consistency between the teacher and student is insufficient for heterophilic graphs. To tackle these challenges, we propose a Self-Supervised Distillation framework named SSD. To achieve label independence, the framework is built on contrastive learning. Specifically, we design a Topology Invariance Block (TIB) and a Feature Invariance Block (FIB) to distill semantic invariance from unlabeled data. Each block includes a teacher-student architecture trained with a projection-based contrastive loss. To avoid embedding collapse, the loss attends to two critical aspects: (1) maximizing consistency between representations of the same node from the teacher and the student (positive pairs), and (2) minimizing consistency between negative pairs, which include final teacher-student representation pairs and hidden teacher representation pairs. Under the guidance of self-distillation in each block, TIB captures topology invariance while FIB learns feature invariance. Additionally, cross-distillation between the two blocks allows each to gain contrastive knowledge from the other, yielding improved feature representations and enhanced classification performance. 
Comprehensive experimental results on 10 datasets demonstrate that our model achieves superior performance in the node classification task. In summary, SSD offers a novel paradigm for self-supervised knowledge distillation on graph-structured data.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 1","pages":"631-644"},"PeriodicalIF":10.4,"publicationDate":"2025-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145705922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
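The projection-based contrastive loss described in the SSD abstract (maximize same-node teacher-student agreement, minimize agreement across negative pairs) can be illustrated with a minimal InfoNCE-style sketch. This is an assumed formulation for illustration only, not the paper's implementation: the function name, temperature value, and the choice of all cross-node pairs as negatives are assumptions.

```python
import numpy as np

def contrastive_distillation_loss(teacher, student, temperature=0.5):
    """InfoNCE-style contrastive loss between teacher and student node
    embeddings: row i of `student` and row i of `teacher` form a positive
    pair; all cross-node teacher rows act as negatives for row i."""
    # L2-normalize rows so dot products become cosine similarities
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    logits = (s @ t.T) / temperature             # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positive pairs sit on the diagonal; maximize their log-probability
    return float(-np.mean(np.diag(log_prob)))
```

When the student embeddings exactly match the teacher's, the diagonal similarities are maximal and the loss is near its minimum; shuffling the student rows breaks the positive pairs and raises the loss, which is the collapse-avoidance behavior the abstract targets.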
Pub Date: 2025-10-15 | DOI: 10.1109/TKDE.2025.3621843
Yunfan Kang;Yiyang Bian;Qinma Kang;Amr Magdy
Spatial regionalization is the process of grouping a set of spatial areas into spatially contiguous and homogeneous regions. This paper introduces the Incremental Max-P regionalization with statistical constraints (IMS) problem: a regionalization process that supports enriched user-defined constraints based on statistical aggregate functions and supports incremental updates. Beyond enabling richer constraints, it allows users to employ multiple constraints simultaneously, significantly extending the expressiveness and effectiveness of the existing regionalization literature. The IMS problem is NP-hard and substantially enriches existing regionalization problems; this enrichment introduces several challenges in both feasibility and scalability. To address these challenges, we propose the FaCT algorithm, a three-phase heuristic that finds a feasible set of spatial regions satisfying the IMS constraints while scaling to larger datasets than existing approaches. FaCT supports local and global incremental updates when attribute values or constraints change. In addition, we combine the Iterated Greedy algorithm with FaCT to further improve solution quality for both the IMS problem and the classical max-p regions problem. Our extensive experimental evaluation demonstrates the effectiveness and scalability of our techniques on several real datasets.
{"title":"IMS: Incremental Max-P Regionalization With Statistical Constraints","authors":"Yunfan Kang;Yiyang Bian;Qinma Kang;Amr Magdy","doi":"10.1109/TKDE.2025.3621843","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3621843","url":null,"abstract":"Spatial regionalization is the process of grouping a set of spatial areas into spatially contiguous and homogeneous regions. This paper introduces the <italic>Incremental Max-P regionalization with statistical constraints</i> (IMS) problem: a regionalization process that supports enriched user-defined constraints based on statistical aggregate functions and supports incremental updates. Beyond enabling richer constraints, it allows users to employ multiple constraints simultaneously, significantly extending the expressiveness and effectiveness of the existing regionalization literature. The IMS problem is NP-hard and substantially enriches existing regionalization problems; this enrichment introduces several challenges in both feasibility and scalability. To address these challenges, we propose the <italic>FaCT</i> algorithm, a three-phase heuristic that finds a feasible set of spatial regions satisfying the IMS constraints while scaling to larger datasets than existing approaches. <italic>FaCT</i> supports local and global incremental updates when attribute values or constraints change. In addition, we combine the Iterated Greedy algorithm with <italic>FaCT</i> to further improve solution quality for both the IMS problem and the classical max-p regions problem. 
Our extensive experimental evaluation has demonstrated the effectiveness and scalability of our techniques on several real datasets.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 1","pages":"380-398"},"PeriodicalIF":10.4,"publicationDate":"2025-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145705924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
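The core max-p style mechanic behind the IMS abstract (grow spatially contiguous regions until an aggregate constraint is satisfied) can be sketched with a toy greedy routine. This is not the FaCT algorithm: the function name, BFS growth order, and the single sum-threshold constraint are assumptions standing in for the paper's richer statistical constraints and three-phase design.

```python
from collections import deque

def grow_regions(adjacency, value, threshold):
    """Toy max-p style region growing: BFS-grow spatially contiguous
    regions until each region's aggregate value reaches `threshold`.
    `adjacency` maps each area id to a list of neighboring area ids."""
    unassigned = set(adjacency)
    regions = []
    while unassigned:
        seed = min(unassigned)               # deterministic seed choice
        unassigned.discard(seed)
        region, total = {seed}, value[seed]
        frontier = deque(adjacency[seed])
        while total < threshold and frontier:
            area = frontier.popleft()
            if area in unassigned:           # only absorb free, adjacent areas
                unassigned.discard(area)
                region.add(area)
                total += value[area]
                frontier.extend(adjacency[area])
        regions.append(region)               # may fall short if no free neighbors remain
    return regions
```

On a four-area chain with unit values and a threshold of 2, this yields the two contiguous regions {0, 1} and {2, 3}. A full max-p solver would additionally maximize the number of regions and repair regions that fall short of the constraint, which is where heuristics such as Iterated Greedy come in.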
Natural Language Processing (NLP) aims to analyze text or speech using techniques from computer science. It serves applications in domains such as healthcare, commerce, and education. In particular, NLP has been widely applied to the education domain, where its applications have enormous potential to support teaching and learning. In this survey, we review recent advances in NLP with a focus on problems relevant to education. We begin by introducing the relevant background and the real-world educational scenarios to which NLP techniques could contribute. We then present a taxonomy of NLP in the education domain and highlight typical applications, including question answering, question construction, automated assessment, and error correction. Next, we describe the task definitions, challenges, and corresponding cutting-edge techniques under this taxonomy. In particular, we discuss LLM-based methods, given the wide use of LLMs in diverse NLP applications. After that, we showcase off-the-shelf demonstrations in this domain designed for educators and researchers. Finally, we conclude with five promising directions for future research: generalization over subjects and languages, deployed LLM-based systems for education, adaptive learning for teaching and learning, interpretability for education, and ethical considerations of NLP techniques.
{"title":"Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future Trends","authors":"Yunshi Lan;Xinyuan Li;Hanyue Du;Xuesong Lu;Ming Gao;Weining Qian;Aoying Zhou","doi":"10.1109/TKDE.2025.3621181","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3621181","url":null,"abstract":"Natural Language Processing (NLP) aims to analyze text or speech using techniques from computer science. It serves applications in domains such as healthcare, commerce, and education. In particular, NLP has been widely applied to the education domain, where its applications have enormous potential to support teaching and learning. In this survey, we review recent advances in NLP with a focus on problems relevant to education. We begin by introducing the relevant background and the real-world educational scenarios to which NLP techniques could contribute. We then present a taxonomy of NLP in the education domain and highlight typical applications, including question answering, question construction, automated assessment, and error correction. Next, we describe the task definitions, challenges, and corresponding cutting-edge techniques under this taxonomy. In particular, we discuss LLM-based methods, given the wide use of LLMs in diverse NLP applications. After that, we showcase off-the-shelf demonstrations in this domain designed for educators and researchers. 
Finally, we conclude with five promising directions for future research: generalization over subjects and languages, deployed LLM-based systems for education, adaptive learning for teaching and learning, interpretability for education, and ethical considerations of NLP techniques.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 1","pages":"659-678"},"PeriodicalIF":10.4,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145705887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}