
IEEE Transactions on Knowledge and Data Engineering: Latest Publications

GoGraph: Accelerating Graph Processing Through Incremental Reordering
IF 10.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-23 | DOI: 10.1109/TKDE.2025.3623928
Yijie Zhou;Shufeng Gong;Feng Yao;Hanzhang Chen;Song Yu;Pengxi Liu;Yanfeng Zhang;Ge Yu;Jeffrey Xu Yu
A great number of graph analysis algorithms involve iterative computations, which dominate the runtime. Accelerating iterative graph computations has become the key to improving the performance of graph algorithms. While numerous studies have focused on reducing the runtime of each iteration to improve efficiency, the optimization of the number of iterations is often overlooked. In this work, we first establish a correlation between vertex processing order and the number of iterations, providing an opportunity to reduce the number of iterations. We propose a metric function to evaluate the effectiveness of vertex processing order in accelerating iterative computations. Leveraging this metric, we propose a novel graph reordering method, GoGraph, which constructs an efficient vertex processing order. Additionally, for evolving graphs, we further propose a metric function designed to evaluate the effectiveness of vertex processing orders in response to graph changes and provide three optional methods for dynamically adjusting the vertex processing order. Our experimental results illustrate that GoGraph surpasses current state-of-the-art reordering algorithms, improving runtime by an average of 1.83× (up to 3.34×). Compared to traditional synchronous computation methods, our approach enhances the speed of iterative computations by up to 4.46×. In dynamic scenarios, incremental GoGraph can reduce end-to-end time by 43% on average (up to 48%).
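To make the role of the ordering metric concrete, the sketch below scores a vertex processing order by the fraction of edges whose source is processed before its destination, so that freshly updated values can propagate within the same iteration. This is only an illustrative stand-in for GoGraph's metric function; the name `forward_edge_ratio` and the exact definition are assumptions, not the paper's formulation.

```python
# A minimal sketch (not the paper's actual metric) of scoring a vertex
# processing order by the fraction of edges that point "forward" in the
# order, i.e. whose source is processed before its destination.
from typing import Dict, List, Tuple

def forward_edge_ratio(order: List[int], edges: List[Tuple[int, int]]) -> float:
    """Score a processing order: higher means more edges propagate fresh values."""
    position: Dict[int, int] = {v: i for i, v in enumerate(order)}
    forward = sum(1 for u, v in edges if position[u] < position[v])
    return forward / len(edges) if edges else 0.0

# Example: on a chain 0 -> 1 -> 2, the order [0, 1, 2] scores 1.0,
# while the reversed order [2, 1, 0] scores 0.0.
print(forward_edge_ratio([0, 1, 2], [(0, 1), (1, 2)]))   # 1.0
print(forward_edge_ratio([2, 1, 0], [(0, 1), (1, 2)]))   # 0.0
```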
Citations: 0
Network Measure-Enriched GNNs: A New Framework for Power Grid Stability Prediction
IF 10.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-22 | DOI: 10.1109/TKDE.2025.3624222
Junyou Zhu;Christian Nauck;Michael Lindner;Langzhou He;Philip S. Yu;Klaus-Robert Müller;Jürgen Kurths;Frank Hellmann
Facing climate change, the transformation to renewable energy poses stability challenges for power grids due to their reduced inertia and increased decentralization. Traditional dynamic stability assessments, crucial for safe grid operation with higher renewable shares, are computationally expensive and unsuitable for large-scale grids in the real world. Although multiple proofs in network science have shown that network measures, which quantify the structural characteristics of networked dynamical systems, have the potential to facilitate basin stability prediction, no studies to date have demonstrated their ability to efficiently generalize to real-world grids. With recent breakthroughs in Graph Neural Networks (GNNs), we are surprised to find that there is still no common foundation regarding whether network measures can enhance GNNs’ capability to predict dynamic stability, or how they might help GNNs generalize to realistic grid topologies. In this paper, we conduct, for the first time, a comprehensive analysis of 48 network measures in GNN-based stability assessments, introducing two strategies for their integration into the GNN framework. We uncover that prioritizing measures with consistent distributions across different grids as the input or regarding measures as auxiliary supervised information improves the model’s generalization ability to realistic grid topologies, even when models trained on only 20-node synthetic datasets are used. Our empirical results demonstrate a significant enhancement in model generalizability, increasing the $R^{2}$ performance from 66% to 83%. When evaluating the probabilistic stability indices on the realistic Texan grid model, GNNs reduce the time needed from 28,950 hours (Monte Carlo sampling) to just 0.06 seconds. This study could provide fundamental insights into basin stability assessments using GNNs, setting a new benchmark for future research.
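As a rough illustration of how network measures could be fed to a GNN, the sketch below shows the simpler of the two integration routes mentioned above: concatenating classical per-node measures onto the raw node features before they reach the model. The specific measures (degree, clustering, betweenness via networkx) and the helper name `enrich_node_features` are illustrative assumptions and are not taken from the paper.

```python
# A minimal sketch, assuming one integration strategy is plain feature
# concatenation: classical network measures are appended to each node's raw
# features before any GNN sees them. The measure choice is illustrative only.
import networkx as nx
import numpy as np

def enrich_node_features(G: nx.Graph, raw_features: np.ndarray) -> np.ndarray:
    """Concatenate per-node network measures onto the raw feature matrix."""
    nodes = list(G.nodes())
    degree = nx.degree_centrality(G)
    clustering = nx.clustering(G)
    betweenness = nx.betweenness_centrality(G)
    measures = np.array(
        [[degree[n], clustering[n], betweenness[n]] for n in nodes]
    )
    return np.concatenate([raw_features, measures], axis=1)

# Usage on a toy 20-node graph with 4 raw features per node.
G = nx.random_regular_graph(d=3, n=20, seed=0)
X = np.random.rand(20, 4)
print(enrich_node_features(G, X).shape)  # (20, 7)
```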
Citations: 0
Multi-Task Learning With LLMs for Implicit Sentiment Analysis: Data-Level and Task-Level Automatic Weight Learning
IF 10.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-22 | DOI: 10.1109/TKDE.2025.3623941
Wenna Lai;Haoran Xie;Guandong Xu;Qing Li
Implicit sentiment analysis (ISA) presents significant challenges due to the absence of salient cue words. Previous methods have struggled with insufficient data and limited reasoning capabilities to infer underlying opinions. Integrating multi-task learning (MTL) with large language models (LLMs) offers the potential to enable models of varying sizes to reliably perceive and recognize genuine opinions in ISA. However, existing MTL approaches are constrained by two sources of uncertainty: data-level uncertainty, arising from hallucination problems in LLM-generated contextual information, and task-level uncertainty, stemming from the varying capacities of models to process contextual information. To handle these uncertainties, we propose MT-ISA, a novel MTL framework that enhances ISA by leveraging the generation and reasoning capabilities of LLMs through automatic weight learning (AWL). Specifically, MT-ISA constructs auxiliary tasks using generative LLMs to supplement sentiment elements and incorporates automatic MTL to fully exploit auxiliary data. We introduce data-level and task-level AWL, which dynamically identify relationships and prioritize more reliable data and critical tasks, enabling models of varying sizes to adaptively learn fine-grained weights based on their reasoning capabilities. Three strategies are investigated for data-level AWL, which are integrated with homoscedastic uncertainty for task-level AWL. Extensive experiments reveal that models of varying sizes achieve an optimal balance between primary prediction and auxiliary tasks in MT-ISA. This underscores the effectiveness and adaptability of our approach.
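For readers unfamiliar with homoscedastic-uncertainty weighting, the sketch below shows the common formulation in which each task carries a learnable log-variance that scales its loss. MT-ISA's actual data-level and task-level AWL may differ in detail; the class name `UncertaintyWeightedLoss` and the initialization are assumptions.

```python
# A minimal sketch, assuming task-level automatic weight learning follows the
# usual homoscedastic-uncertainty formulation: each task i has a learnable
# log-variance s_i, its loss is weighted by exp(-s_i), and s_i is added as a
# regularizer so the weights cannot collapse to zero.
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    def __init__(self, num_tasks: int):
        super().__init__()
        # One learnable log-variance per task, initialized to zero.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses: torch.Tensor) -> torch.Tensor:
        # task_losses: tensor of shape (num_tasks,), one scalar loss per task.
        weights = torch.exp(-self.log_vars)
        return torch.sum(weights * task_losses + self.log_vars)

# Usage: combine a primary sentiment loss with two auxiliary losses.
criterion = UncertaintyWeightedLoss(num_tasks=3)
losses = torch.tensor([0.9, 1.4, 0.6])
print(criterion(losses))  # single scalar to backpropagate
```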
Citations: 0
S$^{3}$PRank: Toward Satisfaction-Oriented Learning to Rank With Semi-Supervised Pre-Training
IF 10.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-20 | DOI: 10.1109/TKDE.2025.3623607
Yuchen Li;Zhonghao Lyu;Yongqi Zhang;Hao Zhang;Tianhao Peng;Haoyi Xiong;Shuaiqiang Wang;Linghe Kong;Guihai Chen;Dawei Yin
Learning-to-Rank (LTR) models built on Transformers have been widely adopted to achieve commendable performance in web search. However, these models predominantly emphasize relevance, often overlooking broader aspects of user satisfaction such as quality, authority, and recency, which collectively enhance the overall user experience. Addressing these multifaceted elements is essential for developing more effective and user-centric search engines. Nevertheless, training such comprehensive models remains challenging due to the scarcity of annotated query-webpage pairs relative to the vast number of webpages available online and the billions of daily search queries. Concurrently, industry research communities have released numerous open-source LTR datasets with well-annotated samples, though these datasets feature diverse designs of LTR features and labels across heterogeneous domains. Inspired by recent advancements in pre-training transformers for enhanced performance, this work explores the pre-training of LTR models using both labeled and unlabeled samples. Specifically, we leverage well-annotated samples from heterogeneous open-source LTR datasets to bolster the pre-training process and integrate multifaceted satisfaction features during the fine-tuning stage. In this paper, we propose S$^{3}$PRank, Satisfaction-oriented Learning to Rank with Semi-supervised Pre-training. Specifically, S$^{3}$PRank employs a three-step approach: (1) it exploits unlabeled/labeled data from the search engine to pre-train a self-attentive encoder via semi-supervised learning; (2) it incorporates multiple open-source heterogeneous LTR datasets to enhance the pre-training of the relevance tower through shared parameters in cross-domain learning; (3) it integrates a satisfaction tower with the pre-trained relevance tower to form a deep two-tower aggregation structure, and fine-tunes the combination of pre-trained self-attentive encoder and the two-tower structure using search engine data with various learning strategies. To demonstrate the effectiveness of our proposed approach, we conduct extensive offline and online evaluations using real-world web traffic from Baidu Search. Comparisons against a number of advanced baselines confirm the advantages of S$^{3}$PRank in producing high-performance ranking models for web-scale search.
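The deep two-tower aggregation can be pictured as two small heads over a shared pre-trained encoding whose scores are combined into one ranking score, as in the hedged sketch below; the tower architectures, dimensions, and additive aggregation are assumptions rather than the paper's exact design.

```python
# A minimal sketch of a two-tower aggregation: a relevance tower and a
# satisfaction tower score the same pre-trained query-document encoding, and
# the two scores are summed into one ranking score. All sizes are assumptions.
import torch
import torch.nn as nn

class TwoTowerRanker(nn.Module):
    def __init__(self, encoding_dim: int = 128, hidden_dim: int = 64):
        super().__init__()
        self.relevance_tower = nn.Sequential(
            nn.Linear(encoding_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1)
        )
        self.satisfaction_tower = nn.Sequential(
            nn.Linear(encoding_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1)
        )

    def forward(self, encoding: torch.Tensor) -> torch.Tensor:
        # encoding: (batch, encoding_dim) output of the pre-trained encoder.
        relevance = self.relevance_tower(encoding)
        satisfaction = self.satisfaction_tower(encoding)
        return relevance + satisfaction  # aggregated ranking score

scores = TwoTowerRanker()(torch.randn(4, 128))
print(scores.shape)  # torch.Size([4, 1])
```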
Citations: 0
Serial-Parallel Fractional-Integer-Order Echo State Network for Time Series Prediction
IF 10.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-17 | DOI: 10.1109/TKDE.2025.3622941
Xianshuang Yao;Huiyu Wang
In this paper, considering the memory capability of fractional-order reservoirs and the immunity of integer-order reservoirs, a serial-parallel fractional-integer-order echo state network (SP-FIO-ESN) model is proposed for time series prediction. First, exploiting the superior adaptive capability of the variational mode decomposition (VMD), the input signal is decomposed into multiple input subsequences, and the internal features of the signal are thus extracted. Second, based on the variational mode decomposition and phase space reconstruction methods, the number of serial reservoirs and the number of parallel reservoirs of SP-FIO-ESN are determined. Third, to ensure the stability of SP-FIO-ESN, a sufficient stability criterion is given. Meanwhile, the SP-FIO-ESN reservoir parameters are optimized based on the black-winged kite algorithm (BKA). Finally, to verify the effectiveness of the proposed method on different learning tasks, evaluations on numerical simulation datasets and photovoltaic/wind power generation forecasting datasets are presented.
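As background for the reservoir building block, the sketch below implements a single integer-order (leaky-integrator) echo state update; the fractional-order reservoirs and the serial-parallel wiring of SP-FIO-ESN are not reproduced, and all sizes, weights, and the leak rate are illustrative assumptions.

```python
# A minimal sketch of one integer-order echo state reservoir update:
# x(t+1) = (1 - a) * x(t) + a * tanh(W_in u(t+1) + W x(t)).
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_reservoir = 3, 100
leak_rate = 0.3

W_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_inputs))
W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
# Rescale recurrent weights so the spectral radius stays below 1 (echo state property).
W *= 0.9 / max(abs(np.linalg.eigvals(W)))

def reservoir_step(state: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Leaky-integrator reservoir update for one time step."""
    pre_activation = W_in @ u + W @ state
    return (1.0 - leak_rate) * state + leak_rate * np.tanh(pre_activation)

state = np.zeros(n_reservoir)
for t in range(5):
    state = reservoir_step(state, rng.standard_normal(n_inputs))
print(state[:5])
```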
Citations: 0
Trustworthy Neighborhoods Mining: Homophily-Aware Neutral Contrastive Learning for Graph Clustering
IF 10.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-17 | DOI: 10.1109/TKDE.2025.3622998
Liang Peng;Yixuan Ye;Cheng Liu;Hangjun Che;Man-Fai Leung;Si Wu;Hau-San Wong
Recently, neighbor-based contrastive learning has been introduced to effectively exploit neighborhood information for clustering. However, these methods rely on the homophily assumption—that connected nodes share similar class labels and should therefore be close in feature space—which fails to account for the varying homophily levels in real-world graphs. As a result, applying contrastive learning to low-homophily graphs may lead to indistinguishable node representations due to unreliable neighborhood information, making it challenging to identify trustworthy neighborhoods with varying homophily levels in graph clustering. To tackle this, we introduce a novel neighborhood Neutral Contrastive Graph Clustering method NeuCGC that extends traditional contrastive learning by incorporating neutral pairs—node pairs treated as weighted positive pairs, rather than strictly positive or negative. These neutral pairs are dynamically adjusted based on the graph’s homophily level, enabling a more flexible and robust learning process. Leveraging neutral pairs in contrastive learning, our method incorporates two key components: 1) an adaptive contrastive neighborhood distribution alignment that adjusts based on the homophily level of the given attribute graph, ensuring effective alignment of neighborhood distributions, and 2) a contrastive neighborhood node feature consistency learning mechanism that leverages reliable neighborhood information from high-confidence graphs to learn robust node representations, mitigating the adverse effects of varying homophily levels and effectively exploiting highly trustworthy neighborhood information. Experimental results demonstrate the effectiveness and robustness of our approach, outperforming other state-of-the-art graph clustering methods.
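One way neutral pairs could enter a contrastive objective is as soft-weighted positives inside an InfoNCE-style loss, as in the sketch below: each node pair carries a weight in [0, 1] that would be derived from the estimated homophily level, with zero-weight pairs acting as negatives. The weighting rule, temperature, and function name are assumptions, not NeuCGC's exact loss.

```python
# A minimal sketch of a contrastive loss with soft-weighted ("neutral")
# positives: neighbor pairs contribute to the positive mass in proportion to
# a weight matrix, while everything else acts as a negative.
import torch
import torch.nn.functional as F

def neutral_contrastive_loss(z: torch.Tensor, weights: torch.Tensor,
                             temperature: float = 0.5) -> torch.Tensor:
    """z: (n, d) node embeddings; weights: (n, n) neutral-pair weights (0 = negative)."""
    z = F.normalize(z, dim=1)
    sim = torch.exp(z @ z.t() / temperature)          # pairwise similarities
    sim = sim - torch.diag(torch.diag(sim))           # drop self-similarity
    positive = (weights * sim).sum(dim=1)             # weighted positive mass
    denominator = sim.sum(dim=1)
    return -torch.log(positive / denominator + 1e-12).mean()

# Usage on toy embeddings with a soft neighbor-weight matrix.
z = torch.randn(6, 16)
w = torch.rand(6, 6).fill_diagonal_(0.0)
print(neutral_contrastive_loss(z, w))
```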
Citations: 0
SeaCQ: Secure and Efficient Authenticated Conjunctive Query in Hybrid-Storage Blockchains
IF 10.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-16 | DOI: 10.1109/TKDE.2025.3622591
Xu Yang;Hongguang Zhao;Saiyu Qi;Yong Qi
Data has become a critical economic asset in recent years. To enable secure and reliable access to data assets, the combination of symmetric searchable encryption (SSE) and Hybrid-storage blockchains (HSB) offers a promising solution by storing the authenticated data structure (ADS) on-chain and encrypted data off-chain, thus enabling efficient and authenticated encrypted queries. However, existing encrypted query schemes in HSB either lack support for conjunctive queries, a commonly used and important query pattern in databases, or exhibit low query efficiency in conjunctive queries. vsChain was the first scheme to support secure and authenticated conjunctive queries in HSB but had drawbacks in terms of high query and authentication costs. To overcome these limitations, we introduce SeaCQ, a novel scheme for secure and efficient authenticated conjunctive queries. SeaCQ employs a meticulously designed two-stage authenticated query process to achieve optimal query efficiency. It also incorporates a customized double-layer authentication mechanism to ensure the correctness and completeness of query results efficiently while providing error localization. Additionally, we present an extension of SeaCQ, SeaCQ*, which is a gas-efficient version that utilizes a constant-size on-chain ADS. Our security analysis and experimental results validate the security and efficiency of the proposed schemes.
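For readers unfamiliar with authenticated data structures, the sketch below shows the generic pattern such schemes build on: the chain stores only a Merkle root, and a query result is returned with a membership proof that the client checks against that root. SeaCQ's two-stage ADS and double-layer authentication are considerably more involved; this is background only, and all helper names are assumptions.

```python
# A minimal sketch of Merkle-proof verification, the generic idea behind an
# on-chain authenticated data structure: only the root lives on-chain, and
# off-chain results are verified against it.
import hashlib
from typing import List, Tuple

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: List[bytes]) -> bytes:
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])           # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify(leaf: bytes, proof: List[Tuple[bytes, str]], root: bytes) -> bool:
    node = h(leaf)
    for sibling, side in proof:
        node = h(sibling + node) if side == "left" else h(node + sibling)
    return node == root

leaves = [b"doc1", b"doc2", b"doc3", b"doc4"]
root = merkle_root(leaves)
# Proof that b"doc1" is included: its sibling hash, then the hash of the other subtree.
proof = [(h(b"doc2"), "right"), (h(h(b"doc3") + h(b"doc4")), "right")]
print(verify(b"doc1", proof, root))  # True
```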
Citations: 0
SSD: Self-Supervised Distillation for Heterophilic Graph Representation Learning
IF 10.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-15 | DOI: 10.1109/TKDE.2025.3621758
Yuan Gao;Yuchen Li;Bingsheng He;Hezhe Qiao;Guoguo Ai;Hui Yan
Graph Knowledge Distillation (GKD) has made remarkable progress in graph representation learning in recent years. Despite its great success, GKD typically follows a label-dependent paradigm that relies heavily on a large number of labels. Besides, we observe that GKD encounters the issue of embedding collapse, as merely maximizing the consistency between the teacher and student is insufficient for heterophilic graphs. To tackle these challenges, we propose a Self-Supervised Distillation framework named SSD. To realize label independence, the framework is built on contrastive learning. Specifically, we design a Topology Invariance Block (TIB) and a Feature Invariance Block (FIB) to distill semantic invariance from unlabeled data. Each block includes a teacher-student architecture, which is trained by a projection-based contrastive loss. To avoid embedding collapse, the loss attends to two critical aspects: (1) maximizing consistency between the same node representations of teacher and student (positive pairs); (2) minimizing consistency between negative pairs, which include the final teacher and final student representation pairs and hidden teacher representation pairs. Under the guidance of self-distillation in each block, TIB captures the topology invariance while FIB learns the feature invariance. Additionally, cross-distillation is applied between the two blocks, allowing each block to gain additional contrastive knowledge from the other, resulting in improved feature representations and enhanced classification performance. Comprehensive experimental results on 10 datasets demonstrate that our model achieves superior performance in the node classification task. In summary, SSD offers a novel paradigm for self-supervised knowledge distillation on graph-structured data.
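The positive-pair side of such a projection-based loss can be written as a cross-entropy over a student-teacher similarity matrix whose diagonal entries are the positives, as in the hedged sketch below; SSD's projection heads, hidden-layer negatives, and temperature are not reproduced, and the values used are assumptions.

```python
# A minimal sketch of a teacher-student contrastive loss: each student node
# representation is pulled toward the teacher representation of the same node
# (diagonal positives) and pushed away from teacher representations of other
# nodes (off-diagonal negatives).
import torch
import torch.nn.functional as F

def distillation_contrastive_loss(student: torch.Tensor, teacher: torch.Tensor,
                                  temperature: float = 0.2) -> torch.Tensor:
    """student, teacher: (n, d) projected node representations."""
    student = F.normalize(student, dim=1)
    teacher = F.normalize(teacher, dim=1)
    logits = student @ teacher.t() / temperature      # (n, n) similarity matrix
    targets = torch.arange(student.size(0))           # positives are the diagonal
    return F.cross_entropy(logits, targets)

print(distillation_contrastive_loss(torch.randn(8, 32), torch.randn(8, 32)))
```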
Citations: 0
IMS: Incremental Max-P Regionalization With Statistical Constraints
IF 10.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-15 | DOI: 10.1109/TKDE.2025.3621843
Yunfan Kang;Yiyang Bian;Qinma Kang;Amr Magdy
Spatial regionalization is the process of grouping a set of spatial areas into spatially contiguous and homogeneous regions. This paper introduces an Incremental Max-P regionalization with statistical constraints (IMS) problem; a regionalization process that supports enriched user-defined constraints based on statistical aggregate functions and supports incremental updates. In addition to enabling richer constraints, it allows users to employ multiple constraints simultaneously to significantly push the expressiveness and effectiveness of the existing regionalization literature. The IMS problem is NP-hard and significantly enriches the existing regionalization problems. Such a major enrichment introduces several challenges in both feasibility and scalability. To address these challenges, we propose the FaCT algorithm, a three-phase heuristic approach that finds a feasible set of spatial regions that satisfy IMS constraints while supporting large datasets compared to the existing literature. FaCT supports local and global incremental updates when there are changes in attribute values or constraints. In addition, we incorporate the Iterated Greedy algorithm with FaCT to further improve the solution quality of the IMS problem and the classical max-p regions problem. Our extensive experimental evaluation has demonstrated the effectiveness and scalability of our techniques on several real datasets.
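To illustrate the flavor of max-p-style regionalization under a statistical constraint, the sketch below greedily grows regions from seeds by absorbing adjacent areas until a summed attribute reaches a threshold; FaCT's three phases, multiple aggregate constraints, and incremental updates are not reproduced, and the data, constraint, and function name are illustrative assumptions.

```python
# A minimal sketch of greedy, contiguity-preserving region growing under a
# single minimum-sum constraint (max-p flavor). Not the FaCT algorithm.
from typing import Dict, List, Set

def grow_regions(adjacency: Dict[int, Set[int]], value: Dict[int, float],
                 threshold: float) -> List[Set[int]]:
    unassigned = set(adjacency)
    regions: List[Set[int]] = []
    while unassigned:
        seed = next(iter(unassigned))
        region, total = {seed}, value[seed]
        unassigned.remove(seed)
        frontier = adjacency[seed] & unassigned
        while total < threshold and frontier:
            area = max(frontier, key=value.get)        # greedy: largest contribution
            region.add(area)
            total += value[area]
            unassigned.remove(area)
            frontier = (frontier | adjacency[area]) & unassigned
        regions.append(region)                         # may be infeasible if total < threshold
    return regions

# Toy 2x3 grid of areas with unit values and a minimum of 2 per region.
adjacency = {0: {1, 3}, 1: {0, 2, 4}, 2: {1, 5}, 3: {0, 4}, 4: {1, 3, 5}, 5: {2, 4}}
value = {i: 1.0 for i in range(6)}
print(grow_regions(adjacency, value, threshold=2.0))
```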
Citations: 0
Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future Trends
IF 10.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-14 | DOI: 10.1109/TKDE.2025.3621181
Yunshi Lan;Xinyuan Li;Hanyue Du;Xuesong Lu;Ming Gao;Weining Qian;Aoying Zhou
Natural Language Processing (NLP) aims to analyze text or speech via techniques in the computer science field. It serves applications in the domains of healthcare, commerce, education, and so on. Particularly, NLP has been widely applied to the education domain and its applications have enormous potential to help teaching and learning. In this survey, we review recent advances in NLP with a focus on solving problems relevant to the education domain. In detail, we begin with introducing the related background and the real-world scenarios in education to which NLP techniques could contribute. Then, we present a taxonomy of NLP in the education domain and highlight typical NLP applications including question answering, question construction, automated assessment, and error correction. Next, we illustrate the task definition, challenges, and corresponding cutting-edge techniques based on the above taxonomy. In particular, LLM-involved methods are included for discussion due to the wide usage of LLMs in diverse NLP applications. After that, we showcase some off-the-shelf demonstrations in this domain, which are designed for educators or researchers. At last, we conclude with five promising directions for future research, including generalization over subjects and languages, deployed LLM-based systems for education, adaptive learning for teaching and learning, interpretability for education, and ethical consideration of NLP techniques.
Citations: 0