IEEE Transactions on Knowledge and Data Engineering最新文献_第10页

Learning Prioritized Node-Wise Message Propagation in Graph Neural Networks 学习图神经网络中的优先节点信息传播

IF 8.9 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Knowledge and Data Engineering

Pub Date : 2024-08-02 DOI: 10.1109/TKDE.2024.3436909

Yao Cheng;Minjie Chen;Caihua Shan;Xiang Li

Graph neural networks (GNNs) have recently received significant attention. Learning node-wise message propagation in GNNs aims to set personalized propagation steps for different nodes in the graph. Despite the success, existing methods ignore node priority that can be reflected by node influence and heterophily. In this paper, we propose a versatile framework PriPro, which can be integrated with most existing GNN models and aim to learn prioritized node-wise message propagation in GNNs. Specifically, the framework consists of three components: a backbone GNN model, a propagation controller to determine the optimal propagation steps for nodes, and a weight controller to compute the priority scores for nodes. We design a mutually enhanced mechanism to compute node priority, optimal propagation step and label prediction. We also propose an alternative optimization strategy to learn the parameters in the backbone GNN model and two parametric controllers. We conduct extensive experiments to compare our framework with other 12 state-of-the-art competitors on 10 benchmark datasets. Experimental results show that our framework can lead to superior performance in terms of propagation strategies and node representations.

最近，图神经网络（GNN）受到了广泛关注。在图神经网络中学习节点信息传播的目的是为图中的不同节点设置个性化的传播步骤。尽管取得了成功，但现有的方法忽略了节点的优先级，而节点的优先级可以通过节点的影响力和异质性反映出来。在本文中，我们提出了一个通用框架 PriPro，它可以与大多数现有的 GNN 模型集成，旨在学习 GNN 中按节点排列的优先级信息传播。具体来说，该框架由三个部分组成：骨干 GNN 模型、确定节点最佳传播步骤的传播控制器和计算节点优先级分数的权重控制器。我们设计了一种相互增强的机制来计算节点优先级、最佳传播步骤和标签预测。我们还提出了另一种优化策略，用于学习骨干 GNN 模型中的参数和两个参数控制器。我们进行了大量实验，在 10 个基准数据集上将我们的框架与其他 12 个最先进的竞争对手进行了比较。实验结果表明，我们的框架可以在传播策略和节点表示方面带来更优越的性能。

{"title":"Learning Prioritized Node-Wise Message Propagation in Graph Neural Networks","authors":"Yao Cheng;Minjie Chen;Caihua Shan;Xiang Li","doi":"10.1109/TKDE.2024.3436909","DOIUrl":"10.1109/TKDE.2024.3436909","url":null,"abstract":"Graph neural networks (GNNs) have recently received significant attention. Learning node-wise message propagation in GNNs aims to set personalized propagation steps for different nodes in the graph. Despite the success, existing methods ignore node priority that can be reflected by node influence and heterophily. In this paper, we propose a versatile framework PriPro, which can be integrated with most existing GNN models and aim to learn prioritized node-wise message propagation in GNNs. Specifically, the framework consists of three components: a backbone GNN model, a propagation controller to determine the optimal propagation steps for nodes, and a weight controller to compute the priority scores for nodes. We design a mutually enhanced mechanism to compute node priority, optimal propagation step and label prediction. We also propose an alternative optimization strategy to learn the parameters in the backbone GNN model and two parametric controllers. We conduct extensive experiments to compare our framework with other 12 state-of-the-art competitors on 10 benchmark datasets. Experimental results show that our framework can lead to superior performance in terms of propagation strategies and node representations.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8670-8681"},"PeriodicalIF":8.9,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141883007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Graph Representation Learning Based on Cognitive Spreading Activations 基于认知扩散激活的图形表示学习

IF 8.9 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Knowledge and Data Engineering

Pub Date : 2024-08-02 DOI: 10.1109/TKDE.2024.3437781

Jie Bai;Kang Zhao;Linjing Li;Daniel Zeng;Qiudan Li;Fan Yang;Quannan Zu

Graph representation learning is an emerging area for graph analysis and inference. However, existing approaches for large-scale graphs either sample nodes in sequential walks or manipulate the adjacency matrices of graphs. The former approach can cause sampling bias against less-connected nodes, whereas the latter may suffer from sparsity that exists in many real-world graphs. To learn from structural information in a graph more efficiently and comprehensively, this paper proposes a new graph representation learning approach inspired by the cognitive model of spreading-activation mechanisms in human memory. This approach learns node embeddings by adopting a graph activation model that allows nodes to “activate” their neighbors and spread their own structural information to other nodes through the paths simultaneously. Comprehensive experiments demonstrate that the proposed model performs better than existing methods on several empirical datasets for multiple graph inference tasks. Meanwhile, the spreading-activation-based model is computationally more efficient than existing approaches–the training process converges after only a small number of iterations, and the training time is linear in the number of edges in a graph. The proposed method works for both homogeneous and heterogeneous graphs.

图表示学习是图分析和推理的一个新兴领域。然而，针对大规模图的现有方法要么在连续行走中对节点进行采样，要么处理图的邻接矩阵。前一种方法可能会对连接较少的节点造成采样偏差，而后一种方法则可能会受到许多真实世界图中存在的稀疏性的影响。为了更有效、更全面地学习图中的结构信息，本文提出了一种新的图表示学习方法，其灵感来源于人类记忆中传播激活机制的认知模型。这种方法通过采用图激活模型来学习节点嵌入，该模型允许节点 "激活 "它们的邻居，并同时通过路径将自己的结构信息传播给其他节点。综合实验证明，在多个图推理任务的经验数据集上，所提出的模型比现有方法表现更好。同时，与现有方法相比，基于扩散激活的模型计算效率更高--训练过程只需少量迭代就能收敛，而且训练时间与图中边的数量成线性关系。所提出的方法既适用于同质图，也适用于异质图。

{"title":"Graph Representation Learning Based on Cognitive Spreading Activations","authors":"Jie Bai;Kang Zhao;Linjing Li;Daniel Zeng;Qiudan Li;Fan Yang;Quannan Zu","doi":"10.1109/TKDE.2024.3437781","DOIUrl":"10.1109/TKDE.2024.3437781","url":null,"abstract":"Graph representation learning is an emerging area for graph analysis and inference. However, existing approaches for large-scale graphs either sample nodes in sequential walks or manipulate the adjacency matrices of graphs. The former approach can cause sampling bias against less-connected nodes, whereas the latter may suffer from sparsity that exists in many real-world graphs. To learn from structural information in a graph more efficiently and comprehensively, this paper proposes a new graph representation learning approach inspired by the cognitive model of spreading-activation mechanisms in human memory. This approach learns node embeddings by adopting a graph activation model that allows nodes to “activate” their neighbors and spread their own structural information to other nodes through the paths simultaneously. Comprehensive experiments demonstrate that the proposed model performs better than existing methods on several empirical datasets for multiple graph inference tasks. Meanwhile, the spreading-activation-based model is computationally more efficient than existing approaches–the training process converges after only a small number of iterations, and the training time is linear in the number of edges in a graph. The proposed method works for both homogeneous and heterogeneous graphs.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8408-8420"},"PeriodicalIF":8.9,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141883008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Symmetrical Self-Representation and Data-Grouping Strategy for Unsupervised Feature Selection 用于无监督特征选择的对称自代表和数据分组策略

IF 8.9 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Knowledge and Data Engineering

Pub Date : 2024-08-02 DOI: 10.1109/TKDE.2024.3437364

Aihong Yuan;Mengbo You;Yuhan Wang;Xun Li;Xuelong Li

Unsupervised feature selection (UFS) is an important technology for dimensionality reduction and has gained great interest in a wide range of fields. Recently, most popular methods are spectral-based which frequently use adaptive graph constraints to promote performance. However, no literature has considered the grouping characteristic of the data features, which is the most basic and important characteristic for arbitrary data. In this paper, based on the spectral analysis method, we first simulate the data feature grouping characteristic. Then, the similarity between data is adaptively reconstructed through the similarity between groups, which can explore the more fine-grained relationship between data than the previous adaptive graph methods. In order to achieve the aforementioned goal, the local similarity matrix and the global similarity matrix are defined, and the weighted KL entropy is used to constrain the relationship between the global similarity matrix and the local similarity matrices. Furthermore, the symmetrical self-representation structure is used to improve the performance of the reconstruction error term in the conventional spectral-based methods. After the model is constructed, a simple but efficient algorithm is proposed to solve the full model. Extensive experiments on 8 benchmark dataset with different types to show the effectiveness of the proposed method.

无监督特征选择（UFS）是一项重要的降维技术，在众多领域都受到了广泛关注。最近，大多数流行的方法都是基于光谱的，这些方法经常使用自适应图约束来提高性能。然而，还没有文献考虑过数据特征的分组特征，而这是任意数据最基本、最重要的特征。本文基于光谱分析方法，首先模拟了数据特征分组特征。然后，通过分组间的相似性自适应地重建数据间的相似性，从而探索出比以往自适应图方法更精细的数据间关系。为了实现上述目标，定义了局部相似性矩阵和全局相似性矩阵，并利用加权 KL 熵来约束全局相似性矩阵和局部相似性矩阵之间的关系。此外，对称自表示结构用于改善传统基于光谱方法中重建误差项的性能。在构建模型后，提出了一种简单而高效的算法来求解完整模型。在 8 个不同类型的基准数据集上进行了大量实验，以显示所提方法的有效性。

{"title":"Symmetrical Self-Representation and Data-Grouping Strategy for Unsupervised Feature Selection","authors":"Aihong Yuan;Mengbo You;Yuhan Wang;Xun Li;Xuelong Li","doi":"10.1109/TKDE.2024.3437364","DOIUrl":"10.1109/TKDE.2024.3437364","url":null,"abstract":"<italic>Unsupervised feature selection (UFS)\u0000 is an important technology for dimensionality reduction and has gained great interest in a wide range of fields. Recently, most popular methods are spectral-based which frequently use adaptive graph constraints to promote performance. However, no literature has considered the grouping characteristic of the data features, which is the most basic and important characteristic for arbitrary data. In this paper, based on the spectral analysis method, we first simulate the data feature grouping characteristic. Then, the similarity between data is adaptively reconstructed through the similarity between groups, which can explore the more fine-grained relationship between data than the previous adaptive graph methods. In order to achieve the aforementioned goal, the local similarity matrix and the global similarity matrix are defined, and the weighted KL entropy is used to constrain the relationship between the global similarity matrix and the local similarity matrices. Furthermore, the symmetrical self-representation structure is used to improve the performance of the reconstruction error term in the conventional spectral-based methods. After the model is constructed, a simple but efficient algorithm is proposed to solve the full model. Extensive experiments on 8 benchmark dataset with different types to show the effectiveness of the proposed method.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9348-9360"},"PeriodicalIF":8.9,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141883078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Efficient Unsupervised Graph Embedding With Attributed Graph Reduction and Dual-Level Loss 利用归属图缩减和双级损失实现高效无监督图嵌入

IF 8.9 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Knowledge and Data Engineering

Pub Date : 2024-07-31 DOI: 10.1109/TKDE.2024.3436076

Ziyang Liu;Chaokun Wang;Hao Feng;Ziyang Chen

Graph embedding aims to extract low-dimensional representation vectors, commonly referred to as embeddings, from graph data. The generated embeddings simplify subsequent data analysis and machine learning tasks. Recently, researchers have proposed the use of contrastive learning on graphs to extract node embeddings in an unsupervised manner. Although existing graph contrastive learning methods have significantly advanced this field, there is still potential for further exploration, particularly in optimizing training efficiency and enhancing embedding quality. In this paper, we propose an efficient unsupervised graph embedding method named GEARED. First, the method involves an attributed graph reduction module that converts the raw graph into a reduced graph, greatly improving model training efficiency. Second, GEARED employs a dual-level loss with adaptive scaling factors to ensure the acquisition of high-quality embeddings. Finally, we conduct a partial derivative analysis to elucidate the specific mechanisms through which GEARED is capable of generating high-quality embeddings. Extensive experimental evaluations on 14 benchmark datasets show that GEARED consistently outperforms state-of-the-art methods in terms of training efficiency and classification accuracy. For instance, GEARED achieves a training speedup of over 40 times on both the CS and Physics datasets while maintaining superior classification accuracy.

图嵌入旨在从图数据中提取低维表示向量，通常称为嵌入。生成的嵌入可以简化后续的数据分析和机器学习任务。最近，研究人员提出在图中使用对比学习，以无监督的方式提取节点嵌入。虽然现有的图对比学习方法大大推进了这一领域的发展，但仍有进一步探索的潜力，尤其是在优化训练效率和提高嵌入质量方面。在本文中，我们提出了一种名为 GEARED 的高效无监督图嵌入方法。首先，该方法包含一个归因图缩减模块，可将原始图转换为缩减图，从而大大提高模型训练效率。其次，GEARED 采用了具有自适应缩放因子的双级损失，以确保获得高质量的嵌入。最后，我们进行了偏导数分析，以阐明 GEARED 能够生成高质量嵌入的具体机制。在 14 个基准数据集上进行的广泛实验评估表明，GEARED 在训练效率和分类准确性方面始终优于最先进的方法。例如，GEARED 在 CS 和物理数据集上的训练速度提高了 40 多倍，同时保持了出色的分类准确性。

{"title":"Efficient Unsupervised Graph Embedding With Attributed Graph Reduction and Dual-Level Loss","authors":"Ziyang Liu;Chaokun Wang;Hao Feng;Ziyang Chen","doi":"10.1109/TKDE.2024.3436076","DOIUrl":"10.1109/TKDE.2024.3436076","url":null,"abstract":"Graph embedding aims to extract low-dimensional representation vectors, commonly referred to as embeddings, from graph data. The generated embeddings simplify subsequent data analysis and machine learning tasks. Recently, researchers have proposed the use of contrastive learning on graphs to extract node embeddings in an unsupervised manner. Although existing graph contrastive learning methods have significantly advanced this field, there is still potential for further exploration, particularly in optimizing \u0000<italic>training efficiency\u0000 and enhancing \u0000<italic>embedding quality\u0000. In this paper, we propose an efficient unsupervised graph embedding method named GEARED. First, the method involves an attributed graph reduction module that converts the raw graph into a reduced graph, greatly improving model training efficiency. Second, GEARED employs a dual-level loss with adaptive scaling factors to ensure the acquisition of high-quality embeddings. Finally, we conduct a partial derivative analysis to elucidate the specific mechanisms through which GEARED is capable of generating high-quality embeddings. Extensive experimental evaluations on 14 benchmark datasets show that GEARED consistently outperforms state-of-the-art methods in terms of training efficiency and classification accuracy. For instance, GEARED achieves a training speedup of over 40 times on both the CS and Physics datasets while maintaining superior classification accuracy.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8120-8134"},"PeriodicalIF":8.9,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141871191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Towards Effective Long-Term Wind Power Forecasting: A Deep Conditional Generative Spatio-Temporal Approach 实现有效的长期风电预测：深度条件生成时空方法

IF 8.9 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Knowledge and Data Engineering

Pub Date : 2024-07-30 DOI: 10.1109/TKDE.2024.3435859

Peiyu Yi;Zhifeng Bao;Feihu Huang;Jince Wang;Jian Peng;Linghao Zhang

Accurately forecasting long-term future wind power is critical to achieve safe power grid integration. This problem is quite challenging due to wind power's high volatility and randomness. In this paper, we propose a novel time series forecasting method, namely Deep Conditional Generative Spatio-Temporal model (DCGST), and its high accuracy is achieved by tackling two critical issues simultaneously: a proper handling of the non-stationarity of multiple wind power time series, and a fine-grained modeling of their complicated yet dynamic spatio-temporal dependencies. Specifically, we first formally define the Spatio-Temporal Concept Drift (STCD) problem of wind power, and then we propose a novel deep conditional generative model to learn probabilistic distributions of future wind power values under STCD. Three different tailored neural networks are designed for distributions parameterization, including a graph-based prior network, an attention-based recognition network, and a stochastic seq2seq-based generation network. They are able to encode the dynamic spatio-temporal dependencies of multiple wind power time series and infer one-to-many mappings for future wind power generation. Compared to existing methods, DCGST can learn better spatio-temporal representations of wind power data and learn better uncertainties of data distribution to generate future values. Comprehensive experiments on real-world datasets including the largest public turbine-level wind power dataset verify the effectiveness, efficiency, generality and scalability of our method.

准确预测未来长期风力发电量对于实现安全并网至关重要。由于风电的高波动性和随机性，这一问题相当具有挑战性。本文提出了一种新颖的时间序列预测方法，即深度条件生成时空模型（DCGST），其高精度是通过同时解决两个关键问题实现的：正确处理多个风电时间序列的非平稳性，以及对其复杂而动态的时空依赖关系进行精细建模。具体来说，我们首先正式定义了风力发电的时空概念漂移（STCD）问题，然后提出了一种新颖的深度条件生成模型，用于学习 STCD 条件下未来风力发电值的概率分布。我们为分布参数化设计了三种不同的定制神经网络，包括基于图的先验网络、基于注意力的识别网络和基于随机 seq2seq 的生成网络。它们能够对多个风力发电时间序列的动态时空依赖性进行编码，并推断出未来风力发电的一对多映射。与现有方法相比，DCGST 能更好地学习风力发电数据的时空表示，并能更好地学习数据分布的不确定性，从而生成未来值。在真实世界数据集（包括最大的公共涡轮机级风力发电数据集）上进行的综合实验验证了我们方法的有效性、高效性、通用性和可扩展性。

{"title":"Towards Effective Long-Term Wind Power Forecasting: A Deep Conditional Generative Spatio-Temporal Approach","authors":"Peiyu Yi;Zhifeng Bao;Feihu Huang;Jince Wang;Jian Peng;Linghao Zhang","doi":"10.1109/TKDE.2024.3435859","DOIUrl":"10.1109/TKDE.2024.3435859","url":null,"abstract":"Accurately forecasting long-term future wind power is critical to achieve safe power grid integration. This problem is quite challenging due to wind power's high volatility and randomness. In this paper, we propose a novel time series forecasting method, namely Deep Conditional Generative Spatio-Temporal model (DCGST), and its high accuracy is achieved by tackling two critical issues simultaneously: a proper handling of the non-stationarity of multiple wind power time series, and a fine-grained modeling of their complicated yet dynamic spatio-temporal dependencies. Specifically, we first formally define the \u0000<italic>Spatio-Temporal Concept Drift\u0000 (STCD) problem of wind power, and then we propose a novel deep conditional generative model to learn probabilistic distributions of future wind power values under STCD. Three different tailored neural networks are designed for distributions parameterization, including a graph-based prior network, an attention-based recognition network, and a stochastic seq2seq-based generation network. They are able to encode the dynamic spatio-temporal dependencies of multiple wind power time series and infer one-to-many mappings for future wind power generation. Compared to existing methods, DCGST can learn better spatio-temporal representations of wind power data and learn better uncertainties of data distribution to generate future values. Comprehensive experiments on real-world datasets including the largest public turbine-level wind power dataset verify the effectiveness, efficiency, generality and scalability of our method.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9403-9417"},"PeriodicalIF":8.9,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141871123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Reliable Node Similarity Matrix Guided Contrastive Graph Clustering 可靠的节点相似性矩阵引导的对比图聚类

IF 8.9 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Knowledge and Data Engineering

Pub Date : 2024-07-30 DOI: 10.1109/TKDE.2024.3435887

Yunhui Liu;Xinyi Gao;Tieke He;Tao Zheng;Jianhua Zhao;Hongzhi Yin

Graph clustering, which involves the partitioning of nodes within a graph into disjoint clusters, holds significant importance for numerous subsequent applications. Recently, contrastive learning, known for utilizing supervisory information, has demonstrated encouraging results in deep graph clustering. This methodology facilitates the learning of favorable node representations for clustering by attracting positively correlated node pairs and distancing negatively correlated pairs within the representation space. Nevertheless, a significant limitation of existing methods is their inadequacy in thoroughly exploring node-wise similarity. For instance, some hypothesize that the node similarity matrix within the representation space is identical, ignoring the inherent semantic relationships among nodes. Given the fundamental role of instance similarity in clustering, our research investigates contrastive graph clustering from the perspective of the node similarity matrix. We argue that an ideal node similarity matrix within the representation space should accurately reflect the inherent semantic relationships among nodes, ensuring the preservation of semantic similarities in the learned representations. In response to this, we introduce a new framework, Reliable Node Similarity Matrix Guided Contrastive Graph Clustering (NS4GC), which estimates an approximately ideal node similarity matrix within the representation space to guide representation learning. Our method introduces node-neighbor alignment and semantic-aware sparsification, ensuring the node similarity matrix is both accurate and efficiently sparse. Comprehensive experiments conducted on 8 real-world datasets affirm the efficacy of learning the node similarity matrix and the superior performance of NS4GC.

图聚类涉及将图中的节点划分为互不相交的群组，对许多后续应用具有重要意义。最近，以利用监督信息著称的对比学习在深度图聚类中取得了令人鼓舞的成果。这种方法通过在表示空间内吸引正相关的节点对和拉开负相关的节点对的距离，来促进有利节点表示的学习，从而实现聚类。然而，现有方法的一个显著局限是无法深入探索节点的相似性。例如，有些方法假设表示空间内的节点相似性矩阵是相同的，从而忽略了节点之间固有的语义关系。鉴于实例相似性在聚类中的基本作用，我们的研究从节点相似性矩阵的角度研究对比图聚类。我们认为，表征空间中理想的节点相似性矩阵应能准确反映节点间固有的语义关系，确保在学习到的表征中保留语义相似性。为此，我们引入了一个新的框架--可靠的节点相似性矩阵引导的对比图聚类（NS4GC），它能估计出表征空间内近似理想的节点相似性矩阵，从而指导表征学习。我们的方法引入了节点邻接对齐和语义感知稀疏化，确保节点相似性矩阵既准确又有效稀疏。在 8 个真实世界数据集上进行的综合实验证实了学习节点相似性矩阵的有效性和 NS4GC 的卓越性能。

{"title":"Reliable Node Similarity Matrix Guided Contrastive Graph Clustering","authors":"Yunhui Liu;Xinyi Gao;Tieke He;Tao Zheng;Jianhua Zhao;Hongzhi Yin","doi":"10.1109/TKDE.2024.3435887","DOIUrl":"10.1109/TKDE.2024.3435887","url":null,"abstract":"Graph clustering, which involves the partitioning of nodes within a graph into disjoint clusters, holds significant importance for numerous subsequent applications. Recently, contrastive learning, known for utilizing supervisory information, has demonstrated encouraging results in deep graph clustering. This methodology facilitates the learning of favorable node representations for clustering by attracting positively correlated node pairs and distancing negatively correlated pairs within the representation space. Nevertheless, a significant limitation of existing methods is their inadequacy in thoroughly exploring node-wise similarity. For instance, some hypothesize that the node similarity matrix within the representation space is identical, ignoring the inherent semantic relationships among nodes. Given the fundamental role of instance similarity in clustering, our research investigates contrastive graph clustering from the perspective of the node similarity matrix. We argue that an ideal node similarity matrix within the representation space should accurately reflect the inherent semantic relationships among nodes, ensuring the preservation of semantic similarities in the learned representations. In response to this, we introduce a new framework, Reliable Node Similarity Matrix Guided Contrastive Graph Clustering (NS4GC), which estimates an approximately ideal node similarity matrix within the representation space to guide representation learning. Our method introduces node-neighbor alignment and semantic-aware sparsification, ensuring the node similarity matrix is both accurate and efficiently sparse. Comprehensive experiments conducted on 8 real-world datasets affirm the efficacy of learning the node similarity matrix and the superior performance of NS4GC.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9123-9135"},"PeriodicalIF":8.9,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141871193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Bidirectional Extraction-Then-Evaluation Framework for Complex Relation Extraction 复杂关系提取的 "先提取后评估 "双向框架

IF 8.9 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Knowledge and Data Engineering

Pub Date : 2024-07-30 DOI: 10.1109/TKDE.2024.3435765

Weiyan Zhang;Jiacheng Wang;Chuang Chen;Wanpeng Lu;Wen Du;Haofen Wang;Jingping Liu;Tong Ruan

Relation extraction is an important task in the field of natural language processing. Previous works mainly focus on adopting pipeline methods or joint methods to model relation extraction in general scenarios. However, these existing methods face challenges when adapting to complex relation extraction scenarios, such as handling overlapped triplets, multiple triplets, and cross-sentence triplets. In this paper, we revisit the advantages and disadvantages of the aforementioned methods in complex relation extraction. Based on the in-depth analysis, we propose a novel two-stage bidirectional extract-then-evaluate framework named BeeRe. In the extraction stage, we first obtain the subject set, relation set, and object set. Then, we design subject- and object-oriented triplet extractors to iteratively recurrent obtain candidate triplets, ensuring high recall. In the evaluation stage, we adopt a relation-oriented triplet filter to determine subject-object pairs based on relations in triplets obtained in the first stage, ensuring high precision. We conduct extensive experiments on three public datasets to show that BeeRe achieves state-of-the-art performance in both complex and general relation extraction scenarios. Even when compared to large language models like closed-source/open-source LLMs, BeeRe still has significant performance gains.

关系提取是自然语言处理领域的一项重要任务。以往的工作主要集中在采用管道方法或联合方法对一般场景下的关系提取进行建模。然而，这些现有方法在适应复杂的关系提取场景时面临挑战，如处理重叠三元组、多重三元组和跨句三元组等。在本文中，我们重新审视了上述方法在复杂关系提取中的优缺点。在深入分析的基础上，我们提出了一种名为 BeeRe 的新颖的两阶段双向提取-评估框架。在提取阶段，我们首先获得主题集、关系集和对象集。然后，我们设计了面向主体和对象的三元组提取器，反复循环地获取候选三元组，确保高召回率。在评估阶段，我们采用面向关系的三元组过滤器，根据第一阶段获得的三元组中的关系确定主客体对，确保高精度。我们在三个公共数据集上进行了广泛的实验，结果表明，BeeRe 在复杂和一般关系提取场景中都达到了最先进的性能。即使与闭源/开源 LLM 等大型语言模型相比，BeeRe 的性能也有显著提高。

{"title":"A Bidirectional Extraction-Then-Evaluation Framework for Complex Relation Extraction","authors":"Weiyan Zhang;Jiacheng Wang;Chuang Chen;Wanpeng Lu;Wen Du;Haofen Wang;Jingping Liu;Tong Ruan","doi":"10.1109/TKDE.2024.3435765","DOIUrl":"10.1109/TKDE.2024.3435765","url":null,"abstract":"Relation extraction is an important task in the field of natural language processing. Previous works mainly focus on adopting pipeline methods or joint methods to model relation extraction in general scenarios. However, these existing methods face challenges when adapting to complex relation extraction scenarios, such as handling overlapped triplets, multiple triplets, and cross-sentence triplets. In this paper, we revisit the advantages and disadvantages of the aforementioned methods in complex relation extraction. Based on the in-depth analysis, we propose a novel two-stage bidirectional extract-then-evaluate framework named \u0000<sc>BeeRe\u0000. In the extraction stage, we first obtain the subject set, relation set, and object set. Then, we design subject- and object-oriented triplet extractors to iteratively recurrent obtain candidate triplets, ensuring high recall. In the evaluation stage, we adopt a relation-oriented triplet filter to determine subject-object pairs based on relations in triplets obtained in the first stage, ensuring high precision. We conduct extensive experiments on three public datasets to show that \u0000<sc>BeeRe\u0000 achieves state-of-the-art performance in both complex and general relation extraction scenarios. Even when compared to large language models like closed-source/open-source LLMs, \u0000<sc>BeeRe\u0000 still has significant performance gains.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"7442-7454"},"PeriodicalIF":8.9,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141871192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Task Assignment Framework for Online Car-Hailing Systems With Electric Vehicles 电动汽车在线叫车系统的任务分配框架

IF 8.9 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Knowledge and Data Engineering

Pub Date : 2024-07-29 DOI: 10.1109/TKDE.2024.3434567

Wangze Ni;Peng Cheng;Lei Chen;Shiyu Yang

Recently, transportation-as-a-service (TaaS) becomes an increasing trend, and online taxi platforms start to apply electric vehicles to serve passengers. Since the recharging time of an electric vehicle is long and non-negligible, it is necessary to smartly arrange the recharging schedules of electric vehicles in working schedules. In order to maximize the number of served taxi-calling tasks, online taxi platforms assign electric vehicles whose remaining electric power is enough to serve the dynamically arriving taxi-calling tasks and schedule suitable idle vehicles to recharging piles to recharge. We formally define the power-aware electric vehicle assignment (PAEVA) problem to serve as many taxi-calling tasks as possible under the constraints of remaining electric power and deadline. We prove that the PAEVA problem is NP-hard. To solve PAEVA, we design a novel strategy to help arrange the schedules of electric vehicles. Specifically, the strategy requires that, in a time slot and an area gird, the ratio of the number of electric vehicles whose remaining electric power is higher than a threshold

$alpha$

to the number of predicted taxi-calling tasks should be higher than a threshold

$beta$

. We propose two approximation approaches with theoretical guarantees to adaptively determine the values of the two thresholds of the strategy. We evaluate our solutions’ effectiveness and efficiency by comprehensive experiments on real datasets.

近来，交通即服务（TaaS）成为一种趋势，在线出租车平台开始应用电动汽车为乘客提供服务。由于电动汽车的充电时间较长，且不可忽略，因此有必要在工作日程中巧妙地安排电动汽车的充电时间表。为了最大限度地完成打车任务，网约车平台会分配剩余电量足以完成动态到达的打车任务的电动汽车，并安排合适的闲置车辆到充电桩充电。我们正式定义了电力感知电动汽车分配（PAEVA）问题，即在剩余电量和截止日期的约束下，为尽可能多的叫车任务提供服务。我们证明 PAEVA 问题是 NP 难问题。为了解决 PAEVA 问题，我们设计了一种新策略来帮助安排电动汽车的时间表。具体来说，该策略要求在一个时隙和一个区域范围内，剩余电量大于阈值 $alpha$ 的电动汽车数量与预测出租车召车任务数量之比应大于阈值 $beta$ 。我们提出了两种具有理论保证的近似方法，用于自适应地确定策略的两个阈值。我们通过在真实数据集上进行综合实验来评估我们的解决方案的有效性和效率。

{"title":"Task Assignment Framework for Online Car-Hailing Systems With Electric Vehicles","authors":"Wangze Ni;Peng Cheng;Lei Chen;Shiyu Yang","doi":"10.1109/TKDE.2024.3434567","DOIUrl":"10.1109/TKDE.2024.3434567","url":null,"abstract":"Recently, transportation-as-a-service (TaaS) becomes an increasing trend, and online taxi platforms start to apply electric vehicles to serve passengers. Since the recharging time of an electric vehicle is long and non-negligible, it is necessary to smartly arrange the recharging schedules of electric vehicles in working schedules. In order to maximize the number of served taxi-calling tasks, online taxi platforms assign electric vehicles whose remaining electric power is enough to serve the dynamically arriving taxi-calling tasks and schedule suitable idle vehicles to recharging piles to recharge. We formally define the power-aware electric vehicle assignment (PAEVA) problem to serve as many taxi-calling tasks as possible under the constraints of remaining electric power and deadline. We prove that the PAEVA problem is NP-hard. To solve PAEVA, we design a novel strategy to help arrange the schedules of electric vehicles. Specifically, the strategy requires that, in a time slot and an area gird, the ratio of the number of electric vehicles whose remaining electric power is higher than a threshold \u0000<inline-formula><tex-math>$alpha$</tex-math></inline-formula>\u0000 to the number of predicted taxi-calling tasks should be higher than a threshold \u0000<inline-formula><tex-math>$beta$</tex-math></inline-formula>\u0000. We propose two approximation approaches with theoretical guarantees to adaptively determine the values of the two thresholds of the strategy. We evaluate our solutions’ effectiveness and efficiency by comprehensive experiments on real datasets.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9361-9373"},"PeriodicalIF":8.9,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141871194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Spatial-Temporal Cross-View Contrastive Pre-Training for Check-in Sequence Representation Learning 用于签到序列表征学习的时空跨视图对比预训练

IF 8.9 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Knowledge and Data Engineering

Pub Date : 2024-07-29 DOI: 10.1109/TKDE.2024.3434565

Letian Gong;Huaiyu Wan;Shengnan Guo;Xiucheng Li;Yan Lin;Erwen Zheng;Tianyi Wang;Zeyu Zhou;Youfang Lin

The rapid growth of location-based services (LBS) has yielded massive amounts of data on human mobility. Effectively extracting meaningful representations for user-generated check-in sequences is pivotal for facilitating various downstream services. However, the user-generated check-in data are simultaneously influenced by the surrounding objective circumstances and the user's subjective intention. Specifically, the temporal uncertainty and spatial diversity exhibited in check-in data make it difficult to capture the macroscopic spatial-temporal patterns of users and to understand the semantics of user mobility activities. Furthermore, the distinct characteristics of the temporal and spatial information in check-in sequences call for an effective fusion method to incorporate these two types of information. In this paper, we propose a novel Spatial-Temporal Cross-view Contrastive Representation (STCCR) framework for check-in sequence representation learning. Specifically, STCCR addresses the above challenges by employing self-supervision from “spatial topic” and “temporal intention” views, facilitating effective fusion of spatial and temporal information at the semantic level. Besides, STCCR leverages contrastive clustering to uncover users’ shared spatial topics from diverse mobility activities, while employing angular momentum contrast to mitigate the impact of temporal uncertainty and noise. We extensively evaluate STCCR on three real-world datasets and demonstrate its superior performance across three downstream tasks.

基于位置的服务（LBS）的快速发展产生了大量有关人类移动性的数据。有效提取用户生成的签到序列的有意义表征对于促进各种下游服务至关重要。然而，用户生成的签到数据同时受到周围客观环境和用户主观意图的影响。具体来说，签到数据在时间上的不确定性和空间上的多样性使其难以捕捉用户的宏观时空模式，也难以理解用户移动活动的语义。此外，签到序列中的时间信息和空间信息各具特色，因此需要一种有效的融合方法来整合这两类信息。在本文中，我们为签到序列表征学习提出了一种新颖的空间-时间跨视角对比表征（STCCR）框架。具体来说，STCCR 通过采用 "空间主题 "和 "时间意图 "视图的自我监督来应对上述挑战，从而促进空间和时间信息在语义层面的有效融合。此外，STCCR 利用对比聚类从不同的移动活动中发现用户的共同空间主题，同时利用角动量对比来减轻时间不确定性和噪声的影响。我们在三个真实世界的数据集上对 STCCR 进行了广泛评估，并证明了它在三个下游任务中的卓越性能。

{"title":"Spatial-Temporal Cross-View Contrastive Pre-Training for Check-in Sequence Representation Learning","authors":"Letian Gong;Huaiyu Wan;Shengnan Guo;Xiucheng Li;Yan Lin;Erwen Zheng;Tianyi Wang;Zeyu Zhou;Youfang Lin","doi":"10.1109/TKDE.2024.3434565","DOIUrl":"10.1109/TKDE.2024.3434565","url":null,"abstract":"The rapid growth of location-based services (LBS) has yielded massive amounts of data on human mobility. Effectively extracting meaningful representations for user-generated check-in sequences is pivotal for facilitating various downstream services. However, the user-generated check-in data are simultaneously influenced by the surrounding objective circumstances and the user's subjective intention. Specifically, the temporal uncertainty and spatial diversity exhibited in check-in data make it difficult to capture the macroscopic spatial-temporal patterns of users and to understand the semantics of user mobility activities. Furthermore, the distinct characteristics of the temporal and spatial information in check-in sequences call for an effective fusion method to incorporate these two types of information. In this paper, we propose a novel Spatial-Temporal Cross-view Contrastive Representation (STCCR) framework for check-in sequence representation learning. Specifically, STCCR addresses the above challenges by employing self-supervision from “spatial topic” and “temporal intention” views, facilitating effective fusion of spatial and temporal information at the semantic level. Besides, STCCR leverages contrastive clustering to uncover users’ shared spatial topics from diverse mobility activities, while employing angular momentum contrast to mitigate the impact of temporal uncertainty and noise. We extensively evaluate STCCR on three real-world datasets and demonstrate its superior performance across three downstream tasks.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9308-9321"},"PeriodicalIF":8.9,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141871195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

AdaBoost-Stacking Based on Incremental Broad Learning System 基于增量式广泛学习系统的 AdaBoost 堆叠技术

IF 8.9 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Knowledge and Data Engineering

Pub Date : 2024-07-25 DOI: 10.1109/TKDE.2024.3433587

Fan Yun;Zhiwen Yu;Kaixiang Yang;C. L. Philip Chen

Due to the advantages of fast training speed and competitive performance, Broad Learning System (BLS) has been widely used for classification tasks across various domains. However, the random weight generation mechanism in BLS makes the model unstable, and the performance of BLS may be limited when dealing with some complex datasets. On the other hand, the instability of BLS brings diversity to ensemble learning, and ensemble methods can also reduce the variance and bias of the single BLS. Therefore, we propose an ensemble learning algorithm based on BLS, which includes three modules. To improve the stability and generalization ability of BLS, we utilize BLS as the base classifier in an AdaBoost framework first. Taking advantage of the incremental learning mechanism of BLS, we then propose a selective ensemble method to raise the accuracy and diversity of the BLS ensemble method. In addition, based on the former selective Adaboost framework, we suggest a hierarchical ensemble algorithm, which combines sample and feature dimensions to further improve the fitting ability of the ensemble BLS. Extensive experiments have demonstrated that the proposed method performs better than the original BLS and other state-of-the-art models, proving the effectiveness and versatility of our proposed approaches.

由于具有训练速度快、性能优越等优点，Broad Learning System（BLS）已被广泛应用于各个领域的分类任务中。然而，BLS 的随机权重生成机制使得模型不稳定，在处理一些复杂数据集时，BLS 的性能可能会受到限制。另一方面，BLS 的不稳定性也为集合学习带来了多样性，而且集合方法还能减少单一 BLS 的方差和偏差。因此，我们提出了一种基于 BLS 的集合学习算法，包括三个模块。为了提高 BLS 的稳定性和泛化能力，我们首先在 AdaBoost 框架中使用 BLS 作为基础分类器。利用 BLS 的增量学习机制，我们提出了一种选择性集合方法，以提高 BLS 集合方法的准确性和多样性。此外，在前一种选择性 Adaboost 框架的基础上，我们提出了一种分层集合算法，结合样本维度和特征维度，进一步提高了 BLS 集合的拟合能力。广泛的实验证明，所提出的方法比原始 BLS 和其他最先进的模型表现更好，证明了我们所提出的方法的有效性和通用性。

{"title":"AdaBoost-Stacking Based on Incremental Broad Learning System","authors":"Fan Yun;Zhiwen Yu;Kaixiang Yang;C. L. Philip Chen","doi":"10.1109/TKDE.2024.3433587","DOIUrl":"10.1109/TKDE.2024.3433587","url":null,"abstract":"Due to the advantages of fast training speed and competitive performance, Broad Learning System (BLS) has been widely used for classification tasks across various domains. However, the random weight generation mechanism in BLS makes the model unstable, and the performance of BLS may be limited when dealing with some complex datasets. On the other hand, the instability of BLS brings diversity to ensemble learning, and ensemble methods can also reduce the variance and bias of the single BLS. Therefore, we propose an ensemble learning algorithm based on BLS, which includes three modules. To improve the stability and generalization ability of BLS, we utilize BLS as the base classifier in an AdaBoost framework first. Taking advantage of the incremental learning mechanism of BLS, we then propose a selective ensemble method to raise the accuracy and diversity of the BLS ensemble method. In addition, based on the former selective Adaboost framework, we suggest a hierarchical ensemble algorithm, which combines sample and feature dimensions to further improve the fitting ability of the ensemble BLS. Extensive experiments have demonstrated that the proposed method performs better than the original BLS and other state-of-the-art models, proving the effectiveness and versatility of our proposed approaches.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"7585-7599"},"PeriodicalIF":8.9,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141772311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0