Pub Date : 2025-11-28 DOI: 10.1109/TKDE.2025.3638343
Miaomiao Cai;Lei Chen;Yifan Wang;Zhiyong Cheng;Min Zhang;Meng Wang
Popularity bias is a common challenge in recommender systems. It often causes unbalanced item recommendation performance and intensifies the Matthew effect. Due to limited user-item interactions, unpopular items are frequently constrained to the embedding neighborhoods of only a few users, leading to representation collapse and weakening the model’s generalization. Although existing supervised alignment and reweighting methods can help mitigate this problem, they still face two major limitations: (1) they overlook the inherent variability among different Graph Convolutional Network (GCN) layers, which can result in negative gains in deeper layers; (2) they rely heavily on fixed hyperparameters to balance popular and unpopular items, limiting adaptability to diverse data distributions and increasing model complexity. To address these challenges, we propose the Graph-Structured Dual Adaptation Framework (GSDA) for mitigating popularity bias in recommendation. Our theoretical analysis shows that supervised alignment in GCNs is hindered by the over-smoothing effect, where the distinction between popular and unpopular items diminishes as layers deepen, reducing the effectiveness of alignment at deeper levels. To overcome this limitation, GSDA integrates a hierarchical adaptive alignment mechanism that counteracts entropy decay across layers together with a distribution-aware contrastive weighting strategy based on the Gini coefficient, enabling the model to adapt its debiasing strength dynamically without relying on fixed hyperparameters. Extensive experiments on three benchmark datasets demonstrate that GSDA effectively alleviates popularity bias while consistently outperforming state-of-the-art methods in recommendation performance.
“Graph-Structured Driven Dual Adaptation for Mitigating Popularity Bias,” IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1129–1143. DOI: 10.1109/TKDE.2025.3638343
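The abstract above describes a distribution-aware weighting strategy driven by the Gini coefficient of item popularity. The paper's exact weighting formula is not given here; as a hedged sketch, the snippet below shows the standard Gini computation on interaction counts and a hypothetical `debias_weight` helper (the `base` parameter and the `1 + G` scaling are illustrative assumptions, not the authors' method):

```python
def gini(counts):
    """Gini coefficient of an interaction-count distribution.

    0.0 means perfectly uniform popularity; values approaching 1.0 mean
    interactions are concentrated on a few items (long-tail skew).
    """
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    # Classic formula on sorted data with 1-based ranks:
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n
    cum = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * cum / (n * total) - (n + 1.0) / n

def debias_weight(counts, base=1.0):
    """Hypothetical adaptive weight: scale a contrastive-loss term by how
    skewed the popularity distribution currently is (more skew => stronger
    debiasing)."""
    return base * (1.0 + gini(counts))
```

On a uniform distribution `gini` returns 0 and the weight stays at `base`; on a highly skewed one the weight grows, which is the qualitative behavior the abstract attributes to GSDA's adaptive strategy.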
Spatio-temporal traffic data imputation is a fundamental component in intelligent transportation systems, which can significantly improve data quality and enhance the accuracy of downstream data mining tasks. Recently, low-rank tensor representation has shown great potential for spatio-temporal traffic data imputation. However, the low-rank assumption focuses on the global structure, neglecting the critical spatial topology and local temporal dependencies inherent in spatio-temporal data. To address these issues, we propose a topology-induced low-rank tensor representation (TILR), which can accurately capture the underlying low-rankness of the spatial multi-scale features induced by topology knowledge. Moreover, to exploit local temporal dependencies, we suggest a learnable convolutional regularization framework, which not only includes some classical convolution-based regularizers but also leads to the discovery of new convolutional regularizers. Equipped with the suggested TILR and convolutional regularizer, we build a unified low-rank tensor model harmonizing spatial topology and temporal dependencies for traffic data imputation, which is expected to deliver promising performance even under extreme and complex missing scenarios. To solve the proposed nonconvex model, we develop an efficient alternating direction method of multipliers (ADMM)-based algorithm and analyze its computational complexity. Extensive experiments demonstrate that the proposed model outperforms state-of-the-art baselines for various missing scenarios. These results reveal the critical synergy between topology-aware low-rank constraint and temporal dynamic modeling for spatio-temporal data imputation.
“Topology-Induced Low-Rank Tensor Representation for Spatio-Temporal Traffic Data Imputation,” Zhi-Long Han, Ting-Zhu Huang, Xi-Le Zhao, Ben-Zheng Li, and Meng Ding. IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1349–1363, 2025-11-28. DOI: 10.1109/TKDE.2025.3638633
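The model above is solved with an ADMM-based algorithm. The paper's nonconvex tensor solver is not reproduced here; as a sketch of the ADMM update pattern only, under the simplifying assumption of a scalar l1-proximal problem, min_x 0.5*(x - v)^2 + lam*|x|, whose known closed-form solution is soft-thresholding of v:

```python
def soft(u, t):
    """Scalar soft-thresholding operator: shrink u toward 0 by t."""
    return max(u - t, 0.0) - max(-u - t, 0.0)

def admm_l1_prox(v, lam, rho=1.0, iters=100):
    """ADMM (scaled form) for min_x 0.5*(x - v)**2 + lam*|x|, split as x = z.

    Illustrates the three canonical ADMM steps used (in far richer form)
    by solvers like the one in the paper: a smooth x-update, a proximal
    z-update, and a dual ascent on the residual.
    """
    x = z = u = 0.0
    for _ in range(iters):
        x = (v + rho * (z - u)) / (1.0 + rho)  # x-update: solve the quadratic
        z = soft(x + u, lam / rho)             # z-update: proximal of lam*|.|
        u = u + x - z                          # dual update on residual x - z
    return z
```

For this convex toy problem the iterates converge to `soft(v, lam)`, which gives an easy correctness check; in the paper's setting the z-update would instead involve singular-value thresholding of tensor unfoldings.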
Traffic prediction is essential for modern transportation systems, enhancing traffic management and urban planning. Accurate predictions of traffic flow and speed are crucial for understanding road usage, mitigating congestion, and providing real-time traffic monitoring and dynamic route guidance, thus improving road safety and infrastructure efficiency. Traditional research has often focused on predicting traffic flow or speed independently, leading to higher resource consumption due to the need for separate models. Few studies have explored the simultaneous prediction of both metrics, with recent attempts failing to account for spatial correlations, resulting in suboptimal performance. To address these challenges, we propose MTNet, a multi-task learning framework for joint traffic flow and speed prediction. MTNet employs a Transformer-like Encoder-Decoder architecture to process and enhance feature representations, capturing complex spatio-temporal correlations. Specifically, MTNet extracts intra-task dependencies using a cross-task interaction module and models task-specific spatiotemporal dependencies using spatial and temporal-aware modules with cascaded residual structures. Additionally, spatio-temporal positional encoding is integrated to increase awareness of long-term and long-distance dependencies. Extensive experiments on three diverse traffic datasets—Manchester, PeMSD4, and PeMSD8—demonstrate that MTNet significantly outperforms state-of-the-art methods in both traffic flow and speed prediction. MTNet achieves substantial improvements in prediction accuracy and efficiency, striking an optimal balance between performance and computational resource usage.
“MTNet: A Multi-Task Learning Framework That Integrates Intra-Task and Task-Specific Dependencies for Traffic Forecasting,” Shaokun Zhang, Rui Wang, Hongjun Tang, Kaizhong Zuo, Peng Jiang, Peng Hu, Wenjie Li, Biao Jie, and Peize Zhao. IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1206–1220, 2025-11-27. DOI: 10.1109/TKDE.2025.3638147
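MTNet integrates spatio-temporal positional encoding to capture long-term and long-distance dependencies. The abstract does not specify the encoding; the sketch below shows the standard sinusoidal scheme, with the additive combination of a temporal code and a node-index "spatial" code being an assumption for illustration:

```python
import math

def sinusoidal_encoding(pos, dim):
    """Standard sinusoidal positional encoding for a single position."""
    enc = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        enc.append(math.sin(pos * freq))
        enc.append(math.cos(pos * freq))
    return enc[:dim]

def st_encoding(t, node_id, dim):
    """Toy spatio-temporal encoding: sum of a temporal code (time step t)
    and a spatial code (sensor/node index). How MTNet actually fuses the
    two is not stated in the abstract; addition is one common choice."""
    te = sinusoidal_encoding(t, dim)
    se = sinusoidal_encoding(node_id, dim)
    return [a + b for a, b in zip(te, se)]
```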
Pub Date : 2025-11-27 DOI: 10.1109/TKDE.2025.3634839
Zhengwei Tao;Xiancai Chen;Zhi Jin;Xiaoying Bai;Haiyan Zhao;Wenpeng Hu;Chongyang Tao;Shuai Ma
Event reasoning is the task of reasoning over events and the relations between them. This capability is crucial and fundamental, underlying a wide range of applications. Large language models (LLMs) have made advances in event reasoning owing to their extensive training. However, the LLMs commonly used today still do not consistently demonstrate proficiency in event reasoning as humans do. This discrepancy arises from the absence of explicit modeling of events and their relations, as well as insufficient knowledge of event relations. In addition, the different reasoning paradigms of LLMs are trained in an imbalanced way. In this paper, we propose WizardEvent, which synthesizes data from unlabeled corpora via the proposed hybrid event-aware instruction tuning. Specifically, we first represent events and their relations in a novel structure and then extract this knowledge from raw text. Second, we introduce hybrid event reasoning paradigms with four reasoning formats. Lastly, we wrap the constructed event relational knowledge with these paradigms to create the instruction-tuning dataset. We fine-tune the model with this enriched dataset, significantly improving its event reasoning. The performance of WizardEvent is rigorously evaluated through extensive experiments. The results demonstrate that WizardEvent substantially outperforms baselines, indicating the effectiveness of our approach.
“WizardEvent: Empowering Event Reasoning by Hybrid Event-Aware Data Synthesizing,” IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1412–1426. DOI: 10.1109/TKDE.2025.3634839
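WizardEvent wraps extracted event-relation knowledge in reasoning paradigms to build instruction-tuning records. The paper's four actual formats are not detailed in the abstract; the templates and field names below are illustrative assumptions showing the general wrapping step only:

```python
def wrap_example(head, relation, tail, fmt="qa"):
    """Wrap one (head event, relation, tail event) triple into an
    instruction-tuning record. The "qa" and "cloze" templates and the
    instruction/output field names are hypothetical, not the paper's."""
    templates = {
        "qa": f'What is the {relation} of the event "{head}"?',
        "cloze": f'"{head}" {relation} ____.',
    }
    return {"instruction": templates[fmt], "output": tail}
```

A run such as `wrap_example("it rained", "effect", "the ground got wet")` yields one supervised record; iterating over all extracted triples and formats would produce the kind of mixed-paradigm dataset the abstract describes.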
Large-scale social networks can be modeled as decentralized graphs, where each node holds a part of the overall network. Local differential privacy (LDP) has been widely adopted in decentralized graph analysis to ensure privacy for individual nodes. However, existing LDP-based methods often fail to accommodate personalized privacy requirements due to their uniform encoding and equal perturbation mechanisms. To address this issue, we propose PEGS, a novel privacy-preserving decentralized graph synthesis approach that significantly improves utility while respecting user-specific privacy preferences. Specifically, we introduce interactive local differential privacy (iLDP), a new edge-level definition of LDP that relaxes the constraints of node-independent perturbation, thereby enabling the fulfillment of individual privacy needs. Furthermore, we develop a decentralized graph perturbation framework offering three levels of privacy settings. To optimize the balance between information preservation and privacy, we design encoding and perturbation mechanisms leveraging information entropy tailored to different privacy levels. Extensive experimental evaluations and rigorous theoretical analysis demonstrate that our method produces high-quality synthetic graphs while adhering to iLDP guarantees.
“PEGS: A Graph Synthesis Approach Based on Local Differential Privacy Preference,” Lihe Hou, Weiwei Ni, Nan Fu, Dongyue Zhang, and Ruyu Zhang. IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1236–1248, 2025-11-26. DOI: 10.1109/TKDE.2025.3637324
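PEGS builds on edge-level local differential privacy. The paper's iLDP mechanism with entropy-tailored encoding is more elaborate than what fits here; the sketch below shows only the classical randomized-response building block on one user's adjacency bits, which is the standard way edge-LDP perturbation is instantiated:

```python
import math
import random

def perturb_adjacency(bits, eps, rng=None):
    """Randomized response on one user's adjacency bit vector.

    Each bit is reported truthfully with probability
    p = e^eps / (e^eps + 1) and flipped otherwise, which satisfies
    eps-edge-LDP. A larger eps (weaker privacy) flips fewer bits, so
    per-user eps values can express personalized privacy preferences.
    """
    rng = rng or random.Random(0)
    p = math.exp(eps) / (math.exp(eps) + 1.0)
    return [b if rng.random() < p else 1 - b for b in bits]
```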
Utilizing pre-trained generative models for sentiment element extraction has recently significantly enhanced aspect-based sentiment analysis benchmarks. Nonetheless, these models have two significant drawbacks: (1) high computational cost, in both inference time and hardware requirements; (2) a lack of explicit modeling, as they encode the connections between sentiment elements through fragile natural-language or notational target sequences. To overcome these challenges, we present a novel opinion tree parsing model designed to swiftly parse sentiment elements from an opinion tree. This approach not only accelerates the process but also explicitly unveils a more comprehensive and fully articulated aspect-level sentiment structure. Our method begins by introducing a pioneering context-free opinion grammar to standardize the opinion tree structure. Subsequently, we leverage a neural chart-based opinion tree parser to thoroughly explore the interconnections among sentiment elements and parse them into a structured opinion tree. Extensive experiments underscore the effectiveness of our proposed model and the capability of the opinion tree parser, particularly when coupled with the introduced context-free opinion grammar. Crucially, the results confirm the superior speed of our model compared to the SOTA baselines.
“Exploring Context-Free Opinion Grammar for Aspect-Based Sentiment Analysis,” Xiaoyi Bao, Jinghang Gu, Zhongqing Wang, Xiaotong Jiang, and Chu-Ren Huang. IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1070–1083, 2025-11-14. DOI: 10.1109/TKDE.2025.3632628
Graph pre-training has concentrated on graph-level tasks involving small graphs (e.g., molecular graphs) or on learning node representations on a fixed graph. Extending graph pre-trained models to web-scale graphs with billions of nodes in industrial scenarios, while avoiding negative transfer across graphs or tasks, remains a challenge. We aim to develop a general graph pre-trained model with inductive ability that can make predictions for unseen new nodes and even new graphs. In this work, we introduce a scalable transformer-based graph pre-training framework called PGT (Pre-trained Graph Transformer). Based on the masked autoencoder architecture, we design two pre-training tasks: one for reconstructing node features and the other for reconstructing local structures. Unlike the original autoencoder architecture, where the pre-trained decoder is discarded, we propose a novel strategy that utilizes the decoder for feature augmentation. Our framework, tested on the publicly available ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges, achieves state-of-the-art performance, showcasing scalability and efficiency. We have deployed our framework on Tencent’s online game data, confirming its capability to pre-train on real-world graphs with over 540 million nodes and 12 billion edges and to generalize effectively across diverse static and dynamic downstream tasks.
“Generalizing Graph Transformers Across Diverse Graphs and Tasks via Pre-Training,” Yufei He, Zhenyu Hou, Yukuo Cen, Jun Hu, Feng He, Xu Cheng, Jie Tang, and Bryan Hooi. IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1114–1128, 2025-11-13. DOI: 10.1109/TKDE.2025.3632394
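PGT's first pre-training task reconstructs node features from a masked input, in the masked-autoencoder style. As a hedged sketch of the corruption step only (the mask rate, mask value, and function name below are illustrative assumptions, not PGT's implementation):

```python
import random

def mask_node_features(feats, mask_rate=0.5, seed=0):
    """Masked-autoencoder-style corruption of a node feature matrix.

    Zeroes out a random subset of node feature vectors and returns
    (corrupted features, masked indices) so that an encoder-decoder can be
    trained to reconstruct the original vectors at the masked positions.
    """
    rng = random.Random(seed)
    masked = set(rng.sample(range(len(feats)), int(len(feats) * mask_rate)))
    corrupted = [[0.0] * len(row) if i in masked else list(row)
                 for i, row in enumerate(feats)]
    return corrupted, sorted(masked)
```

The reconstruction loss (e.g., MSE between decoder outputs and the original vectors at `masked` positions) and PGT's second task of reconstructing local structures are not shown.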
Time series forecasting faces significant challenges due to non-stationary components that obscure underlying patterns. While Transformer-based models are effective at capturing stationary components, they struggle with non-stationary dynamics and multivariate dependencies. In this paper, we propose FreqEvo, a lightweight Frequency Domain Feature Enhancement module for time series forecasting. FreqEvo progressively filters frequency components from high to low amplitude, ensuring the preservation of informative features while reducing noise. By integrating recursive Fourier-based residual modeling and cross-domain attention, FreqEvo effectively refines low-amplitude frequency features and stabilizes the embeddings, outperforming traditional low-pass filtering and random frequency selection methods in capturing both short-term and long-term dependencies. Experimental results on benchmark datasets demonstrate that FreqEvo outperforms state-of-the-art (SOTA) models and serves as a plug-and-play module to enhance existing Long-Term Sequence Forecasting (LSTF) models.
“FreqEvo: Enhancing Time Series Forecasting With Multi-Level Frequency Domain Feature Extraction,” Guohong Wang, Xianhan Tan, Zengming Lin, Binli Luo, Shangjian Zhong, and Kele Xu. IEEE Transactions on Knowledge and Data Engineering, vol. 38, no. 2, pp. 1099–1113, 2025-11-13. DOI: 10.1109/TKDE.2025.3632365
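FreqEvo filters frequency components in amplitude order, from high to low, rather than by a fixed low-pass cutoff. As a minimal sketch of that idea (FreqEvo's recursive residual modeling and cross-domain attention are not reproduced; a naive stdlib DFT stands in for a real FFT):

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(n^2); fine for a demo)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT, returning the real part of each sample."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def keep_top_amplitude(x, k):
    """Zero all but the k largest-magnitude frequency components, i.e.
    filter components in amplitude order from high to low, then invert."""
    X = dft(x)
    keep = set(sorted(range(len(X)), key=lambda i: abs(X[i]), reverse=True)[:k])
    return idft([X[i] if i in keep else 0.0 for i in range(len(X))])
```

Unlike a low-pass filter, this keeps whichever components carry the most energy, wherever they sit in the spectrum, which is the contrast the abstract draws with traditional low-pass and random frequency selection.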
Pub Date : 2025-11-13 DOI: 10.1109/TKDE.2025.3631909
Yawen Li;Xiaobao Wang;Bin Wen;Di Jin;Junping Du
The COVID-19 pandemic not only triggered a global health crisis but also amplified public panic through the rapid spread of misinformation. Understanding public sentiment and identifying the causes of sudden sentiment spikes is therefore critical for ensuring accurate information dissemination and guiding effective policymaking. However, mining such causes from social media remains challenging. Tweets collected during sentiment spike periods are often short, noisy, and dominated by repetitive background topics, making it difficult for existing topic models to separate emerging issues from long-standing discussions. To address these challenges, we propose the Sentiment Variation-aware Emerging Topics Mining Model (SVETM), a probabilistic graphical framework that leverages user sentiment variation between adjacent time windows as a guiding signal to distinguish emerging topics from background content. We further reformulate inference as a maximum a posteriori (MAP) problem and develop an efficient variational inference algorithm for scalable learning. Extensive experiments on a large-scale COVID-19 Twitter dataset demonstrate that SVETM outperforms strong baselines in terms of topic coherence, interpretability, and its ability to uncover the underlying causes of sentiment spikes.
{"title":"Sentiment Variation-Aware Sentiment Spike Explanation During COVID-19 Epidemic","authors":"Yawen Li;Xiaobao Wang;Bin Wen;Di Jin;Junping Du","doi":"10.1109/TKDE.2025.3631909","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3631909","url":null,"abstract":"The COVID-19 pandemic not only triggered a global health crisis but also amplified public panic through the rapid spread of misinformation. Understanding public sentiment and identifying the causes of sudden sentiment spikes are therefore critical for ensuring accurate information dissemination and guiding effective policymaking. However, mining such causes from social media remains challenging. Tweets collected during sentiment spike periods are often short, noisy, and dominated by repetitive background topics, making it difficult for existing topic models to separate emerging issues from long-standing discussions. To address these challenges, we propose the Sentiment Variation-aware Emerging Topics Mining Model (SVETM), a probabilistic graphical framework that leverages user sentiment variation between adjacent time windows as a guiding signal to distinguish emerging topics from background content. We further reformulate inference as a maximum a posteriori (MAP) problem and develop an efficient variational inference algorithm for scalable learning. 
Extensive experiments on a large-scale COVID-19 Twitter dataset demonstrate that SVETM outperforms strong baselines in terms of topic coherence, interpretability, and its ability to uncover the underlying causes of sentiment spikes.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 2","pages":"1306-1318"},"PeriodicalIF":10.4,"publicationDate":"2025-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-11-12DOI: 10.1109/TKDE.2025.3632233
Runlin Lei;Haipeng Ding;Zhewei Wei
Graph Neural Networks (GNNs) have become widely popular across various applications, with their vulnerability to adversarial attacks being a key concern. Among the different types of graph attacks, Restricted Black-box Attacks (RBAs) impose the strictest constraints, as attackers have access only to node features and the graph structure. Existing RBAs rely on homophily assumptions or shift-based losses as their objectives to conduct structural perturbations, but we demonstrate that all of these approaches fail on heterophilic graphs. To address this challenge, we introduce node-wise distance metrics as the objective to fundamentally quantify the quality of the graph structure after perturbations. Our theoretical results show that the proposed objective allows RBAs to effectively handle graphs beyond homophily. Leveraging this objective, we propose HetAttack, a scalable method that significantly reduces the distinguishability of nodes on the victim graph. Experiments on both synthetic and real-world graphs confirm the efficacy of HetAttack across varying levels of homophily, achieving performance comparable to split-unknown white-box attacks without prior knowledge of labels or the target model.
{"title":"Restricted Black-Box Attack on Graphs Beyond Homophily","authors":"Runlin Lei;Haipeng Ding;Zhewei Wei","doi":"10.1109/TKDE.2025.3632233","DOIUrl":"https://doi.org/10.1109/TKDE.2025.3632233","url":null,"abstract":"Graph Neural Networks (GNNs) have become widely popular across various applications, with their vulnerability to adversarial attacks being a key concern. Among the different types of graph attacks, Restricted Black-box Attacks (RBAs) impose the strictest constraints, as attackers have access only to node features and the graph structure. Existing RBAs rely on homophily assumptions or shift-based losses as their objectives to conduct structural perturbations, but we demonstrate that all of these approaches fail on heterophilic graphs. To address this challenge, we introduce node-wise distance metrics as the objective to fundamentally quantify the quality of the graph structure after perturbations. Our theoretical results show that the proposed objective allows RBAs to effectively handle graphs beyond homophily. Leveraging this objective, we propose HetAttack, a scalable method that significantly reduces the distinguishability of nodes on the victim graph. 
Experiments on both synthetic and real-world graphs confirm the efficacy of HetAttack across varying levels of homophily, achieving performance comparable to split-unknown white-box attacks without prior knowledge of labels or the target model.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"38 2","pages":"1292-1305"},"PeriodicalIF":10.4,"publicationDate":"2025-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
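The contrast the HetAttack abstract draws can be made concrete with a small sketch. This is not the paper's method or objective; it only shows, under assumed function names, the label-based edge homophily ratio that homophily-driven attack objectives implicitly rely on, next to a simple label-free, node-wise feature distance of the kind that remains computable in the restricted black-box setting:

```python
# Illustrative sketch only (not HetAttack): two ways to score a graph's
# structure. edge_homophily needs labels, which RBAs cannot access;
# mean_neighbor_distance uses only node features and edges.
import math

def edge_homophily(edges, labels):
    """Fraction of edges joining same-label nodes; low values = heterophilic."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

def mean_neighbor_distance(edges, features):
    """Average Euclidean distance between endpoint feature vectors:
    a label-free, node-wise proxy for structure quality."""
    def dist(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return sum(dist(features[u], features[v]) for u, v in edges) / len(edges)
```

On a heterophilic graph the first score is low even before any attack, which is why objectives built on it degrade; a distance-based, node-wise objective stays meaningful regardless of the homophily level.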