A Reinduction-Based Approach for Efficient High Utility Itemset Mining from Incremental Datasets
Pub Date: 2023-09-29, DOI: 10.1007/s41019-023-00229-4
Pushp Sra, Satish Chand
Abstract: High utility itemset mining is a crucial research area that focuses on identifying itemsets in a database whose utility exceeds a user-specified threshold. However, most existing algorithms assume that the database is static, which is unrealistic for real-life datasets that grow continuously as new data arrive. Furthermore, existing algorithms rely only on the utility value to identify relevant itemsets, so even combinations that occurred only in the earliest data are produced as output. Although some mining algorithms adopt a support-based approach to account for itemset frequency, they do not consider the temporal nature of itemsets. To address these challenges, this paper proposes the Scented Utility Miner (SUM) algorithm, which uses a reinduction strategy to track the recency of itemset occurrences and to mine itemsets from incremental databases. The paper provides a novel approach for mining high utility itemsets from dynamic databases and presents several experiments that demonstrate the effectiveness of the proposed approach.
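The SUM algorithm itself is defined in the paper; as background on the utility model it builds on, below is a minimal sketch of computing itemset utilities against a user-specified threshold over a small static transaction database. The transaction layout, profit values, and the `min_utility` setting are illustrative assumptions, and the exhaustive enumeration is only for clarity (real miners prune the search space with upper bounds).

```python
from itertools import combinations

# Each transaction maps an item to (quantity, unit_profit); values are illustrative.
transactions = [
    {"a": (2, 5), "b": (1, 3), "c": (4, 1)},
    {"a": (1, 5), "c": (2, 1)},
    {"b": (3, 3), "c": (1, 1)},
]

def itemset_utility(itemset, db):
    """Sum of quantity * unit_profit over transactions that contain all items."""
    total = 0
    for tx in db:
        if all(item in tx for item in itemset):
            total += sum(q * p for q, p in (tx[item] for item in itemset))
    return total

def naive_high_utility_itemsets(db, min_utility):
    """Exhaustive enumeration for clarity; practical miners prune with utility upper bounds."""
    items = sorted({item for tx in db for item in tx})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = itemset_utility(itemset, db)
            if u >= min_utility:
                result[itemset] = u
    return result

print(naive_high_utility_itemsets(transactions, min_utility=15))
```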
{"title":"A Reinduction-Based Approach for Efficient High Utility Itemset Mining from Incremental Datasets","authors":"Pushp Sra, Satish Chand","doi":"10.1007/s41019-023-00229-4","DOIUrl":"https://doi.org/10.1007/s41019-023-00229-4","url":null,"abstract":"Abstract High utility itemset mining is a crucial research area that focuses on identifying combinations of itemsets from databases that possess a utility value higher than a user-specified threshold. However, most existing algorithms assume that the databases are static, which is not realistic for real-life datasets that are continuously growing with new data. Furthermore, existing algorithms only rely on the utility value to identify relevant itemsets, leading to even the earliest occurring combinations being produced as output. Although some mining algorithms adopt a support-based approach to account for itemset frequency, they do not consider the temporal nature of itemsets. To address these challenges, this paper proposes the Scented Utility Miner (SUM) algorithm that uses a reinduction strategy to track the recency of itemset occurrence and mine itemsets from incremental databases. The paper provides a novel approach for mining high utility itemsets from dynamic databases and presents several experiments that demonstrate the effectiveness of the proposed approach.","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135244606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Few-Shot Relation Prediction of Knowledge Graph via Convolutional Neural Network with Self-Attention
Pub Date: 2023-09-20, DOI: 10.1007/s41019-023-00230-x
Shanna Zhong, Jiahui Wang, Kun Yue, Liang Duan, Zhengbao Sun, Yan Fang
Abstract: Knowledge graphs (KGs) have become a vital resource for applications such as question answering and recommendation systems. However, many relations in a KG have only a few observed triples, which makes it necessary to develop methods for few-shot relation prediction. In this paper, we propose the Convolutional Neural Network with Self-Attention Relation Prediction (CARP) model to predict new facts from few observed triples. First, to learn relation property features, we build a feature encoder that applies a convolutional neural network with self-attention to the few observed triples rather than to background knowledge. Then, by incorporating the learned features, we give an embedding network to learn the representation of incomplete triples. Finally, we give the loss function and training algorithm of our CARP model. Experimental results on three real-world datasets show that the proposed method improves Hits@10 by 48% on average over state-of-the-art competitors.
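The exact CARP architecture is specified in the paper; as a rough, hypothetical orientation only, the sketch below scores a (head, relation, tail) triple with a 1-D convolution followed by scaled dot-product self-attention. The layer sizes, kernel size, and scoring head are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TripleEncoder(nn.Module):
    """Toy encoder: 1-D convolution over (head, relation, tail) embeddings,
    then scaled dot-product self-attention; dimensions are illustrative."""

    def __init__(self, dim=64, channels=32):
        super().__init__()
        self.conv = nn.Conv1d(in_channels=dim, out_channels=channels, kernel_size=1)
        self.q = nn.Linear(channels, channels)
        self.k = nn.Linear(channels, channels)
        self.v = nn.Linear(channels, channels)
        self.out = nn.Linear(3 * channels, 1)  # plausibility score of the triple

    def forward(self, h, r, t):
        # Stack the three embeddings into a length-3 sequence: (batch, dim, 3).
        x = torch.stack([h, r, t], dim=-1)
        x = F.relu(self.conv(x))             # (batch, channels, 3)
        x = x.transpose(1, 2)                # (batch, 3, channels)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5), dim=-1)
        x = attn @ v                         # (batch, 3, channels)
        return self.out(x.flatten(1))        # (batch, 1)

# Usage with random vectors standing in for learned entity/relation embeddings.
enc = TripleEncoder()
h, r, t = (torch.randn(8, 64) for _ in range(3))
print(enc(h, r, t).shape)  # torch.Size([8, 1])
```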
{"title":"Few-Shot Relation Prediction of Knowledge Graph via Convolutional Neural Network with Self-Attention","authors":"Shanna Zhong, Jiahui Wang, Kun Yue, Liang Duan, Zhengbao Sun, Yan Fang","doi":"10.1007/s41019-023-00230-x","DOIUrl":"https://doi.org/10.1007/s41019-023-00230-x","url":null,"abstract":"Abstract Knowledge graph (KG) has become the vital resource for various applications like question answering and recommendation system. However, several relations in KG only have few observed triples, which makes it necessary to develop the method for few-shot relation prediction. In this paper, we propose the C onvolutional Neural Network with Self- A ttention R elation P rediction (CARP) model to predict new facts with few observed triples. First, to learn the relation property features, we build a feature encoder by using the convolutional neural network with self-attention from the few observed triples rather than background knowledge. Then, by incorporating the learned features, we give an embedding network to learn the representation of incomplete triples. Finally, we give the loss function and training algorithm of our CARP model. Experimental results on three real-world datasets show that our proposed method improves Hits@10 by 48% on average over the state-of-the-art competitors.","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136309292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Efficient Keywords Search in Temporal Social Networks
Pub Date: 2023-09-09, DOI: 10.1007/s41019-023-00218-7
Youming Ge, Zitong Chen, Yubao Liu
Abstract: With growing requirements from many application areas, a variety of queries and analyses have emerged that focus on social networks. Time is a common and necessary dimension in many types of social networks. Social networks with time information are called temporal social networks, where the time information can be, for example, the time at which one user sends a message to another. Keyword search in temporal social networks consists of finding relationships among a group of users that covers a set of query labels and is valid within the query time interval. It assists in social network analysis, classification of social network users, community detection, and so on. However, existing methods have limitations in solving the temporal social network keyword search problem. We propose a basic algorithm, the discrete timestamp algorithm, which turns the problem into a traditional keyword search on social networks. We also propose an approximate algorithm based on the discrete timestamp algorithm, but it still suffers from the low efficiency of traditional algorithms. To further improve performance, we propose a new algorithm based on dynamic programming for keyword search in temporal social networks. The main idea is to extend a vertex into a solution through an edge-growth operation and a tree-merger operation. We also propose two powerful pruning techniques to reduce the intermediate results during the extension. Additionally, all of the proposed algorithms can handle a variety of ranking functions and support top-N keyword querying. The efficiency and effectiveness of the proposed algorithms are verified through extensive empirical studies.
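The paper's edge-growth and tree-merger dynamic program is not reproduced here; the sketch below only illustrates the problem setting under simplifying assumptions: temporal edges are restricted to the query time interval, and a connected group of users covering the query labels is grown by breadth-first expansion. The edge timestamps, labels, and greedy strategy are all illustrative and give no minimality guarantee.

```python
from collections import deque

# Temporal edges: (user_u, user_v, timestamp); labels: user -> set of keywords.
# All values below are illustrative.
edges = [("u1", "u2", 3), ("u2", "u3", 7), ("u1", "u4", 12), ("u3", "u4", 9)]
labels = {"u1": {"db"}, "u2": {"ml"}, "u3": {"nlp"}, "u4": {"db", "ml"}}

def snapshot(edges, t_start, t_end):
    """Adjacency of the sub-network whose interactions fall inside [t_start, t_end]."""
    adj = {}
    for u, v, t in edges:
        if t_start <= t <= t_end:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj

def covering_group(adj, labels, query_labels):
    """Grow a connected group from each seed until the query labels are covered;
    keep the smallest group found (greedy, not guaranteed minimal)."""
    best = None
    for seed in adj:
        covered, group, queue = set(labels.get(seed, ())), {seed}, deque([seed])
        while queue and not query_labels <= covered:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in group:
                    group.add(v)
                    covered |= labels.get(v, set())
                    queue.append(v)
        if query_labels <= covered and (best is None or len(group) < len(best)):
            best = group
    return best

print(covering_group(snapshot(edges, 1, 10), labels, {"db", "nlp"}))
```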
{"title":"An Efficient Keywords Search in Temporal Social Networks","authors":"Youming Ge, Zitong Chen, Yubao Liu","doi":"10.1007/s41019-023-00218-7","DOIUrl":"https://doi.org/10.1007/s41019-023-00218-7","url":null,"abstract":"Abstract With the increasing of requirements from many aspects, various queries and analyses arise focusing on social network. Time is a common and necessary dimension in various types of social networks. Social networks with time information are called temporal social networks, in which time information can be the time when a user sends message to another user. Keywords search in temporal social networks consists of finding relationships between a group users that has a set of query labels and is valid within the query time interval. It provides assistance in social network analysis, classification of social network users, community detection, etc. However, the existing methods have limitations in solving temporal social network keyword search problems. We propose a basic algorithm, the discrete timestamp algorithm, with the intention of turning the problem into a traditional keyword search on social networks. We also propose an approximative algorithm based on the discrete timestamp algorithm, but it still suffers from the traditional algorithms’ low efficiency. To further improve the performance, we propose a new algorithm based on dynamic programming to solve the keyword search in temporal social network. The main idea is to extend a vertex into a solution by edge-growth operation and tree-merger operation. We also propose two powerful pruning techniques to reduce the intermediate results during the extension. Additionally, all of the algorithms we proposed are capable of handling a variety of ranking functions, and all of them can be made to conform to top-N keyword querying. The efficiency and effectiveness of the proposed algorithms are verified through extensive empirical studies.","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136193060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient Network Representation Learning via Cluster Similarity
Pub Date: 2023-09-01, DOI: 10.1007/s41019-023-00222-x
Yasuhiro Fujiwara, Yasutoshi Ida, Atsutoshi Kumagai, Masahiro Nakano, Akisato Kimura, Naonori Ueda
Abstract: Network representation learning is a de facto tool for graph analytics. The mainstream of previous approaches is to factorize the proximity matrix between nodes. However, if n is the number of nodes, since the size of the proximity matrix is $$n \times n$$, it needs $$O(n^3)$$ time and $$O(n^2)$$ space to perform network representation learning; these costs are prohibitively high for large-scale graphs. This paper introduces the novel idea of using similarities between clusters instead of proximities between nodes; the proposed approach computes the representations of the clusters from similarities between clusters and computes the representations of nodes by referring to them. If l is the number of clusters, since $$l \ll n$$, we can efficiently obtain the representations of clusters from a small $$l \times l$$ similarity matrix. Furthermore, since nodes in each cluster share similar structural properties, we can effectively compute the representation vectors of nodes. Experiments show that our approach can perform network representation learning more efficiently and effectively than existing approaches.
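As a simplified illustration of the cluster-similarity idea (not the authors' exact construction), the sketch below factorizes a small l x l cluster-similarity matrix instead of the n x n proximity matrix and lets each node inherit its cluster's vector. The toy graph, the density-based similarity, and the SVD factorization are illustrative assumptions.

```python
import numpy as np

# Toy undirected graph as an adjacency matrix plus a node -> cluster assignment;
# both are illustrative, and the assignment would normally come from a graph
# clustering step.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
cluster_of = np.array([0, 0, 0, 1, 1, 1])        # l = 2 clusters, n = 6 nodes

def cluster_embeddings(A, cluster_of, dim=2):
    """Factorize an l x l cluster-similarity matrix instead of the n x n proximity matrix."""
    l = cluster_of.max() + 1
    M = np.eye(l)[cluster_of]                    # membership matrix, shape (n, l)
    sizes = M.sum(axis=0)
    S = (M.T @ A @ M) / np.outer(sizes, sizes)   # edge density between clusters
    U, s, _ = np.linalg.svd(S)
    return U[:, :dim] * np.sqrt(s[:dim])         # one embedding per cluster

C = cluster_embeddings(A, cluster_of)
node_vectors = C[cluster_of]                     # nodes inherit their cluster's vector
print(node_vectors.shape)                        # (6, 2)
```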
{"title":"Efficient Network Representation Learning via Cluster Similarity","authors":"Yasuhiro Fujiwara, Yasutoshi Ida, Atsutoshi Kumagai, Masahiro Nakano, Akisato Kimura, Naonori Ueda","doi":"10.1007/s41019-023-00222-x","DOIUrl":"https://doi.org/10.1007/s41019-023-00222-x","url":null,"abstract":"Abstract Network representation learning is a de facto tool for graph analytics. The mainstream of the previous approaches is to factorize the proximity matrix between nodes. However, if n is the number of nodes, since the size of the proximity matrix is $$n times n$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mi>n</mml:mi> <mml:mo>×</mml:mo> <mml:mi>n</mml:mi> </mml:mrow> </mml:math> , it needs $$O(n^3)$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mi>O</mml:mi> <mml:mo>(</mml:mo> <mml:msup> <mml:mi>n</mml:mi> <mml:mn>3</mml:mn> </mml:msup> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> time and $$O(n^2)$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mi>O</mml:mi> <mml:mo>(</mml:mo> <mml:msup> <mml:mi>n</mml:mi> <mml:mn>2</mml:mn> </mml:msup> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> space to perform network representation learning; they are significantly high for large-scale graphs. This paper introduces the novel idea of using similarities between clusters instead of proximities between nodes; the proposed approach computes the representations of the clusters from similarities between clusters and computes the representations of nodes by referring to them. If l is the number of clusters, since $$l ll n$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mi>l</mml:mi> <mml:mo>≪</mml:mo> <mml:mi>n</mml:mi> </mml:mrow> </mml:math> , we can efficiently obtain the representations of clusters from a small $$l times l$$ <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mi>l</mml:mi> <mml:mo>×</mml:mo> <mml:mi>l</mml:mi> </mml:mrow> </mml:math> similarity matrix. Furthermore, since nodes in each cluster share similar structural properties, we can effectively compute the representation vectors of nodes. Experiments show that our approach can perform network representation learning more efficiently and effectively than existing approaches.","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135298458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fully Dynamic Contraction Hierarchies with Label Restrictions on Road Networks
Pub Date: 2023-09-01, DOI: 10.1007/s41019-023-00227-6
Zi Chen, Bo Feng, Long Yuan, Xuemin Lin, Liping Wang
{"title":"Fully Dynamic Contraction Hierarchies with Label Restrictions on Road Networks","authors":"Zi Chen, Bo Feng, Long Yuan, Xuemin Lin, Liping Wang","doi":"10.1007/s41019-023-00227-6","DOIUrl":"https://doi.org/10.1007/s41019-023-00227-6","url":null,"abstract":"","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":null,"pages":null},"PeriodicalIF":4.2,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79491322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}