
Latest publications from the 2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)

Reachability in Large Graphs Using Bloom Filters
Pub Date : 2019-04-08 DOI: 10.1109/ICDEW.2019.000-9
Arkaprava Saha, Neha Sengupta, Maya Ramanath
Reachability queries are a fundamental graph operation with applications in several domains. There has been extensive research over several decades on answering reachability queries efficiently using sophisticated index structures. However, most of these methods are built for static graphs. For graphs that are updated very frequently and are massive in size, maintaining such index structures is often infeasible due to a large memory footprint and extremely slow updates. In this paper, we introduce a technique to compute reachability queries for very large and highly dynamic graphs that minimizes the memory footprint and update time. In particular, we enable a previously proposed, index-free, approximate method for reachability called ARROW on a compact graph representation called Bloom graphs. Bloom graphs use collections of the well-known summary data structure called the Bloom filter to store the edges of the graph. In our experimental evaluation with real-world graph datasets with up to millions of nodes and edges, we show that using ARROW with a Bloom graph achieves memory savings of up to 50%, while having accuracy close to 100% for all graphs.
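The scheme described above — keeping edges in Bloom filters and answering reachability without an index — can be sketched in a few lines. This toy version (the `BloomFilter` parameters and the BFS-style `reachable` helper are illustrative assumptions, not the paper's Bloom graph or ARROW, which is random-walk based) shows the key property: false positives in the filter can only add spurious edges, so errors lean toward over-reporting reachability.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter; size and num_hashes are illustrative, not tuned."""
    def __init__(self, size=4096, num_hashes=3):
        self.size, self.num_hashes = size, num_hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

def reachable(bloom, nodes, src, dst):
    # Index-free search: adjacency is probed through the (possibly
    # false-positive) Bloom filter instead of a stored adjacency list.
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        for v in nodes:
            if v not in seen and (u, v) in bloom:
                seen.add(v)
                stack.append(v)
    return False

# store directed edges in the filter instead of an adjacency list
bf = BloomFilter()
for edge in [(1, 2), (2, 3), (4, 5)]:
    bf.add(edge)
nodes = {1, 2, 3, 4, 5}
```

With a filter far smaller than an explicit edge list, an edge update is just a few bit sets and no index has to be rebuilt, which is the trade-off the abstract describes.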
Citations: 1
Food Image to Cooking Instructions Conversion Through Compressed Embeddings Using Deep Learning
Pub Date : 2019-04-08 DOI: 10.1109/ICDEW.2019.00-31
Madhu Kumari, Tajinder Singh
Image understanding in the era of deep learning is burgeoning, not only in terms of semantics but also toward the generation of meaningful descriptions of images. This requires specific cross-modal training of deep neural networks, which must be complex enough to encode the fine contextual information related to the image yet simple enough to cover a wide range of inputs. Conversion of a food image to its cooking description/instructions is a suitable instance of this image understanding challenge. This paper proposes a unique method of obtaining the compressed embeddings of the cooking instructions of a recipe image using cross-modal training of a CNN, an LSTM, and a Bi-Directional LSTM. The major challenges are the variable length of instructions, the number of instructions per recipe, and the multiple food items present in a food image. Our model successfully meets these challenges through transfer learning and multi-level error propagation across different neural networks, achieving condensed embeddings of cooking instructions that have high similarity with the original instructions. In this paper we specifically experiment on Indian cuisine data (food image, ingredients, cooking instructions, and contextual information) scraped from the web. The proposed model can be significantly useful for information retrieval systems and can also be effectively utilized in automatic recipe recommendation.
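As a stdlib-only illustration of what a "compressed embedding" buys — a much smaller vector that still behaves linearly like the original — a seeded random projection can stand in for the learned encoders (the function names, dimensions, and the projection itself are assumptions for illustration; the paper learns its compression via cross-modal CNN/LSTM training, not a fixed projection):

```python
import math
import random

def random_projection(vec, dim, seed=0):
    # Fixed (seeded) Gaussian matrix: a Johnson-Lindenstrauss-style sketch
    # that maps a high-dimensional embedding to `dim` dimensions.
    rng = random.Random(seed)
    proj = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in vec]
            for _ in range(dim)]
    return [sum(w * x for w, x in zip(row, vec)) for row in proj]

def cosine(a, b):
    # similarity between two embeddings, used to compare condensed
    # instruction embeddings with the originals
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# a 64-d "instruction embedding" squeezed to 8 dimensions
embedding = [math.sin(i) for i in range(64)]
compressed = random_projection(embedding, 8)
```

Because the projection matrix is fixed by the seed, the map is linear, so relative geometry between instruction embeddings is approximately preserved after compression.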
Citations: 6
Triangle Counting on GPU Using Fine-Grained Task Distribution
Pub Date : 2019-04-08 DOI: 10.1109/ICDEW.2019.000-8
Lin Hu, Naiqing Guan, Lei Zou
Due to the irregularity of graph data, designing an efficient GPU-based graph algorithm is always a challenging task. Inefficient memory access and work imbalance often limit GPU-based graph computing, even though GPUs provide a massively parallel computing model. To address this, in this paper we propose a fine-grained task distribution strategy for the triangle counting task. Extensive experiments and theoretical analysis confirm the superiority of our algorithm on both large real and synthetic graph datasets.
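For intuition, fine-grained distribution can be mimicked on the CPU: make each edge (rather than each vertex) an independent task, so one skewed high-degree vertex cannot stall a whole worker. A thread pool stands in for GPU thread blocks here; everything in this sketch is an illustrative assumption, not the paper's GPU kernel.

```python
from concurrent.futures import ThreadPoolExecutor

def count_triangles(adj):
    """Count triangles in an undirected graph given as node -> neighbor set.
    Each edge (u, v) with u < v is an independent task, so work is split
    at edge granularity rather than per vertex."""
    edges = [(u, v) for u in adj for v in adj[u] if u < v]

    def edge_task(edge):
        u, v = edge
        # common neighbours w with w > v ensure each triangle counts once
        return sum(1 for w in adj[u] & adj[v] if w > v)

    with ThreadPoolExecutor(max_workers=4) as pool:
        return sum(pool.map(edge_task, edges))

# complete graph K4 has exactly 4 triangles
k4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
```

Per-vertex ("coarse") distribution would hand one hub vertex's entire neighborhood to a single worker; splitting at edge granularity is the load-balancing idea the paper develops for GPUs.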
Citations: 6
Data Mining Approach to Chinese Food Analysis for Diet-Related Cardiometabolic Diseases
Pub Date : 2019-04-08 DOI: 10.1109/ICDEW.2019.00-29
Angela Chang, Jieyi Hu, Yichao Liu, M. Liu
Data mining is the discovery of valuable and novel structures in datasets. Considering the growing number of people suffering from diet-related cardiometabolic diseases, the media's efforts in food and health communication are called into question. One of the main objectives of this study is to use emerging data mining methodology to understand the structure of what and how the news media discuss food, diet, and the related cardiometabolic diseases. A total of 6,625 items of coverage on food, flavor, and condiments along with cardiometabolic diseases is identified. Data mining algorithms are applied to food data for predicting health outcomes and providing policy information. The most typical usage of such a food data corpus is the automatic mapping from text to health afflictions shaped by larger cultural forces.
Citations: 3
Collaborative Generative Adversarial Network for Recommendation Systems
Pub Date : 2019-04-08 DOI: 10.1109/ICDEW.2019.00-16
Yuzhen Tong, Yadan Luo, Zheng Zhang, S. Sadiq, Peng Cui
Recommendation systems have become a core part of daily Internet life. Conventional recommendation models can hardly defend against adversaries due to natural noise such as misclicking. Recent research on GAN-based recommendation systems can improve the robustness of the learning models, yielding state-of-the-art performance. The basic idea is to adopt an interplay minimax game between two recommendation systems, picking negative samples as fake items and employing a reinforcement learning policy. However, such a strategy may lead to mode collapse and result in high vulnerability to adversarial perturbations of its model parameters. In this paper, we propose a new collaborative framework, namely the Collaborative Generative Adversarial Network (CGAN), which adopts a Variational Auto-encoder (VAE) as the generator and performs adversarial training in the continuous embedding space. The formulation of CGAN has two advantages: 1) its auto-encoder takes the role of generator to mimic the true distribution of user preferences over items by capturing subtle latent factors underlying user-item interactions; 2) the adversarial training in continuous space enhances the model's robustness and performance. Extensive experiments conducted on two real-world benchmark recommendation datasets demonstrate the superior performance of our CGAN in comparison with state-of-the-art GAN-based methods.
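The generator side hinges on two standard VAE ingredients: the reparameterization trick (so sampling in the continuous embedding space stays differentiable) and the KL regularizer toward a standard normal. A minimal stdlib sketch of those two pieces, with illustrative helper names and no claim to CGAN's actual architecture:

```python
import math
import random

def reparameterize(mu, log_var, rng):
    # z = mu + sigma * eps with eps ~ N(0, 1): sampling stays differentiable
    # w.r.t. mu and log_var, which is what lets a VAE generator train end
    # to end inside the adversarial game
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    # KL(N(mu, sigma^2) || N(0, 1)) summed over dimensions; the usual
    # VAE regularizer keeping the latent embedding space well behaved
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

# draw one latent sample for a 2-d toy posterior
z = reparameterize([0.0, 0.0], [0.0, 0.0], random.Random(7))
```

Training in this continuous latent space, rather than over discrete sampled items, is what the abstract credits for the improved robustness.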
Citations: 15
Dynamic Data Quality for Static Blockchains
Pub Date : 2019-04-08 DOI: 10.1109/ICDEW.2019.00-41
Alan G. Labouseur, C. Matheus
Blockchain's popularity has changed the way people think about data access, storage, and retrieval. Because of this, many classic data management challenges are imbued with renewed significance. One such challenge is the issue of Dynamic Data Quality. As time passes, data changes in content and structure and thus becomes dynamic. Data quality, therefore, also becomes dynamic, because it is an aggregate characteristic of the changing content and changing structure of the data itself. But blockchain is a static structure. The friction between static blockchains and Dynamic Data Quality gives rise to new research opportunities, which the authors address in this paper.
Citations: 2
Distilling Knowledge from User Information for Document Level Sentiment Classification
Pub Date : 2019-04-08 DOI: 10.1109/ICDEW.2019.00-15
Jialing Song
Combining global user and product characteristics with local review information provides a powerful mechanism for predicting users' sentiment in a review document about a product on online review sites such as Amazon, Yelp, and IMDB. However, user information is not always available in real scenarios; for example, some users are newly registered, and some sites allow users to comment without logging in. To address this issue, we introduce a novel knowledge distillation (KD) learning paradigm to transfer the user characteristics into the weights of student neural networks that utilize only product and review information. The teacher model transfers its predictive distributions over the training data to the student model. Thus, user profiles are required only during the training stage. Experimental results on several sentiment classification datasets show that the proposed learning framework enables student models to achieve improved performance.
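The distillation step — the student matching the teacher's softened predictive distribution, so user features are needed only while the teacher trains — is conventionally written as a KL divergence between temperature-scaled softmaxes. A minimal sketch (the temperature value and function names are illustrative assumptions, not the paper's exact objective):

```python
import math

def softmax(logits, temperature=1.0):
    # temperature > 1 softens the distribution, exposing the teacher's
    # "dark knowledge" about relative class similarities
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on the softened distributions: zero when the
    # student reproduces the teacher's predictive distribution exactly
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

At inference time only the student runs, which is exactly why the user profile can be dropped after training.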
Citations: 4
Efficient Parallel Computing of Graph Edit Distance
Pub Date : 2019-04-08 DOI: 10.1109/ICDEW.2019.000-7
Ran Wang, Yixiang Fang, Xing Feng
With the prevalence of graph data, graph edit distance (GED), a well-known measure of similarity between two graphs, has been widely used in many real applications, such as graph classification and clustering, similar object detection, and biology network analysis. Despite its usefulness and popularity, GED is computationally costly because it is NP-hard. Currently, most existing solutions focus on computing GED in a serial manner, and little attention has been paid to parallel computing. In this paper, we propose a novel, efficient parallel algorithm for computing GED. Our algorithm, called PGED, is based on the state-of-the-art GED algorithm AStar+-LSa. The main idea of PGED is to allocate the heavy workload of searching the optimal vertex mapping between two graphs, which is the most time-consuming step, to multiple threads based on an effective allocation strategy, resulting in high efficiency of GED computation. We have evaluated PGED on two real datasets, and the experimental results show that by using multiple threads, PGED is more efficient than AStar+-LSa. In addition, by carefully tuning the parameters, the performance of PGED can be further improved.
Citations: 4
Semantic Similarity Computation in Knowledge Graphs: Comparisons and Improvements
Pub Date : 2019-04-08 DOI: 10.1109/ICDEW.2019.000-5
Chaoqun Yang, Yuanyuan Zhu, Ming Zhong, Rongrong Li
Computing semantic similarity between concepts is a fundamental task in natural language processing and has a large variety of applications. In this paper, we first review and analyze existing semantic similarity computation methods for knowledge graphs. Through the analysis of these methods, we find that existing works mainly focus on the context features of concepts, which indicate the position or frequency of the concepts in the knowledge graphs, such as the depth of terms, the information content of terms, or the distance between terms, while the synsets of concepts, a fundamental part of describing a concept's meaning, have long been neglected. Thus, in this paper, we propose a new method to compute the similarity of concepts based on their extended synsets. Moreover, we propose a general hybrid framework, which can combine our new similarity measure based on extended synsets with any existing context-feature-based semantic similarity to evaluate the concepts more accurately. We conducted experiments on five well-known datasets for semantic similarity evaluation, and the experimental results show that our general framework can improve most existing methods significantly.
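The proposal above has two pieces: a similarity computed from the concepts' (extended) synsets, and a hybrid that mixes it with any context-feature score. A minimal sketch — Jaccard overlap and the mixing weight `alpha` are illustrative stand-ins, not the paper's actual measures:

```python
def synset_jaccard(synsets_a, synsets_b):
    # similarity from the (extended) synonym sets of two concepts:
    # shared synonyms relative to all synonyms of either concept
    inter = len(synsets_a & synsets_b)
    union = len(synsets_a | synsets_b)
    return inter / union if union else 0.0

def hybrid_similarity(synsets_a, synsets_b, context_sim, alpha=0.5):
    # convex combination of the synset-based score with any existing
    # context-feature-based score (depth, information content, distance...)
    return alpha * synset_jaccard(synsets_a, synsets_b) + (1 - alpha) * context_sim
```

Because the second argument is just a number, any of the reviewed context-feature measures can be plugged in unchanged, which is the point of the hybrid framework.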
Citations: 6
Sparse Manifold Embedded Hashing for Multimedia Retrieval
Pub Date : 2019-04-08 DOI: 10.1109/ICDEW.2019.00011
Yongxin Wang, Xin Luo, Huaxiang Zhang, Xin-Shun Xu
Hashing has become more and more attractive in the large-scale multimedia retrieval community due to its fast search speed and low storage cost. Most hashing methods focus on finding the inherent data structure and neglect the sparse reconstruction relationship. Besides, most of them adopt a two-step solution for structure embedding and hash code learning, which may yield suboptimal results. To address these issues, in this paper we present a novel sparsity-based hashing method, namely Sparse Manifold embedded hASHing, SMASH for short. It employs the sparse representation technique to extract the implicit structure in the data, and preserves this structure by minimizing the reconstruction error and the quantization loss, with constraints to satisfy the independence and balance of the hash codes. An alternating algorithm is devised to solve the optimization problem in SMASH. Based on it, SMASH learns the hash codes and the hash functions simultaneously. Extensive experiments on several benchmark datasets demonstrate that SMASH outperforms some state-of-the-art hashing methods for the multimedia retrieval task.
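To make the "binary codes + Hamming search" retrieval step concrete, here is a sign-random-projection sketch — a generic LSH baseline, not SMASH's learned codes (SMASH optimizes codes and hash functions jointly; the seed, bit width, and helper names here are assumptions):

```python
import random

def hash_code(vec, n_bits, seed=42):
    # sign of random projections -> binary code; each hyperplane
    # contributes one bit, so the code is n_bits long
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in vec] for _ in range(n_bits)]
    return tuple(1 if sum(w * x for w, x in zip(p, vec)) >= 0 else 0
                 for p in planes)

def hamming(code_a, code_b):
    # retrieval ranks items by this distance, which is where the fast
    # search speed and low storage cost come from
    return sum(a != b for a, b in zip(code_a, code_b))
```

Learned methods like SMASH aim to place these bits so that semantic neighbors collide, whereas the random hyperplanes above only preserve angular proximity.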
{"title":"Sparse Manifold Embedded Hashing for Multimedia Retrieval","authors":"Yongxin Wang, Xin Luo, Huaxiang Zhang, Xin-Shun Xu","doi":"10.1109/ICDEW.2019.00011","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.00011","url":null,"abstract":"Hashing has become more and more attractive in the large-scale multimedia retrieval community, due to its fast search speed and low storage cost. Most hashing methods focus on finding the inherent data structure, and neglect the sparse reconstruction relationship. Besides, most of them adopt a two-step solution for the structure embedding and the hash codes learning, which may yield suboptimal results. To address these issues, in this paper, we present a novel sparsity-based hashing method, namely, Sparse Manifold embedded hASHing, SMASH for short. It employs the sparse representation technique to extract the implicit structure in the data, and preserves the structure by minimizing the reconstruction error and the quantization loss with constraints to satisfy the independence and balance of the hash codes. An alternative algorithm is devised to solve the optimization problem in SMASH. Based on it, SMASH learns the hash codes and the hash functions simultaneously. 
Extensive experiments on several benchmark datasets demonstrate that SMASH outperforms some state-of-the-art hashing methods for the multimedia retrieval task.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"159 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132420358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
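The abstract above describes learning binary hash codes for features and ranking database items by code distance at query time. As a hedged illustration of that retrieval pipeline — not the SMASH method itself, which learns its projections jointly via sparse reconstruction and quantization-loss minimization — a minimal sign-quantization baseline with random projections (an LSH-style stand-in; all function names here are illustrative) might look like:

```python
import random

def train_projections(dim, n_bits, seed=0):
    """Draw a random Gaussian projection matrix W of shape (n_bits, dim).

    Stand-in for a learned projection: SMASH would optimize W instead.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_code(vec, projections):
    """Binary code b = sign(W @ x), encoded as a tuple of 0/1 bits."""
    return tuple(
        1 if sum(w * x for w, x in zip(row, vec)) >= 0 else 0
        for row in projections
    )

def hamming(a, b):
    """Hamming distance between two equal-length binary codes."""
    return sum(x != y for x, y in zip(a, b))

# Toy "multimedia" database: 4-dimensional feature vectors.
database = [
    [1.0, 0.2, 0.1, 0.0],
    [0.0, 0.9, 0.8, 0.1],
    [0.1, 0.0, 0.2, 1.0],
]
W = train_projections(dim=4, n_bits=16)
codes = [hash_code(v, W) for v in database]

# Retrieval: rank database items by Hamming distance to the query code.
query = database[0]                 # query identical to item 0
q_code = hash_code(query, W)
ranked = sorted(range(len(codes)), key=lambda i: hamming(q_code, codes[i]))
# Item 0 retrieves itself at Hamming distance 0 (sorted is stable, so it
# wins any ties at distance 0).
print(ranked[0], hamming(q_code, codes[ranked[0]]))
```

This captures why hashing is fast and compact: each item is reduced to a few bits, and ranking is bitwise comparison rather than floating-point distance computation.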
Citations: 1
2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)