EAGRE: Towards scalable I/O efficient SPARQL query evaluation on the cloud
Xiaofei Zhang, Lei Chen, Yongxin Tong, Min Wang
2013 IEEE 29th International Conference on Data Engineering (ICDE), published 2013-04-08
DOI: 10.1109/ICDE.2013.6544856
Citations: 90
Abstract
Managing and evaluating huge volumes of RDF data scalably, so as to benefit from the virtually unlimited resources of Cloud platforms, has recently attracted intensive research effort. Progress has been made on evaluating SPARQL queries either with high-level declarative programming languages, such as Pig [1], or with a sequence of carefully designed MapReduce jobs. Both approaches tend to answer a query with multiple join operations. However, because Cloud storage is simple and existing solutions organize RDF data only coarsely, these multiple joins easily incur significant I/O and network traffic, which can severely degrade system performance. In this work, we first propose EAGRE, an Entity-Aware Graph compREssion technique that forms a new representation of RDF data on Cloud platforms. Based on this representation, we propose an I/O-efficient strategy that evaluates SPARQL queries as quickly as possible, especially queries with solution sequence modifiers such as PROJECTION and ORDER BY. We implement a prototype system and conduct extensive experiments over both real and synthetic datasets on an in-house cluster. The experimental results show that our solution can reduce SPARQL query evaluation time by more than an order of magnitude compared to state-of-the-art MapReduce-based solutions.
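The abstract contrasts two approaches. The first answers a query with many joins. The second reads data that is already organized around entities. The toy sketch below illustrates that contrast. The triples, the function names (`group_entities`, `entity_classes`, `star_query`, `star_query_by_joins`), and the choice to key entity classes on identical predicate sets are all hypothetical. This is not the paper's compression scheme or its MapReduce implementation. It only shows why a star-shaped query (one subject, several predicates) can be answered by a single pass over entity records instead of one match-and-join round per triple pattern.

```python
from collections import defaultdict

# Hypothetical toy RDF data as (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "rdf:type", "Student"),
    ("alice", "name", "Alice"),
    ("alice", "advisor", "bob"),
    ("carol", "rdf:type", "Student"),
    ("carol", "name", "Carol"),
    ("carol", "advisor", "bob"),
    ("bob", "rdf:type", "Professor"),
    ("bob", "name", "Bob"),
]


def group_entities(triples):
    """Collect each subject's outgoing edges into one entity record."""
    entities = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        entities[s][p].append(o)
    return entities


def entity_classes(entities):
    """Group entities sharing the same predicate set (an assumed class key)."""
    classes = defaultdict(list)
    for subject, props in entities.items():
        classes[frozenset(props)].append(subject)
    return classes


def star_query(entities, required):
    """Answer a star pattern with one scan over entity records (no joins)."""
    return {
        subject: {p: list(props[p]) for p in required}
        for subject, props in entities.items()
        if all(p in props for p in required)
    }


def star_query_by_joins(triples, required):
    """Baseline: match each predicate separately, then join on the subject."""
    bindings = None
    for p in required:
        # One full pass over the triples per pattern, like one job per join.
        matches = defaultdict(list)
        for s, pred, o in triples:
            if pred == p:
                matches[s].append(o)
        if bindings is None:
            bindings = {s: {p: objs} for s, objs in matches.items()}
        else:
            # Join step: keep only subjects that also match this predicate.
            bindings = {
                s: {**b, p: matches[s]} for s, b in bindings.items() if s in matches
            }
    return bindings or {}


if __name__ == "__main__":
    ents = group_entities(TRIPLES)
    print(star_query(ents, ["name", "advisor"]))
    print(dict(entity_classes(ents)))
```

Both functions return the same bindings. The baseline rescans the data once per triple pattern and joins the intermediate results. The entity-grouped version touches each entity record once. In a distributed setting, those repeated scans and shuffles are the I/O and network cost that the abstract attributes to join-heavy evaluation.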