GoDB: From Batch Processing to Distributed Querying over Property Graphs

N. Jamadagni, Yogesh L. Simmhan
{"title":"GoDB: From Batch Processing to Distributed Querying over Property Graphs","authors":"N. Jamadagni, Yogesh L. Simmhan","doi":"10.1109/CCGrid.2016.105","DOIUrl":null,"url":null,"abstract":"Property Graphs with rich attributes over vertices and edges are becoming common. Querying and mining such linked Big Data is important for knowledge discovery and mining. Distributed graph platforms like Pregel focus on batch execution on commodity clusters. But exploratory analytics requires platforms that are both responsive and scalable. We propose Graph-oriented Database (GoDB), a distributed graph database that supports declarative queries over large property graphs. GoDB builds upon our GoFFish subgraph-centric batch processing platform, leveraging its scalability while using execution heuristics to offer responsiveness. The GoDB declarative query model supports vertex, edge, path and reachability queries, and this is translated to a distributed execution plan on GoFFish. We also propose a novel cost model to choose a query plan that minimizes the execution latency. We evaluate GoDB deployed on the Azure IaaS Cloud, over real-world property graphs and for a diverse workload of 500 queries. These show that the cost model selects the optimal execution plan at least 80% of the time, and helps GoDB weakly scale with the graph size. A comparative study with Titan, a leading open-source graph database, shows that we complete all queries, each in ≤ 1.6 secs, while Titan cannot complete up to 42% of some query workloads.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGrid.2016.105","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Property Graphs with rich attributes over vertices and edges are becoming common. Querying and mining such linked Big Data is important for knowledge discovery and mining. Distributed graph platforms like Pregel focus on batch execution on commodity clusters. But exploratory analytics requires platforms that are both responsive and scalable. We propose Graph-oriented Database (GoDB), a distributed graph database that supports declarative queries over large property graphs. GoDB builds upon our GoFFish subgraph-centric batch processing platform, leveraging its scalability while using execution heuristics to offer responsiveness. The GoDB declarative query model supports vertex, edge, path and reachability queries, and this is translated to a distributed execution plan on GoFFish. We also propose a novel cost model to choose a query plan that minimizes the execution latency. We evaluate GoDB deployed on the Azure IaaS Cloud, over real-world property graphs and for a diverse workload of 500 queries. These show that the cost model selects the optimal execution plan at least 80% of the time, and helps GoDB weakly scale with the graph size. A comparative study with Titan, a leading open-source graph database, shows that we complete all queries, each in ≤ 1.6 secs, while Titan cannot complete up to 42% of some query workloads.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
从批处理到属性图的分布式查询
在顶点和边上具有丰富属性的属性图正变得越来越普遍。这种关联大数据的查询和挖掘对于知识发现和挖掘具有重要意义。像Pregel这样的分布式图形平台专注于批量执行商品集群。但是探索性分析需要响应性和可扩展性都好的平台。我们提出了面向图的数据库(GoDB),这是一种分布式图数据库,支持对大型属性图的声明性查询。GoDB构建在以GoFFish子图为中心的批处理平台之上,利用其可伸缩性,同时使用执行启发式提供响应性。GoDB声明式查询模型支持顶点、边、路径和可达性查询,这在GoFFish上被转换为分布式执行计划。我们还提出了一种新的成本模型来选择执行延迟最小的查询计划。我们评估了部署在Azure IaaS云上的GoDB,在真实世界的属性图和500个查询的不同工作负载上。这些结果表明,成本模型至少在80%的时间内选择了最优的执行计划,并帮助GoDB随着图的大小进行弱扩展。与Titan(一个领先的开源图形数据库)的比较研究表明,我们完成了所有查询,每个查询在≤1.6秒内完成,而Titan无法完成高达42%的某些查询工作负载。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Increasing the Performance of Data Centers by Combining Remote GPU Virtualization with Slurm DiBA: Distributed Power Budget Allocation for Large-Scale Computing Clusters Spatial Support Vector Regression to Detect Silent Errors in the Exascale Era DTStorage: Dynamic Tape-Based Storage for Cost-Effective and Highly-Available Streaming Service Facilitating the Execution of HPC Workloads in Colombia through the Integration of a Private IaaS and a Scientific PaaS/SaaS Marketplace
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1