Experiences with Virtuoso Cluster RDF Column Store

P. Boncz, O. Erling, M. Pham
Linked Data Management. DOI: 10.1201/b16859-13 (https://doi.org/10.1201/b16859-13)

Abstract

Virtuoso Column Store [185] introduces vectorized execution into the Virtuoso DBMS. Additionally, its scale-out version, which allows running the system on a cluster, has been significantly redesigned. This article discusses advances in scale-out support in Virtuoso and analyzes them on the Berlin SPARQL Benchmark (BSBM) [101]. To demonstrate the features of Virtuoso Cluster RDF Column Store, we first present micro-benchmarks on a small 2-node cluster with 10 billion triples. In the full evaluation we show that one can now scale out to a BSBM database of 150 billion triples. The latter experiment is a 750-fold increase over the previous largest BSBM report, and for the first time includes both its Explore and Business Intelligence workloads.

The storage scheme used by Virtuoso for storing RDF Subject-Property-Object triples pertaining to a Graph (hence we have quads, not triples) consists of five indexes: PSOG, POSG, SP, OP, GS. To be precise, PSOG is a B-tree with key (P,S,O,G), where P is a number identifying a property, S a subject, O an object and G the graph. Additionally, there is a B-tree holding URIs and a B-tree holding string literals, both of them used to encode strings and URIs into numerical identifiers. Users may alter the indexing scheme of Virtuoso, but this almost never happens. The last three indexes (SP, OP, GS) are projections of the first two covering indexes, containing only the unique combinations; hence they are much smaller.

We note that Virtuoso Column Store Edition (V7) departs from the previous Virtuoso editions (V6) in that
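
The five-index layout described in the abstract can be illustrated with a small in-memory sketch. This is not Virtuoso's implementation: Python sets and sorted lists stand in for B-trees, a single dictionary stands in for the separate URI and literal B-trees, and the class name `QuadStore`, the helper `prefix_scan`, and the `ex:` terms are all hypothetical. The key order of the three projection indexes is assumed to follow their names, for example (S,P) for SP.

```python
from bisect import bisect_left


class QuadStore:
    """Toy model of the PSOG, POSG, SP, OP, GS scheme (assumption: not Virtuoso code)."""

    def __init__(self):
        # term string -> numeric id; stands in for the URI and literal B-trees
        self.ids = {}
        # the two covering indexes hold full quads in different key orders
        self.psog = set()
        self.posg = set()
        # the three projections hold only the unique two-column combinations
        self.sp = set()
        self.op = set()
        self.gs = set()

    def encode(self, term):
        """Map a string or URI to a numeric identifier, assigning a new one if needed."""
        return self.ids.setdefault(term, len(self.ids) + 1)

    def add(self, s, p, o, g):
        """Encode one quad and insert it into all five indexes."""
        s, p, o, g = (self.encode(t) for t in (s, p, o, g))
        self.psog.add((p, s, o, g))
        self.posg.add((p, o, s, g))
        self.sp.add((s, p))
        self.op.add((o, p))
        self.gs.add((g, s))


def prefix_scan(sorted_index, prefix):
    """Return every key in a sorted list of tuples that starts with `prefix`."""
    start = bisect_left(sorted_index, prefix)
    matches = []
    for key in sorted_index[start:]:
        if key[:len(prefix)] != prefix:
            break
        matches.append(key)
    return matches


store = QuadStore()
store.add("ex:alice", "ex:knows", "ex:bob", "ex:g1")
store.add("ex:alice", "ex:knows", "ex:carol", "ex:g1")
store.add("ex:bob", "ex:knows", "ex:carol", "ex:g1")

# Look up everything alice knows by scanning PSOG on the (P, S) prefix.
p_id, s_id = store.ids["ex:knows"], store.ids["ex:alice"]
alice_knows = prefix_scan(sorted(store.psog), (p_id, s_id))
```

With these three quads, the covering indexes each hold three entries while each projection holds two. That is the effect the abstract describes: the projections keep only the unique combinations and are therefore smaller.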