CCIndex for Cassandra: A Novel Scheme for Multi-dimensional Range Queries in Cassandra

Chen Feng, Yongqiang Zou, Zhiwei Xu
{"title":"CCIndex for Cassandra: A Novel Scheme for Multi-dimensional Range Queries in Cassandra","authors":"Chen Feng, Yongqiang Zou, Zhiwei Xu","doi":"10.1109/SKG.2011.28","DOIUrl":null,"url":null,"abstract":"Multi-dimensional range queries are fundamental requirements in large scale Internet applications using Distributed Ordered Tables. Apache Cassandra is a Distributed Ordered Table when it employs order-preserving hashing as data partitioner. Cassandra supports multi-dimensional range queries with poor performance and with a limitation that there must be one dimension with an equal operator. Based on the success of CCIndex scheme in Apache HBase, this paper tries to answer the question: Can CCIndex benefit multi-dimensional range queries in DOTs like Cassandra? This paper studies the feasibility of employing CCIndex in Cassandra, proposes a new approach to estimate result size, implements CCIndex in Cassandra including recovery mechanisms and studies the pros and cons of CCIndex for different DOTs. Experimental results show that CCIndex gains 2.4 to 3.7 times efficiency over Cassandra's index scheme with 1% to 50% selectivity for 2 million records. This paper shows that CCIndex is a general approach for DOTs, and could gain better performance for DOTs which perform scan tasks much faster than random read. This paper reveals that Cassandra is optimized for hash tables rather than ordered tables in performing read and range queries.","PeriodicalId":184788,"journal":{"name":"2011 Seventh International Conference on Semantics, Knowledge and Grids","volume":"197 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 Seventh International Conference on Semantics, Knowledge and Grids","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SKG.2011.28","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 20

Abstract

Multi-dimensional range queries are fundamental requirements in large scale Internet applications using Distributed Ordered Tables. Apache Cassandra is a Distributed Ordered Table when it employs order-preserving hashing as data partitioner. Cassandra supports multi-dimensional range queries with poor performance and with a limitation that there must be one dimension with an equal operator. Based on the success of CCIndex scheme in Apache HBase, this paper tries to answer the question: Can CCIndex benefit multi-dimensional range queries in DOTs like Cassandra? This paper studies the feasibility of employing CCIndex in Cassandra, proposes a new approach to estimate result size, implements CCIndex in Cassandra including recovery mechanisms and studies the pros and cons of CCIndex for different DOTs. Experimental results show that CCIndex gains 2.4 to 3.7 times efficiency over Cassandra's index scheme with 1% to 50% selectivity for 2 million records. This paper shows that CCIndex is a general approach for DOTs, and could gain better performance for DOTs which perform scan tasks much faster than random read. This paper reveals that Cassandra is optimized for hash tables rather than ordered tables in performing read and range queries.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
CCIndex for Cassandra:一种新的Cassandra多维范围查询方案
多维范围查询是使用分布式有序表的大型Internet应用程序的基本需求。当Apache Cassandra使用保序哈希作为数据分区时,它是一个分布式有序表。Cassandra支持多维范围查询,但性能较差,并且有一个限制,即必须有一个具有相等操作符的维度。基于CCIndex方案在Apache HBase上的成功,本文试图回答这样一个问题:CCIndex是否能使像Cassandra这样的DOTs中的多维范围查询受益?本文研究了在Cassandra中使用CCIndex的可行性,提出了一种估算结果大小的新方法,在Cassandra中实现了包括恢复机制在内的CCIndex,并研究了CCIndex在不同DOTs中的优缺点。实验结果表明,CCIndex对200万条记录的选择性为1% ~ 50%,效率是Cassandra索引方案的2.4 ~ 3.7倍。本文表明,CCIndex是一种通用的DOTs方法,对于执行扫描任务的DOTs可以获得比随机读取更快的性能。本文揭示了Cassandra在执行读取和范围查询时针对哈希表而不是有序表进行了优化。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
The Textual Semantic Lens Domain Ontology Usage Analysis Framework Cyclic Workflow Execution Mechanism on Top of MapReduce Framework Towards an IDM Approach of Transforming Web Services into ACME Providing Quality of Service ATL Transformation for the Generation of SCA Model
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1