准备扫描:从HBase中高效地检索结构化数据

Francisco Neves, R. Vilaça, J. Pereira, R. Oliveira
{"title":"准备扫描:从HBase中高效地检索结构化数据","authors":"Francisco Neves, R. Vilaça, J. Pereira, R. Oliveira","doi":"10.1145/3019612.3019863","DOIUrl":null,"url":null,"abstract":"The ability of NoSQL systems to scale better than traditional relational databases motivates a large set of applications to migrate their data to NoSQL systems, even without aiming to exploit the provided schema flexibility. However, accessing structured data is costly due to such flexibility, incurring in a lot of bandwidth and processing unit usage. In this paper, we analyse this cost in Apache HBase and propose a new scan operation, named Prepared Scan, that optimizes the access to data structured in a regular manner by taking advantage of a well-known schema by application. Using an industry standard benchmark, we show that Prepared Scan improves throughput up to 29+ and decreases network bandwidth consumption up to 20+.","PeriodicalId":20728,"journal":{"name":"Proceedings of the Symposium on Applied Computing","volume":"13 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2017-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Prepared scan: efficient retrieval of structured data from HBase\",\"authors\":\"Francisco Neves, R. Vilaça, J. Pereira, R. Oliveira\",\"doi\":\"10.1145/3019612.3019863\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The ability of NoSQL systems to scale better than traditional relational databases motivates a large set of applications to migrate their data to NoSQL systems, even without aiming to exploit the provided schema flexibility. However, accessing structured data is costly due to such flexibility, incurring in a lot of bandwidth and processing unit usage. In this paper, we analyse this cost in Apache HBase and propose a new scan operation, named Prepared Scan, that optimizes the access to data structured in a regular manner by taking advantage of a well-known schema by application. Using an industry standard benchmark, we show that Prepared Scan improves throughput up to 29+ and decreases network bandwidth consumption up to 20+.\",\"PeriodicalId\":20728,\"journal\":{\"name\":\"Proceedings of the Symposium on Applied Computing\",\"volume\":\"13 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-04-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Symposium on Applied Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3019612.3019863\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Symposium on Applied Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3019612.3019863","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

NoSQL系统比传统关系数据库具有更好的可伸缩性,这促使大量应用程序将其数据迁移到NoSQL系统,即使它们并不打算利用所提供的模式灵活性。然而,由于这种灵活性,访问结构化数据的成本很高,导致大量带宽和处理单元的使用。在本文中,我们分析了Apache HBase中的这个开销,并提出了一种新的扫描操作,称为Prepared scan,它利用一个众所周知的应用程序模式来优化对以规则方式结构化的数据的访问。使用行业标准基准测试,我们证明了Prepared Scan将吞吐量提高到29+,并将网络带宽消耗降低到20+。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Prepared scan: efficient retrieval of structured data from HBase
The ability of NoSQL systems to scale better than traditional relational databases motivates a large set of applications to migrate their data to NoSQL systems, even without aiming to exploit the provided schema flexibility. However, accessing structured data is costly due to such flexibility, incurring in a lot of bandwidth and processing unit usage. In this paper, we analyse this cost in Apache HBase and propose a new scan operation, named Prepared Scan, that optimizes the access to data structured in a regular manner by taking advantage of a well-known schema by application. Using an industry standard benchmark, we show that Prepared Scan improves throughput up to 29+ and decreases network bandwidth consumption up to 20+.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Tarski Handling bitcoin conflicts through a glimpse of structure Multi-CNN and decision tree based driving behavior evaluation Session details: WT - web technologies track Improving OR-PCA via smoothed spatially-consistent low-rank modeling for background subtraction
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1