Concurrent online sampling for all, for free

Proceedings of the 16th International Workshop on Data Management on New Hardware
Pub Date: 2020-06-14
DOI: 10.1145/3399666.3399924

Altan Birler, Bernhard Radke, Thomas Neumann

Citations: 8

Abstract

Database systems rely upon statistical synopses for cardinality estimation. A very versatile and powerful method for estimation purposes is to maintain a random sample of the data. However, drawing a random sample of an existing data set is quite expensive due to the resulting random access pattern, and the sample will get stale over time. It is much more attractive to use online sampling, such that a fresh sample is available at all times, without additional data accesses. While clearly superior from a theoretical perspective, it was not clear how to efficiently integrate online sampling into a database system with high concurrent update and query load. We introduce a novel highly scalable online sampling strategy that allows for sample maintenance with minimal overhead. We can trade off strict freshness guarantees for a significant boost in performance in many-core shared memory scenarios, which is ideal for estimation purposes. We show that by replacing the traditional periodical sample reconstruction in a database system with our online sampling strategy, we get virtually zero overhead in insert performance and completely eliminate the slow random I/O needed for sample construction.
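To make the idea of online sampling concrete, here is a minimal single-threaded sketch based on classic reservoir sampling (Algorithm R). It is not the concurrent strategy the paper introduces, which the abstract does not describe in detail. The class name `ReservoirSample` and its methods `insert` and `estimate_count` are hypothetical names chosen for illustration. The sketch shows how a fixed-size uniform sample can be kept fresh on every insert without re-reading the stored data, and how that sample yields a cardinality estimate.

```python
import random
from typing import Any, Callable, List, Optional


class ReservoirSample:
    """Fixed-size uniform sample maintained online (classic Algorithm R).

    Illustrative only: single-threaded, insert-only, and not the paper's method.
    """

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.seen = 0  # number of rows inserted so far
        self.items: List[Any] = []
        self._rng = random.Random(seed)

    def insert(self, row: Any) -> None:
        """Offer one newly inserted row to the sample."""
        self.seen += 1
        if len(self.items) < self.capacity:
            # The reservoir is not full yet, so every row is kept.
            self.items.append(row)
            return
        # Draw a position in [0, seen); the row enters the sample only if the
        # position falls inside the reservoir (probability capacity / seen).
        slot = self._rng.randrange(self.seen)
        if slot < self.capacity:
            self.items[slot] = row

    def estimate_count(self, predicate: Callable[[Any], bool]) -> float:
        """Scale the matching fraction of the sample up to the table size."""
        if not self.items:
            return 0.0
        matches = sum(1 for row in self.items if predicate(row))
        return matches / len(self.items) * self.seen


# Usage: sample a stream of 100,000 keys and estimate how many are even.
sample = ReservoirSample(capacity=1000, seed=42)
for key in range(100_000):
    sample.insert(key)
estimate = sample.estimate_count(lambda k: k % 2 == 0)
```

Because the sample is updated as part of each insert, no later scan or random access over the table is needed to rebuild it. Making such a structure scale under many concurrent writers is the problem the paper addresses; this sketch leaves that part out.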
