
International Workshop on Data Management on New Hardware: Latest Publications

Multi-core column-store parallelization under concurrent workload
Pub Date : 2016-06-01 DOI: 10.1145/2933349.2933350
M. Gawade, M. Kersten, A. Simitsis
Columnar database systems, designed for optimal OLAP workload performance, strive for maximum multi-core utilization under concurrent query execution. However, a multi-core parallel plan generated for isolated execution leads to suboptimal performance during concurrent query execution. In this paper, we analyze the effects of concurrent-workload resource contention on multi-core plans using three intra-query parallelization techniques: static, adaptive, and cost-model parallelization. We focus on a plan-level comparison of selected TPC-H queries using in-memory multi-core columnar systems. Excessive partitions in statically parallelized plans result in heavy L3 cache misses, leading to memory contention that severely degrades query performance. Overall, adaptive plans show more robustness, lower scheduling overheads, and an average 50% execution-time improvement compared to statically parallelized and cost-model-based plans.
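For readers unfamiliar with intra-query parallelization, the following minimal C++ sketch shows what a statically parallelized scan looks like: the column is split into one contiguous range per worker and each worker evaluates the predicate on its range. It only illustrates the partitioning idea; the worker count, predicate, and data are invented, and this is not the plan generation studied in the paper.

```cpp
// Minimal sketch of static intra-query parallelization: one contiguous range
// per worker thread, each worker filters its range independently.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    std::vector<int32_t> column(1 << 20);
    std::iota(column.begin(), column.end(), 0);

    const unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::vector<uint64_t> matches(workers, 0);
    std::vector<std::thread> pool;

    const size_t chunk = column.size() / workers;
    for (unsigned w = 0; w < workers; ++w) {
        const size_t begin = w * chunk;
        const size_t end = (w + 1 == workers) ? column.size() : begin + chunk;
        pool.emplace_back([&, w, begin, end] {
            uint64_t local = 0;                  // thread-local count avoids contended writes
            for (size_t i = begin; i < end; ++i)
                local += (column[i] < 1000);     // selection predicate of the range query
            matches[w] = local;
        });
    }
    for (auto& t : pool) t.join();

    std::cout << "qualifying tuples: "
              << std::accumulate(matches.begin(), matches.end(), uint64_t{0}) << "\n";
    // With many more partitions than cores, the partitions' combined working set
    // competes for the shared L3 cache under concurrent queries -- the contention
    // effect the paper measures for statically parallelized plans.
}
```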
Citations: 1
Dynamic fine-grained scheduling for energy-efficient main-memory queries
Pub Date : 2014-06-23 DOI: 10.1145/2619228.2619229
Iraklis Psaroudakis, T. Kissinger, Danica Porobic, T. Ilsche, Erietta Liarou, Pınar Tözün, A. Ailamaki, Wolfgang Lehner
Power and cooling costs are among the highest costs in data centers today, which makes improving energy efficiency crucial. Energy efficiency is also a major design point for chips that power a whole range of computing devices. One important goal in this area is energy proportionality, which holds that a system's power consumption should be proportional to its performance. Currently, a major trend among server processors, which stems from the design of chips for mobile devices, is the inclusion of advanced power management techniques, such as dynamic voltage-frequency scaling, clock gating, and turbo modes. Much recent work on the energy efficiency of database management systems focuses on coarse-grained power management at the granularity of multiple machines and whole queries. These techniques, however, cannot efficiently adapt to the frequently fluctuating behavior of contemporary workloads. In this paper, we argue that databases should employ a fine-grained approach by dynamically scheduling tasks using precise hardware models. These models can be produced by calibrating operators under different combinations of scheduling policies, parallelism, and memory access strategies. The models can then be employed at run time for dynamic scheduling and power management in order to improve overall energy efficiency. We experimentally show that energy efficiency can be improved by up to 4x for fundamental memory-intensive database operations, such as scans.
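To make the model-driven scheduling idea concrete, here is a hedged C++ sketch: an operator's calibrated energy profile is consulted at run time to pick the thread count and frequency predicted to be most energy-efficient. The `Config` table, its values, and the helper `pick_most_efficient` are invented for illustration and do not reflect the paper's actual models.

```cpp
// Hedged sketch of model-driven fine-grained scheduling: choose the
// (thread count, frequency) configuration that a calibrated per-operator
// model predicts to be most energy-efficient. Calibration values are made up.
#include <iostream>
#include <limits>
#include <vector>

struct Config {
    int threads;
    int mhz;
    double joules_per_tuple;   // obtained offline by calibrating the operator
};

Config pick_most_efficient(const std::vector<Config>& calibrated) {
    Config best{0, 0, std::numeric_limits<double>::max()};
    for (const auto& c : calibrated)
        if (c.joules_per_tuple < best.joules_per_tuple) best = c;
    return best;
}

int main() {
    // Hypothetical calibration results for a memory-bound scan operator.
    std::vector<Config> scan_profile = {
        {4, 1200, 0.8}, {8, 1200, 0.5}, {8, 2400, 0.7}, {16, 2400, 0.9}};
    const Config c = pick_most_efficient(scan_profile);
    std::cout << "run scan with " << c.threads << " threads at " << c.mhz << " MHz\n";
}
```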
Citations: 24
SOFORT: a hybrid SCM-DRAM storage engine for fast data recovery
Pub Date : 2014-06-23 DOI: 10.1145/2619228.2619236
Ismail Oukid, Daniel Booss, Wolfgang Lehner, P. Bumbulis, Thomas Willhalm
Storage Class Memory (SCM) has the potential to significantly improve database performance. This potential has been well documented for throughput [4] and response time [25, 22]. In this paper, we show that SCM also has the potential to significantly improve restart performance, a shortcoming of traditional main-memory database systems. We present SOFORT, a hybrid SCM-DRAM storage engine that leverages the full capabilities of SCM by doing away with a traditional log and updating persisted data in place in small increments. We show that we can achieve restart times of a few seconds, independent of instance size and transaction volume, without significantly impacting transaction throughput.
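The log-free, in-place update scheme can be illustrated with the usual SCM ordering discipline: write the data, flush and fence, then flip a validity flag as the commit point. The sketch below (x86 clflush/sfence, a made-up `PersistentSlot` structure) only demonstrates that discipline and is not SOFORT's actual storage layout; on DRAM-backed memory it merely shows the ordering.

```cpp
// Hedged sketch of log-free, in-place durable updates: write the value, flush
// and fence, then flip a validity flag so a crash never exposes a half-written
// entry. Illustration only; real SCM engines are considerably more involved.
#include <immintrin.h>
#include <cstdint>

struct PersistentSlot {
    uint64_t value;
    uint64_t valid;   // 0 = empty, 1 = committed
};

inline void persist(const void* p) {
    _mm_clflush(p);   // evict the cache line toward the persistence domain
    _mm_sfence();     // order the flush before subsequent stores
}

void durable_insert(PersistentSlot& slot, uint64_t v) {
    slot.value = v;
    persist(&slot.value);   // value is durable before it becomes visible
    slot.valid = 1;
    persist(&slot.valid);   // the flag flip is the atomic commit point
}

int main() {
    static PersistentSlot slot{};   // stands in for an SCM-resident slot
    durable_insert(slot, 42);
    return 0;
}
```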
Citations: 73
Online bit flip detection for in-memory B-trees on unreliable hardware
Pub Date : 2014-06-23 DOI: 10.1145/2619228.2619233
Till Kolditz, T. Kissinger, B. Schlegel, Dirk Habich, Wolfgang Lehner
Hardware vendors constantly decrease the feature sizes of integrated circuits to obtain better performance and energy efficiency. Due to cosmic rays, low voltage, or heat dissipation, hardware -- both processors and memory -- becomes increasingly unreliable as error rates rise. From a database perspective, bit flip errors in main memory will become a major challenge for modern in-memory database systems, which keep all their enterprise data in volatile, unreliable main memory. Although existing hardware error control techniques like ECC-DRAM are able to detect and correct memory errors, their detection and correction capabilities are limited. Moreover, hardware error correction faces major drawbacks in terms of acquisition cost, additional memory utilization, and latency. In this paper, we argue that slightly increasing data redundancy at the right places by incorporating context knowledge already improves error detection significantly. We use the B-Tree, as a widespread index structure, as an example and propose various techniques for online error detection that increase its overall reliability. In our experiments, we find that our techniques can detect more errors in less time on commodity hardware compared to non-resilient B-Trees running in an ECC-DRAM environment. Our techniques can also be easily adapted to other data structures and are a first step toward resilient database systems that can cope with unreliable hardware.
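As a concrete, if simplified, instance of "slightly increasing data redundancy at the right places", the sketch below attaches a checksum to each index node and re-verifies it on access, detecting a single bit flip on read. The XOR checksum and `Node` layout are invented for illustration; the paper proposes lighter-weight, B-Tree-specific encodings.

```cpp
// Hedged sketch of online bit flip detection through added redundancy: each
// node keeps a checksum over its keys that is re-verified on every lookup.
#include <cstdint>
#include <iostream>
#include <vector>

struct Node {
    std::vector<uint64_t> keys;
    uint64_t checksum = 0;

    void recompute() {
        checksum = 0;
        for (uint64_t k : keys) checksum ^= k;
    }
    bool verify() const {
        uint64_t x = 0;
        for (uint64_t k : keys) x ^= k;
        return x == checksum;   // false if a single bit flip corrupted a key
    }
};

int main() {
    Node n{{3, 17, 42, 99}};
    n.recompute();

    n.keys[2] ^= (1ull << 13);          // simulate a single bit flip in memory
    std::cout << (n.verify() ? "node ok" : "bit flip detected") << "\n";
}
```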
Citations: 8
HICAMP bitmap: space-efficient updatable bitmap index for in-memory databases
Pub Date : 2014-06-23 DOI: 10.1145/2619228.2619235
Bo Wang, Heiner Litz, D. Cheriton
The bitmap is an efficient indexing structure for querying large amounts of data and is widely deployed in data-warehouse applications. While the size of a bitmap scales linearly with the number of rows in a table, its sparseness allows it to be greatly reduced via compression based on run-length encoding. However, updating a compressed bitmap is expensive due to encoding and decoding overheads, in particular because re-compression can change the compressed sequence length and data layout. Because of this problem, bitmap indices only perform well for read-only workloads. In this paper, we propose a bitmap index structure that is both space-efficient and allows fast updates, built on top of a smart memory model called HICAMP. As a consequence, our approach enables bitmap indices for workloads with high update ratios, such as OLTP workloads. We also present a new multi-bit bitmap design that addresses the candidate checking problem. In our experiments, the HICAMP bitmap index demonstrates a 3-12x reduction in size over the B-tree and 8-30x over other commonly used indexing structures such as the red-black tree, while simultaneously supporting efficient updates.
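The update problem that motivates HICAMP can be seen in a few lines: with run-length encoding, setting a single bit inside a long run splits the run and shifts everything behind it, so the encoding must be rewritten. The `Run` representation and `set_bit` helper below are a deliberately naive stand-in, not the paper's index.

```cpp
// Hedged sketch of why updating a run-length-encoded bitmap is expensive:
// setting one bit inside a long run splits the run and changes the layout,
// unlike the in-place update a plain (uncompressed) bitmap allows.
#include <cstdint>
#include <iostream>
#include <vector>

struct Run { bool bit; uint64_t length; };

// Set position `pos` to 1, splitting the 0-run that contains it if needed.
std::vector<Run> set_bit(const std::vector<Run>& runs, uint64_t pos) {
    std::vector<Run> out;
    uint64_t offset = 0;
    for (const Run& r : runs) {
        if (!r.bit && pos >= offset && pos < offset + r.length) {
            if (pos > offset) out.push_back({false, pos - offset});
            out.push_back({true, 1});
            if (pos + 1 < offset + r.length)
                out.push_back({false, offset + r.length - pos - 1});
        } else {
            out.push_back(r);              // untouched runs are copied anyway
        }
        offset += r.length;
    }
    return out;                            // layout changed: downstream runs moved
}

int main() {
    std::vector<Run> bitmap = {{false, 1000}, {true, 5}, {false, 4000}};
    auto updated = set_bit(bitmap, 500);
    std::cout << "runs before: " << bitmap.size() << ", after: " << updated.size() << "\n";
}
```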
Citations: 3
Database cracking: fancy scan, not poor man's sort!
Pub Date : 2014-06-23 DOI: 10.1145/2619228.2619232
H. Pirk, E. Petraki, Stratos Idreos, S. Manegold, M. Kersten
Database Cracking is an appealing approach to adaptive indexing: on every range-selection query, the data is partitioned using the supplied predicates as pivots. The core of database cracking is, thus, pivoted partitioning. While pivoted partitioning, like scanning, requires a single pass through the data, it tends to have much higher costs due to lower CPU efficiency. In this paper, we conduct an in-depth study of the reasons for the low CPU efficiency of pivoted partitioning. Based on the findings, we develop an optimized version with significantly higher (single-threaded) CPU efficiency. We also develop a number of multi-threaded implementations that are effectively bound by memory bandwidth. Combining all of these optimizations, we achieve an implementation whose costs are close to or better than an ordinary scan on a variety of systems, ranging from low-end (cheaper than $300) desktop machines to high-end (above $60,000) servers.
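The cracking step itself is just pivoted partitioning, which the following sketch shows using the query's range bounds as pivots (two `std::partition` passes); afterwards the qualifying values are contiguous. This illustrates only the basic operation whose CPU efficiency the paper analyzes, not the optimized single-threaded or multi-threaded variants it develops.

```cpp
// Hedged sketch of pivoted partitioning, the core of database cracking: the
// range predicate's bounds act as pivots, as in one quicksort partition pass.
#include <algorithm>
#include <iostream>
#include <vector>

// Crack on [low, high): returns iterators delimiting the qualifying values.
template <typename It, typename T>
std::pair<It, It> crack(It first, It last, T low, T high) {
    It mid1 = std::partition(first, last, [&](const T& v) { return v < low; });
    It mid2 = std::partition(mid1, last, [&](const T& v) { return v < high; });
    return {mid1, mid2};   // [mid1, mid2) now holds all values in [low, high)
}

int main() {
    std::vector<int> column = {42, 7, 19, 88, 3, 55, 23, 61, 15, 70};
    auto [lo, hi] = crack(column.begin(), column.end(), 20, 60);
    std::cout << "qualifying values:";
    for (auto it = lo; it != hi; ++it) std::cout << ' ' << *it;
    std::cout << '\n';
}
```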
Citations: 41
Vectorized Bloom filters for advanced SIMD processors
Pub Date : 2014-06-23 DOI: 10.1145/2619228.2619234
Orestis Polychroniou, K. A. Ross
Analytics are at the core of many business intelligence tasks. Efficient query execution is facilitated by advanced hardware features, such as multi-core parallelism, shared-nothing low-latency caches, and SIMD vector instructions. Only recently have the SIMD capabilities of mainstream hardware been augmented with wider vectors and non-contiguous loads termed gathers. While analytical DBMSs minimize the use of indexes in favor of scans based on sequential memory accesses, some data structures remain crucial. The Bloom filter, one such example, is the most efficient structure for filtering tuples based on their existence in a set, and its performance is critical when joining tables with vastly different cardinalities. We introduce a vectorized implementation for probing Bloom filters based on gathers that eliminates conditional control flow and is independent of the SIMD length. Our techniques are generic and can be reused for accelerating other database operations. Our evaluation indicates a significant performance improvement over scalar code, which can exceed 3X when the Bloom filter is cache-resident.
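A hedged sketch of gather-based probing is given below: eight keys are hashed in SIMD registers, the eight filter words are fetched with one AVX2 gather, and a movemask yields the qualifying lanes without branches. The hash functions, filter layout, and the helper `probe8` are simplified inventions, the filter size must be a power of two, and the code assumes AVX2 (e.g. compiled with -mavx2); it is not the paper's exact kernel.

```cpp
// Hedged sketch of gather-based Bloom filter probing with AVX2.
#include <immintrin.h>
#include <cstdint>
#include <cstdio>
#include <initializer_list>
#include <vector>

// One Bloom bit per hash function; filter stored as 32-bit words.
struct Bloom {
    std::vector<uint32_t> words;
    explicit Bloom(size_t n_words) : words(n_words) {}   // n_words: power of two
    void insert(uint32_t key) {
        const uint32_t n_bits = static_cast<uint32_t>(words.size() * 32);
        for (uint32_t mul : {0x9e3779b1u, 0x85ebca77u}) {
            const uint32_t h = (key * mul) & (n_bits - 1);
            words[h >> 5] |= 1u << (h & 31);
        }
    }
};

// Probe 8 keys at once; bit i of the result is set if keys[i] may be present.
int probe8(const Bloom& bf, const uint32_t* keys) {
    const __m256i k = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(keys));
    const uint32_t n_bits = static_cast<uint32_t>(bf.words.size() * 32);
    int result = 0xff;
    for (uint32_t mul : {0x9e3779b1u, 0x85ebca77u}) {
        __m256i h = _mm256_mullo_epi32(k, _mm256_set1_epi32(static_cast<int>(mul)));
        h = _mm256_and_si256(h, _mm256_set1_epi32(static_cast<int>(n_bits - 1)));
        const __m256i word_idx = _mm256_srli_epi32(h, 5);
        const __m256i bit = _mm256_sllv_epi32(
            _mm256_set1_epi32(1), _mm256_and_si256(h, _mm256_set1_epi32(31)));
        const __m256i words = _mm256_i32gather_epi32(
            reinterpret_cast<const int*>(bf.words.data()), word_idx, 4);
        const __m256i hit = _mm256_cmpeq_epi32(_mm256_and_si256(words, bit), bit);
        result &= _mm256_movemask_ps(_mm256_castsi256_ps(hit));   // branch-free lane mask
    }
    return result;
}

int main() {
    Bloom bf(1 << 10);                       // 32K bits
    for (uint32_t k : {5u, 17u, 99u}) bf.insert(k);
    uint32_t probe_keys[8] = {5, 6, 17, 18, 99, 100, 101, 102};
    std::printf("match mask: 0x%02x\n", static_cast<unsigned>(probe8(bf, probe_keys)));
}
```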
Citations: 61
Heterogeneity-conscious parallel query execution: getting a better mileage while driving faster!
Pub Date : 2014-06-23 DOI: 10.1145/2619228.2619230
Tobias Mühlbauer, Wolf Rödiger, Robert Seilbeck, A. Kemper, Thomas Neumann
Physical and thermal restrictions hinder commensurate performance gains from the ever-increasing transistor density. While multi-core scaling helped alleviate dimmed or dark silicon for some time, future processors will need to become more heterogeneous. To this end, single instruction set architecture (ISA) heterogeneous processors are a particularly interesting solution that combines multiple cores sharing the same ISA but with asymmetric performance and power characteristics. These processors, however, are no free lunch for database systems. Mapping jobs to the core that fits best is notoriously hard for the operating system or a compiler. To achieve optimal performance and energy efficiency, heterogeneity needs to be exposed to the database system. In this paper, we provide a thorough study of parallelized core database operators and TPC-H query processing on a heterogeneous single-ISA multi-core architecture. Using these insights, we design a heterogeneity-conscious job-to-core mapping approach for our high-performance main-memory database system HyPer and show that it is indeed possible to get better mileage while driving faster compared to static and operating-system-controlled mappings. Our approach improves the energy-delay product of a TPC-H power run by 31%, and by over 60% for specific TPC-H queries.
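As a toy illustration of heterogeneity-conscious mapping, the sketch below classifies operators by a profiled bytes-per-cycle figure and sends compute-bound jobs to big cores and memory-bound jobs to little cores. The threshold, the job list, and the `assign` helper are invented; HyPer's actual scheduler is based on the detailed study described in the paper.

```cpp
// Hedged sketch of job-to-core mapping on a single-ISA heterogeneous CPU:
// compute-bound operators go to fast (big) cores, memory-bound ones to
// efficient (little) cores, where extra single-thread speed buys little.
#include <iostream>
#include <string>
#include <vector>

struct Job {
    std::string op;
    double bytes_per_cycle;   // profiled memory traffic per CPU cycle
};

enum class Core { Big, Little };

Core assign(const Job& j) {
    constexpr double kMemoryBound = 0.5;   // assumed classification threshold
    return j.bytes_per_cycle < kMemoryBound ? Core::Big : Core::Little;
}

int main() {
    std::vector<Job> plan = {{"hash build", 0.2}, {"scan", 1.4}, {"aggregate", 0.3}};
    for (const Job& j : plan)
        std::cout << j.op << " -> "
                  << (assign(j) == Core::Big ? "big core" : "little core") << "\n";
}
```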
Citations: 22
Efficient GPU-based skyline computation
Pub Date : 2013-06-24 DOI: 10.1145/2485278.2485283
Kenneth S. Bøgh, I. Assent, Matteo Magnani
The skyline operator for multi-criteria search returns the most interesting points of a data set with respect to any monotone preference function. Existing work has almost exclusively focused on efficiently computing skylines on one or more CPUs, ignoring the high degree of parallelism available on GPUs. In this paper, we investigate the challenges of efficient skyline algorithms that exploit the computational power of the GPU. We present a novel strategy for managing data transfer and memory for skylines using the CPU and GPU. We introduce our new sorting-based data-parallel skyline algorithm and discuss its properties. We demonstrate in a thorough experimental evaluation that this algorithm is faster than state-of-the-art sequential sorting-based skyline algorithms and shows superior scalability.
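For context, the sketch below fixes the semantics of the skyline operator under minimization: a point survives if no other point is at least as good in every dimension and strictly better in one. The quadratic loop is only a reference definition with invented data; the paper's contribution is a sorting-based, data-parallel GPU formulation of this computation.

```cpp
// Reference definition of the skyline (minimization on every dimension):
// a point is in the skyline if no other point dominates it.
#include <iostream>
#include <vector>

using Point = std::vector<double>;

bool dominates(const Point& a, const Point& b) {
    bool strictly_better = false;
    for (size_t d = 0; d < a.size(); ++d) {
        if (a[d] > b[d]) return false;       // worse in one dimension: no dominance
        if (a[d] < b[d]) strictly_better = true;
    }
    return strictly_better;
}

std::vector<Point> skyline(const std::vector<Point>& pts) {
    std::vector<Point> result;
    for (const Point& p : pts) {
        bool dominated = false;
        for (const Point& q : pts)
            if (dominates(q, p)) { dominated = true; break; }
        if (!dominated) result.push_back(p);
    }
    return result;
}

int main() {
    std::vector<Point> hotels = {{50, 2.0}, {80, 1.0}, {60, 1.5}, {90, 2.5}};  // (price, distance)
    for (const Point& p : skyline(hotels)) std::cout << "(" << p[0] << ", " << p[1] << ")\n";
}
```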
Citations: 29
Peak performance: remote memory revisited
Pub Date : 2013-06-24 DOI: 10.1145/2485278.2485287
H. Mühleisen, R. Goncalves, M. Kersten
Many database systems share a need for large amounts of fast storage. However, economies of scale limit the utility of extending a single machine with an arbitrary amount of memory. The recent broad availability of the zero-copy data transfer protocol RDMA over low-latency, high-throughput network connections such as InfiniBand prompts us to revisit the long-proposed use of memory provided by remote machines. In this paper, we present a solution for using remote memory without modifying the operating system, and investigate the impact on database performance.
Citations: 13