
International Workshop on Data Management on New Hardware: Latest Publications

Frequent itemset mining on graphics processors
Pub Date : 2009-06-28 DOI: 10.1145/1565694.1565702
Wenbin Fang, Mian Lu, Xiangye Xiao, Bingsheng He, Qiong Luo
We present two efficient Apriori implementations of Frequent Itemset Mining (FIM) that utilize new-generation graphics processing units (GPUs). Our implementations take advantage of the GPU's massively multi-threaded SIMD (Single Instruction, Multiple Data) architecture. Both implementations employ a bitmap data structure to exploit the GPU's SIMD parallelism and to accelerate the frequency counting operation. One implementation runs entirely on the GPU and eliminates intermediate data transfer between the GPU memory and the CPU memory. The other implementation employs both the GPU and the CPU for processing. It represents itemsets in a trie, and uses the CPU for trie traversing and incremental maintenance. Our preliminary results show that both implementations achieve a speedup of up to two orders of magnitude over optimized CPU Apriori implementations on a PC with an NVIDIA GTX 280 GPU and a quad-core CPU.
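Both implementations hinge on the bitmap trick: an itemset is a bit vector over transactions, so candidate support is a bitwise AND followed by a population count, which maps directly onto SIMD lanes. A minimal CPU-side sketch of that counting step (the GPU kernels themselves are not reproduced; the `Bitmap` layout and the example data are illustrative assumptions, and `std::popcount` requires C++20):

```cpp
#include <bit>       // std::popcount (C++20)
#include <cstdint>
#include <iostream>
#include <vector>

// One bit per transaction: bit t of the bitmap is set iff the itemset
// occurs in transaction t.
using Bitmap = std::vector<std::uint64_t>;

// support({A,B}) = popcount(bitmap(A) AND bitmap(B)). On the GPU this
// word-wise loop is what the SIMD threads execute in parallel; here it
// runs sequentially on the CPU.
std::size_t support(const Bitmap& a, const Bitmap& b) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        n += std::popcount(a[i] & b[i]);
    return n;
}

int main() {
    // Two items over 128 transactions (2 x 64-bit words), made-up data.
    Bitmap itemA = {0xF0F0F0F0F0F0F0F0ULL, 0x00000000FFFFFFFFULL};
    Bitmap itemB = {0xFF00FF00FF00FF00ULL, 0x0000FFFF0000FFFFULL};
    std::cout << "support = " << support(itemA, itemB) << "\n";
}
```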
Citations: 133
Join processing for flash SSDs: remembering past lessons
Pub Date : 2009-06-28 DOI: 10.1145/1565694.1565696
Jaeyoung Do, J. Patel
Flash solid state drives (SSDs) provide an attractive alternative to traditional magnetic hard disk drives (HDDs) for DBMS applications. Naturally there is substantial interest in redesigning critical database internals, such as join algorithms, for flash SSDs. However, we must carefully consider the lessons that we have learnt from over three decades of designing and tuning algorithms for magnetic HDD-based systems, so that we continue to reuse techniques that worked for magnetic HDDs and also work with flash SSDs. The focus of this paper is on recalling some of these lessons in the context of ad hoc join algorithms. Based on an actual implementation of four common ad hoc join algorithms on both a magnetic HDD and a flash SSD, we show that many of the "surprising" results from magnetic HDD-based join methods also hold for flash SSDs. These results include the superiority of block nested loops join over sort-merge join and Grace hash join in many cases, and the benefits of blocked I/Os. In addition, we find that simply looking at the I/O costs when designing new flash SSD join algorithms can be problematic, as the CPU cost is often a bigger component of the total join cost with SSDs. We hope that these results provide insights and better starting points for researchers designing new join algorithms for flash SSDs.
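Block nested loops join, which the authors find superior in many cases, reads the outer relation in large chunks, builds a per-chunk hash table, and scans the inner relation once per chunk, so all I/O stays sequential. A minimal in-memory sketch (the tuple layout, key type, and block size are illustrative assumptions; the paper's disk-level implementation is not reproduced):

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <unordered_map>
#include <utility>
#include <vector>

struct Tuple { int key; int payload; };

// Join R and S on key under a bounded memory budget: consume R in blocks,
// hash each block, then scan all of S against it. S is scanned |R|/block
// times, but every pass is a large sequential read -- the access pattern
// that favors block nested loops on both HDDs and SSDs.
std::vector<std::pair<Tuple, Tuple>>
block_nested_loops_join(const std::vector<Tuple>& R,
                        const std::vector<Tuple>& S,
                        std::size_t block_size) {
    std::vector<std::pair<Tuple, Tuple>> out;
    for (std::size_t begin = 0; begin < R.size(); begin += block_size) {
        std::size_t end = std::min(begin + block_size, R.size());
        std::unordered_multimap<int, Tuple> ht;   // hash the current R block
        for (std::size_t i = begin; i < end; ++i)
            ht.emplace(R[i].key, R[i]);
        for (const Tuple& s : S) {                // one full pass over S
            auto [lo, hi] = ht.equal_range(s.key);
            for (auto it = lo; it != hi; ++it)
                out.push_back({it->second, s});
        }
    }
    return out;
}

int main() {
    std::vector<Tuple> R = {{1, 10}, {2, 20}, {3, 30}};
    std::vector<Tuple> S = {{2, 200}, {3, 300}, {4, 400}};
    for (auto& [r, s] : block_nested_loops_join(R, S, 2))
        std::cout << r.key << ": " << r.payload << " ~ " << s.payload << "\n";
}
```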
Citations: 38
Cache-conscious buffering for database operators with state
Pub Date : 2009-06-28 DOI: 10.1145/1565694.1565704
J. Cieslewicz, William Mee, K. A. Ross
Database processes must be cache-efficient to effectively utilize modern hardware. In this paper, we analyze the importance of temporal locality and the resultant cache behavior in scheduling database operators for in-memory, block oriented query processing. We demonstrate how the overall performance of a workload of multiple database operators is strongly dependent on how they are interleaved with each other. Longer time slices combined with temporal locality within an operator amortize the effects of the initial compulsory cache misses needed to load the operator's state, such as a hash table, into the cache. Though running an operator to completion over all of its input results in the greatest amortization of cache misses, this is typically infeasible because of the large intermediate storage requirement to materialize all input tuples to an operator. We show experimentally that good cache performance can be obtained with smaller buffers whose size is determined at runtime. We demonstrate a low-overhead method of runtime cache miss sampling using hardware performance counters. Our evaluation considers two common database operators with state: aggregation and hash join. Sampling reveals operator temporal locality and cache miss behavior, and we use those characteristics to choose an appropriate input buffer/block size. The calculated buffer size balances cache miss amortization with buffer memory requirements.
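The runtime sizing can be pictured as a small feedback loop: sample cache misses while an operator drains one input buffer, then grow or shrink the next buffer so that the compulsory misses of reloading operator state are amortized without overcommitting memory. A schematic sketch (the `sample_miss_rate` stub stands in for reading the hardware performance counters, and the thresholds and doubling rule are illustrative assumptions, not the paper's tuned policy):

```cpp
#include <cstddef>
#include <iostream>

// Stand-in for a hardware-counter read (e.g. L2 misses per tuple) taken
// while the operator processed the last buffer. Faked below so the sketch
// runs stand-alone.
double sample_miss_rate();

// Adjust the input buffer size between runs: if misses per tuple are high,
// the operator's state is being evicted between time slices, so a larger
// buffer amortizes the reload cost; if misses are already near the floor,
// shrink to save intermediate-result memory.
std::size_t tune_buffer(std::size_t current, double misses_per_tuple,
                        double high_water, double low_water,
                        std::size_t min_sz, std::size_t max_sz) {
    if (misses_per_tuple > high_water && current < max_sz) return current * 2;
    if (misses_per_tuple < low_water  && current > min_sz) return current / 2;
    return current;
}

int main() {
    std::size_t buf = 1024;                       // tuples per time slice
    for (int run = 0; run < 5; ++run) {
        double m = sample_miss_rate();
        buf = tune_buffer(buf, m, /*high*/0.50, /*low*/0.05,
                          /*min*/256, /*max*/1 << 20);
        std::cout << "run " << run << ": buffer = " << buf << " tuples\n";
    }
}

// Fake sampler: misses per tuple decay across runs in this toy demo.
double sample_miss_rate() {
    static double m = 4.0;
    return m *= 0.5;
}
```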
Citations: 20
k-ary search on modern processors
Pub Date : 2009-06-28 DOI: 10.1145/1565694.1565705
B. Schlegel, Rainer Gemulla, Wolfgang Lehner
This paper presents novel tree-based search algorithms that exploit the SIMD instructions found in virtually all modern processors. The algorithms are a natural extension of binary search: While binary search performs one comparison at each iteration, thereby cutting the search space in two halves, our algorithms perform k comparisons at a time and thus cut the search space into k pieces. On traditional processors, this so-called k-ary search procedure is not beneficial because the cost increase per iteration offsets the cost reduction due to the reduced number of iterations. On modern processors, however, multiple scalar operations can be executed simultaneously, which makes k-ary search attractive. In this paper, we provide two different search algorithms that differ in terms of efficiency and memory access patterns. Both algorithms are first described in a platform independent way and then evaluated on various state-of-the-art processors. Our experiments suggest that k-ary search provides significant performance improvements (factor two and more) on most platforms.
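The scalar logic generalizes binary search directly: probe the k-1 evenly spaced separators of the current range, locate the one segment that can still contain the key, and iterate. A minimal scalar sketch for arbitrary k (in the paper's SIMD versions the k-1 comparisons per step are issued as one vector instruction; that vectorization is omitted here):

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// k-ary search over a sorted array: each iteration performs k-1 probes and
// narrows the range to one of k segments, so the depth is log_k(n) rather
// than log_2(n). Returns the index of key, or -1 if absent.
long kary_search(const std::vector<int>& a, int key, std::size_t k) {
    std::size_t lo = 0, hi = a.size();           // search window [lo, hi)
    while (lo < hi) {
        std::size_t next_lo = lo, next_hi = hi;
        for (std::size_t i = 1; i < k; ++i) {    // probe the k-1 separators
            std::size_t sep = lo + i * (hi - lo) / k;
            if (a[sep] == key) return static_cast<long>(sep);
            if (a[sep] < key) next_lo = sep + 1; // key lies right of sep
            else { next_hi = sep; break; }       // key lies left of sep
        }
        lo = next_lo; hi = next_hi;
    }
    return -1;
}

int main() {
    std::vector<int> a = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37};
    std::cout << kary_search(a, 17, 4) << "\n";  // prints 6
    std::cout << kary_search(a, 18, 4) << "\n";  // prints -1
}
```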
Citations: 66
Spinning relations: high-speed networks for distributed join processing
Pub Date : 2009-06-28 DOI: 10.1145/1565694.1565701
P. Frey, R. Goncalves, M. Kersten, J. Teubner
By leveraging modern networking hardware (RDMA-enabled network cards), we can shift priorities in distributed database processing significantly. Complex and sophisticated mechanisms to avoid network traffic can be replaced by a scheme that takes advantage of the bandwidth and low latency offered by such interconnects. We illustrate this phenomenon with cyclo-join, an efficient join algorithm based on continuously pumping data through a ring-structured network. Our approach is capable of exploiting the resources of all CPUs and distributed main-memory available in the network for processing queries of arbitrary shape and datasets of arbitrary size.
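The core of cyclo-join is that each node pins one fragment of R while the fragments of S rotate around the ring; after as many hops as there are nodes, every S fragment has met every R fragment, and the union of the local joins is the full result. A single-process simulation of that rotation (the RDMA transport is replaced by an in-memory rotate, and the fragment contents and local hash join are illustrative assumptions):

```cpp
#include <algorithm>
#include <iostream>
#include <unordered_map>
#include <utility>
#include <vector>

struct Tuple { int key; int payload; };
using Fragment = std::vector<Tuple>;

int main() {
    // Three "nodes": R fragments stay pinned, S fragments rotate.
    std::vector<Fragment> R = {{{1, 10}}, {{2, 20}}, {{3, 30}}};
    std::vector<Fragment> S = {{{3, 300}}, {{1, 100}}, {{2, 200}}};

    std::vector<std::pair<Tuple, Tuple>> result;
    const std::size_t n = R.size();

    for (std::size_t hop = 0; hop < n; ++hop) {
        // Each node joins its resident R fragment with the S fragment it
        // currently holds (a local hash join).
        for (std::size_t node = 0; node < n; ++node) {
            std::unordered_multimap<int, Tuple> ht;
            for (const Tuple& r : R[node]) ht.emplace(r.key, r);
            for (const Tuple& s : S[node]) {
                auto [lo, hi] = ht.equal_range(s.key);
                for (auto it = lo; it != hi; ++it)
                    result.push_back({it->second, s});
            }
        }
        // "Pump" S one position around the ring; over RDMA this is the
        // continuous background transfer the paper describes.
        std::rotate(S.begin(), S.begin() + 1, S.end());
    }

    for (auto& [r, s] : result)
        std::cout << r.key << ": " << r.payload << " ~ " << s.payload << "\n";
}
```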
Citations: 36
Evaluating and repairing write performance on flash devices
Pub Date : 2009-06-28 DOI: 10.1145/1565694.1565697
R. Stoica, Manos Athanassoulis, Ryan Johnson, A. Ailamaki
In the last few years NAND flash storage has become more and more popular as price per GB and capacity both improve at exponential rates. Flash memory offers significant benefits compared to magnetic hard disk drives (HDDs) and DBMSs are highly likely to use flash as a general storage backend, either alone or in heterogeneous storage solutions with HDDs. Flash devices, however, respond quite differently than HDDs for common access patterns, and recent research shows a strong asymmetry between read and write performance. Moreover, flash storage devices behave unpredictably, showing a high dependence on previous IO history and usage patterns. In this paper we investigate how a DBMS can overcome these issues to take full advantage of flash memory as persistent storage. We propose a new flash-aware data layout --- append and pack --- which stabilizes device performance by eliminating random writes. We assess the impact of append and pack on OLTP workload performance using both an analytical model and micro-benchmarks, and our results suggest that significant improvements can be achieved for real workloads.
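The append-and-pack idea is that logical pages are never updated in place: every write appends the new page image at the head of a sequential log and remaps the page, and a periodic pack pass copies the live pages into a fresh run to reclaim space. A minimal sketch of that remapping scheme (the in-memory `AppendStore`, its mapping table, and the pack trigger are illustrative assumptions, not the paper's on-device layout):

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Logical pages are never overwritten in place: every update appends the
// new page image at the current head, turning the device-level pattern
// into pure sequential writes. A mapping table tracks the live copy.
class AppendStore {
    std::vector<std::string> log_;                     // device, append-only
    std::unordered_map<std::uint32_t, std::size_t> map_;  // logical -> slot
public:
    void write(std::uint32_t page_id, std::string image) {
        map_[page_id] = log_.size();                   // remap to new copy
        log_.push_back(std::move(image));              // sequential append
    }
    const std::string& read(std::uint32_t page_id) { return log_[map_[page_id]]; }

    // "Pack": copy live pages into a fresh sequential run and drop the
    // stale versions, reclaiming space without any random writes.
    void pack() {
        std::vector<std::string> packed;
        for (auto& [pid, slot] : map_) {
            packed.push_back(std::move(log_[slot]));
            slot = packed.size() - 1;
        }
        log_ = std::move(packed);
    }
};

int main() {
    AppendStore store;
    store.write(7, "v1");
    store.write(7, "v2");                // appends; does not overwrite v1
    store.pack();                        // one live copy of page 7 remains
    std::cout << store.read(7) << "\n";  // prints v2
}
```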
Citations: 59
A new look at the roles of spinning and blocking
Pub Date : 2009-06-28 DOI: 10.1145/1565694.1565700
Ryan Johnson, Manos Athanassoulis, R. Stoica, A. Ailamaki
Database engines face growing scalability challenges as core counts exponentially increase each processor generation, and the efficiency of synchronization primitives used to protect internal data structures is a crucial factor in overall database performance. The trade-offs between different implementation approaches for these primitives shift significantly with increasing degrees of available hardware parallelism. Blocking synchronization, which has long been the favored approach in database systems, becomes increasingly unattractive as growing core counts expose its bottlenecks. Spinning implementations improve peak system throughput by a factor of 2x or more for 64 hardware contexts, but suffer from poor performance under load. In this paper we analyze the shifting trade-off between spinning and blocking synchronization, and observe that the trade-off can be simplified by isolating the load control aspects of contention management and treating the two problems separately: spinning-based contention management and blocking-based load control. We then present a proof of concept implementation that, for high concurrency, matches or exceeds the performance of both user-level spin-locks and the pthread mutex under a wide range of load factors.
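The separation the paper proposes can be sketched as a lock that spins for a bounded budget (contention management: cheap hand-offs while holders leave quickly) and only then blocks (load control: bounding the number of runnable threads under saturation). A minimal sketch, assuming a plain `std::mutex` as the blocking fallback and an untuned spin budget, not the paper's proof-of-concept implementation:

```cpp
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

// Separate the two roles: spin first (fast hand-off when the holder exits
// quickly, no context switch), then fall back to a blocking acquire (which
// parks the thread and sheds load when the system is saturated).
class SpinThenBlockLock {
    std::mutex m_;
    static constexpr int kSpinBudget = 1000;   // assumed tuning constant
public:
    void lock() {
        for (int i = 0; i < kSpinBudget; ++i)
            if (m_.try_lock()) return;         // acquired on the spin path
        m_.lock();                             // give up and block
    }
    void unlock() { m_.unlock(); }
};

int main() {
    SpinThenBlockLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)
        workers.emplace_back([&] {
            for (int i = 0; i < 100000; ++i) {
                lock.lock();
                ++counter;                     // the critical section
                lock.unlock();
            }
        });
    for (auto& w : workers) w.join();
    std::cout << counter << "\n";              // prints 400000
}
```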
Citations: 46
CFDC: a flash-aware replacement policy for database buffer management
Pub Date : 2009-06-28 DOI: 10.1145/1565694.1565698
Y. Ou, T. Härder, Peiquan Jin
Flash disks are becoming an important alternative to conventional magnetic disks. Although accessed through the same interface by applications, flash disks have some distinguished characteristics that make it necessary to reconsider the design of the software to leverage their performance potential. This paper addresses this problem at the buffer management layer of database systems and proposes a flash-aware replacement policy that significantly improves and outperforms one of the previous proposals in this area.
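The abstract does not spell out CFDC's mechanism, but the constraint any flash-aware policy works around is the read/write asymmetry: evicting a dirty page costs a flash write, while evicting a clean page costs at most a future read. A sketch of that general clean-first idea layered over LRU (this illustrates the class of policy only and is explicitly not CFDC itself):

```cpp
#include <cstdint>
#include <iostream>
#include <iterator>
#include <list>
#include <unordered_map>

// Flash-aware eviction: prefer the least-recently-used *clean* page,
// because evicting a dirty page forces a slow, wear-inducing flash write.
class CleanFirstLRU {
    struct Frame { std::uint32_t page_id; bool dirty; };
    std::list<Frame> lru_;  // front = most recent, back = least recent
    std::unordered_map<std::uint32_t, std::list<Frame>::iterator> pos_;
    std::size_t capacity_;
public:
    explicit CleanFirstLRU(std::size_t cap) : capacity_(cap) {}

    void access(std::uint32_t page_id, bool write) {
        auto it = pos_.find(page_id);
        if (it != pos_.end()) {                   // hit: move to front
            it->second->dirty |= write;
            lru_.splice(lru_.begin(), lru_, it->second);
            return;
        }
        if (lru_.size() == capacity_) evict();
        lru_.push_front({page_id, write});
        pos_[page_id] = lru_.begin();
    }
private:
    void evict() {
        // Scan from the cold end for a clean victim; fall back to the
        // coldest dirty page (which must then be flushed) if none exists.
        for (auto it = lru_.rbegin(); it != lru_.rend(); ++it)
            if (!it->dirty) { drop(std::prev(it.base())); return; }
        std::cout << "flush page " << lru_.back().page_id << "\n";
        drop(std::prev(lru_.end()));
    }
    void drop(std::list<Frame>::iterator it) {
        pos_.erase(it->page_id);
        lru_.erase(it);
    }
};

int main() {
    CleanFirstLRU buf(2);
    buf.access(1, /*write=*/true);
    buf.access(2, /*write=*/false);
    buf.access(3, /*write=*/false);  // evicts clean page 2, keeps dirty 1
}
```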
Citations: 73
Data partitioning on chip multiprocessors
Pub Date : 2008-06-13 DOI: 10.1145/1457150.1457156
J. Cieslewicz, K. A. Ross
Partitioning is a key database task. In this paper we explore partitioning performance on a chip multiprocessor (CMP) that provides a relatively high degree of on-chip thread-level parallelism. It is therefore important to implement the partitioning algorithm to take advantage of the CMP's parallel execution resources. We identify the coordination of writing partition output as the main challenge in a parallel partitioning implementation and evaluate four techniques for enabling parallel partitioning. We confirm previous work in single threaded partitioning that finds L2 cache misses and translation lookaside buffer misses to be important performance issues, but we now add the management of concurrent threads to this analysis.
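One way to coordinate partition output, and the simplest to picture, is to give every thread private per-partition buffers and concatenate them afterwards, so no two threads ever write the same output location. A minimal sketch of that variant (whether it matches one of the paper's four evaluated techniques, and the hash function and sizes used, are illustrative assumptions):

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <thread>
#include <vector>

// Each thread scatters its slice of the input into private per-partition
// buffers, so threads never contend on output cache lines; the private
// buffers are concatenated per partition afterwards. This removes write
// coordination entirely at the cost of P x T buffers of memory.
std::vector<std::vector<int>>
parallel_partition(const std::vector<int>& input,
                   std::size_t num_parts, std::size_t num_threads) {
    // local[t][p] = thread t's buffer for partition p
    std::vector<std::vector<std::vector<int>>> local(
        num_threads, std::vector<std::vector<int>>(num_parts));

    std::vector<std::thread> workers;
    std::size_t chunk = (input.size() + num_threads - 1) / num_threads;
    for (std::size_t t = 0; t < num_threads; ++t)
        workers.emplace_back([&, t] {
            std::size_t lo = t * chunk;
            std::size_t hi = std::min(lo + chunk, input.size());
            for (std::size_t i = lo; i < hi; ++i)
                local[t][input[i] % num_parts].push_back(input[i]);
        });
    for (auto& w : workers) w.join();

    // Merge: concatenate the thread-local buffers partition by partition.
    std::vector<std::vector<int>> parts(num_parts);
    for (std::size_t p = 0; p < num_parts; ++p)
        for (std::size_t t = 0; t < num_threads; ++t)
            parts[p].insert(parts[p].end(),
                            local[t][p].begin(), local[t][p].end());
    return parts;
}

int main() {
    std::vector<int> data = {5, 12, 7, 3, 8, 1, 14, 9};
    auto parts = parallel_partition(data, 4, 2);
    for (std::size_t p = 0; p < parts.size(); ++p) {
        std::cout << "partition " << p << ":";
        for (int v : parts[p]) std::cout << " " << v;
        std::cout << "\n";
    }
}
```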
Citations: 42
Critical sections: re-emerging scalability concerns for database storage engines
Pub Date : 2008-06-13 DOI: 10.1145/1457150.1457157
Ryan Johnson, I. Pandis, A. Ailamaki
Critical sections in database storage engines impact performance and scalability more as the number of hardware contexts per chip continues to grow exponentially. With enough threads in the system, some critical section will eventually become a bottleneck. While algorithmic changes are the only long-term solution, they tend to be complex and costly to develop. Meanwhile, changes in enforcement of critical sections require much less effort. We observe that, in practice, many critical sections are so short that enforcing them contributes a significant or even dominating fraction of their total cost and tuning them directly improves database system performance. The contribution of this paper is two-fold: we (a) make a thorough performance comparison of the various synchronization primitives in the database system developer's toolbox and highlight the best ones for practical use, and (b) show that properly enforcing critical sections can delay the need to make algorithmic changes for a target number of processors.
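The flavor of the paper's comparison can be reproduced in miniature: run an identical short critical section under different primitives and measure how much the enforcement itself costs. A toy harness comparing a test-and-set spinlock with `std::mutex` (the workload, thread count, and iteration count are illustrative assumptions; the paper's toolbox and methodology are much broader):

```cpp
#include <atomic>
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

// A test-and-set spinlock: acquire/release is two atomic operations, so
// for very short critical sections its enforcement overhead is minimal.
struct TasLock {
    std::atomic_flag f = ATOMIC_FLAG_INIT;
    void lock()   { while (f.test_and_set(std::memory_order_acquire)) {} }
    void unlock() { f.clear(std::memory_order_release); }
};

// Run the same tiny critical section (one increment) under a given lock
// type and report wall-clock time -- a miniature of the paper's comparison.
template <class Lock>
double run(int threads, int iters) {
    Lock lk;
    long counter = 0;
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> ws;
    for (int t = 0; t < threads; ++t)
        ws.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lk.lock();
                ++counter;              // the short critical section
                lk.unlock();
            }
        });
    for (auto& w : ws) w.join();
    std::chrono::duration<double, std::milli> d =
        std::chrono::steady_clock::now() - start;
    return d.count();
}

int main() {
    std::cout << "spinlock: " << run<TasLock>(4, 200000)    << " ms\n";
    std::cout << "mutex:    " << run<std::mutex>(4, 200000) << " ms\n";
}
```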
Citations: 24