
International Workshop on Data Management on New Hardware: Latest Publications

Vectorization vs. compilation in query execution
Pub Date: 2011-06-13 DOI: 10.1145/1995441.1995446
Juliusz Sompolski, M. Zukowski, P. Boncz
Compiling database queries into executable (sub-)programs provides substantial benefits compared with traditional interpreted execution. Many of these benefits, such as reduced interpretation overhead, better instruction code locality, and opportunities to use SIMD instructions, have previously been provided by redesigning query processors to use a vectorized execution model. In this paper, we try to shed light on the question of how state-of-the-art compilation strategies relate to vectorized execution for analytical database workloads on modern CPUs. For this purpose, we carefully investigate the behavior of vectorized and compiled strategies inside the Ingres VectorWise database system in three use cases: Project, Select and Hash Join. One of the findings is that compilation should always be combined with block-wise query execution. Another contribution is identifying three cases where "loop-compilation" strategies are inferior to vectorized execution. As such, a careful merging of these two strategies is proposed for optimal performance: either by incorporating vectorized execution principles into compiled query plans or using query compilation to create building blocks for vectorized processing.
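To illustrate the contrast the abstract draws, here is a minimal sketch in Python (not VectorWise code; function names are invented for illustration) of tuple-at-a-time interpretation versus block-wise "vectorized" execution of a simple selection:

```python
# Illustrative sketch only: tuple-at-a-time vs. block-wise (vectorized)
# evaluation of the predicate "value < threshold".

def select_tuple_at_a_time(values, threshold):
    # In a real interpreted engine, each tuple incurs a virtual-function
    # or next() dispatch: high per-tuple overhead.
    out = []
    for v in values:
        if v < threshold:
            out.append(v)
    return out

def select_vectorized(values, threshold, block_size=1024):
    # Block-wise style: amortize dispatch over a block and materialize a
    # selection vector; the tight inner loop is friendly to compilers/SIMD.
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        sel = [i for i, v in enumerate(block) if v < threshold]
        out.extend(block[i] for i in sel)
    return out
```

Both produce the same result; the point of the paper is that neither pure "loop-compilation" nor pure vectorization wins everywhere, so the two styles should be merged.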
Citations: 85
QMD: exploiting flash for energy efficient disk arrays
Pub Date: 2011-06-13 DOI: 10.1145/1995441.1995447
Sean M. Snyder, Shimin Chen, Panos K. Chrysanthis, Alexandros Labrinidis
Energy consumption of computing devices in general, and of data centers in particular, is receiving increasing attention, both because of the growing ubiquity of computing and because of rising energy prices. In this work, we propose QMD (Quasi Mirrored Disks), which exploits flash as a write buffer to complement RAID systems consisting of hard disks. QMD, along with partial on-line mirrors, is a first step towards energy proportionality, which is seen as the "holy grail" of energy-efficient system design. QMD exhibits significant energy savings of up to 31%, as per our evaluation study using real workloads.
Citations: 2
Towards highly parallel event processing through reconfigurable hardware
Pub Date: 2011-06-13 DOI: 10.1145/1995441.1995445
Mohammad Sadoghi, Harsh V. P. Singh, H. Jacobsen
We present fpga-ToPSS (Toronto Publish/Subscribe System), an efficient event processing platform to support high-frequency and low-latency event matching. fpga-ToPSS is built on reconfigurable hardware (FPGAs) to achieve line-rate processing by exploring various degrees of parallelism. Furthermore, each of our proposed FPGA-based designs is geared towards a unique application requirement, such as flexibility, adaptability, scalability, or pure performance, such that each solution is specifically optimized to attain a high level of parallelism. Therefore, each solution is formulated as a design trade-off between the degree of parallelism and the desired application requirement. Moreover, our event processing engine supports Boolean expression matching with an expressive predicate language applicable to a wide range of applications, including real-time data analysis, algorithmic trading, targeted advertisement, and (complex) event processing.
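A software analogue of the matching problem the abstract describes can be sketched as follows (hypothetical code, not the fpga-ToPSS API): each subscription is a conjunction of attribute predicates, and an event matches if every predicate holds.

```python
# Illustrative software analogue of Boolean-expression matching in
# publish/subscribe. The FPGA designs in the paper evaluate subscriptions
# in parallel in hardware; this sketch simply loops over them.
import operator

OPS = {'=': operator.eq, '<': operator.lt, '>': operator.gt}

def matches(event, predicates):
    # predicates: list of (attribute, op, constant) triples, ANDed together.
    return all(
        attr in event and OPS[op](event[attr], const)
        for attr, op, const in predicates
    )

def match_all(event, subscriptions):
    # Return the ids of all subscriptions the event satisfies.
    return [sid for sid, preds in subscriptions.items()
            if matches(event, preds)]
```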
Citations: 30
The effects of virtualization on main memory systems
Pub Date: 2010-06-07 DOI: 10.1145/1869389.1869395
M. Grund, J. Schaffner, Jens Krüger, Jan Brunnert, A. Zeier
Virtualization is mainly employed to increase the utilization of a lightly-loaded system through consolidation, but also to ease administration via the ability to rapidly provision or migrate virtual machines. These facilities are crucial for efficiently managing large data centers. At the same time, modern hardware, such as Intel's Nehalem microarchitecture, changes critical assumptions about performance bottlenecks, and software systems that explicitly exploit the underlying hardware, such as main-memory databases, gain increasing momentum. In this paper, we address the question of how these specialized software systems perform in a virtualized environment. To do so, we present a set of experiments looking at several different variants of in-memory databases: the MonetDB Calibrator, a fine-grained hybrid row/column in-memory database running an OLTP workload, and an in-memory column-store database running a multi-user OLAP workload. We examine how memory management in virtual machine monitors affects these three classes of applications. For the multi-user OLAP experiment we also experimentally compare a virtualized Nehalem server to one of its predecessors. We show that saturation of the memory bus is a major limiting factor but is much less impactful on the new architecture.
Citations: 15
Wimpy node clusters: what about non-wimpy workloads?
Pub Date: 2010-06-07 DOI: 10.1145/1869389.1869396
Willis Lang, J. Patel, S. Shankar
The high cost associated with powering servers has introduced new challenges in improving the energy efficiency of clusters running data processing jobs. Traditional high-performance servers are largely energy-inefficient due to various factors, such as the over-provisioning of resources. The increasing trend to replace traditional high-performance server nodes with low-power, low-end nodes in clusters has recently been touted as a solution to the cluster energy problem. However, the key tacit assumption that drives such a solution is that proportional scale-out of such low-power cluster nodes results in constant scaleup in performance. This paper studies the validity of this assumption using measured price and performance results from a low-power Atom-based node and a traditional Xeon-based server, along with a number of published parallel scaleup results. Our results show that in most cases, computationally complex queries exhibit disproportionate scaleup characteristics, which potentially makes scale-out with low-end nodes an expensive and lower-performance solution.
Citations: 87
Flashing databases: expectations and limitations
Pub Date: 2010-06-07 DOI: 10.1145/1869389.1869391
S. Baumann, Giel de Nijs, M. Strobel, K. Sattler
Flash devices (solid state disks) promise a significant performance improvement for disk-based database processing. However, database storage structures and processing strategies originally designed for magnetic disks prevent the optimal utilization of SSDs. Based on previous work on benchmarking SSDs and a detailed discussion of I/O methods, in this paper we analyze appropriate execution methods for database processing as well as important parameters and boundaries, and present a tool which helps to derive these parameters.
Citations: 18
Supporting extended precision on graphics processors
Pub Date: 2010-06-07 DOI: 10.1145/1869389.1869392
Mian Lu, Bingsheng He, Qiong Luo
Scientific computing applications often require support for non-traditional data types, for example, numbers with a precision higher than 64-bit floats. As graphics processors, or GPUs, have emerged as a powerful accelerator for scientific computing, we design and implement a GPU-based extended precision library to enable applications with high precision requirements to run on the GPU. Our library contains arithmetic operators, mathematical functions, and data-parallel primitives, each of which can operate at either multi-term or multi-digit precision. The multi-term precision maintains an accuracy of up to 212 bits of significand, whereas the multi-digit precision allows an accuracy of an arbitrary number of bits. Additionally, we have integrated the extended precision algorithms into a GPU-based query processing engine to support efficient query processing with extended precision on GPUs. To demonstrate the usage of our library, we have implemented three applications: parallel summation in climate modeling, Newton's method used in nonlinear physics, and high precision numerical integration in experimental mathematics. The GPU-based implementation is up to an order of magnitude faster, and achieves the same accuracy as their optimized, quad-core CPU-based counterparts.
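The "multi-term" representation mentioned in the abstract stores a high-precision value as an unevaluated sum of machine doubles. A minimal sketch of the idea, using Knuth's well-known error-free two_sum transformation (a generic technique, not the paper's actual library API):

```python
# Double-double arithmetic sketch: a value is a pair (hi, lo) of doubles
# whose exact sum carries roughly twice the significand of one double.

def two_sum(a, b):
    # Error-free transformation: returns (s, e) with s = fl(a + b)
    # and a + b = s + e exactly.
    s = a + b
    bv = s - a
    av = s - bv
    return s, (a - av) + (b - bv)

def dd_add(x, y):
    # Add two double-double values; the rounding error of the high parts
    # is captured in the low part instead of being discarded.
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)
```

Note that plain double arithmetic loses the small term entirely (1.0 + 1e-20 rounds back to 1.0), while the two-term representation preserves it in the low component.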
Citations: 65
Fast integer compression using SIMD instructions
Pub Date: 2010-06-07 DOI: 10.1145/1869389.1869394
B. Schlegel, Rainer Gemulla, Wolfgang Lehner
We study algorithms for efficient compression and decompression of a sequence of integers on modern hardware. Our focus is on universal codes in which the codeword length is a monotonically non-decreasing function of the uncompressed integer value; such codes are widely used for compressing "small integers". In contrast to traditional integer compression, our algorithms make use of the SIMD capabilities of modern processors by encoding multiple integer values at once. More specifically, we provide SIMD versions of both null suppression and Elias gamma encoding. Our experiments show that these versions provide a speedup from 1.5x up to 6.7x for decompression, while maintaining a similar compression performance.
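For reference, Elias gamma coding (one of the two schemes the paper vectorizes) encodes a positive integer n as bit-length(n) - 1 zeros followed by the binary representation of n. A scalar Python sketch of the code itself; the paper's contribution is encoding and decoding many values at once with SIMD, which this version does not attempt to show:

```python
# Scalar reference implementation of Elias gamma coding over bit strings.

def elias_gamma_encode(n):
    # n must be a positive integer; shorter codewords for smaller values,
    # which is why such universal codes suit "small integers".
    assert n >= 1
    b = bin(n)[2:]
    return '0' * (len(b) - 1) + b

def elias_gamma_decode(bits):
    # Decode one codeword from the front; return (value, remaining bits).
    zeros = 0
    while bits[zeros] == '0':
        zeros += 1
    width = zeros + 1
    return int(bits[zeros:zeros + width], 2), bits[zeros + width:]
```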
Citations: 65
On the impact of flash SSDs on spatial indexing
Pub Date: 2010-06-07 DOI: 10.1145/1869389.1869390
Tobias Emrich, Franz Graf, H. Kriegel, Matthias Schubert, Marisa Thoma
Similarity queries are an important query type in multimedia databases. To implement these types of queries, database systems often use spatial index structures like the R*-Tree. However, the majority of performance evaluations for spatial index structures rely on a conventional background storage layer based on conventional hard drives. Since newer devices like solid-state disks (SSDs) have a completely different performance characteristic, it is an interesting question how far existing index structures profit from these modern storage devices. In this paper, we therefore examine the performance behaviour of the R*-Tree on an SSD compared to a conventional hard drive. Testing various influencing factors like system load, dimensionality, and page size of the index, our evaluation leads to interesting insights into the performance of spatial index structures on modern background storage layers.
Citations: 21
Optimizing read convoys in main-memory query processing
Pub Date: 2010-06-07 DOI: 10.1145/1869389.1869393
K. A. Ross
Concurrent read-only scans of memory-resident fact tables can form convoys, which generally help performance because cache misses are amortized over several members of the convoy. Nevertheless, we identify two performance hazards for such convoys. One hazard is underutilization of the memory bandwidth because all members of the convoy hit the same cache lines at the same time, rather than reading several different lines concurrently. The other hazard is a form of interference that occurs on the Sun Niagara T1 and T2 machines under certain workloads. We propose solutions to these hazards, including a local shuffle method that reduces interference, preserves the beneficial aspects of convoy behavior, and increases the effective bandwidth by allowing different members of a convoy to concurrently access different cache lines. We provide experimental validation of the methods on several modern architectures.
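The abstract describes the local shuffle only at a high level. The following is a hypothetical illustration of the general idea (staggering the starting offsets of concurrent scans within a window so convoy members touch different cache lines at the same instant), not the paper's actual method:

```python
# Hypothetical sketch: each scanning thread visits every row, but within
# each window of rows it starts at a thread-specific offset and wraps
# around, so concurrent threads tend to read different cache lines while
# still benefiting from convoy locality at the window level.

def staggered_scan_order(n_rows, thread_id, n_threads, window=16):
    order = []
    for base in range(0, n_rows, window):
        size = min(window, n_rows - base)
        start = (thread_id * window // max(n_threads, 1)) % size
        order.extend(base + (start + k) % size for k in range(size))
    return order
```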
Citations: 4