
International Workshop on Data Management on New Hardware: Latest Publications

Architectural characterization of XQuery workloads on modern processors
Pub Date : 2007-06-15 DOI: 10.1145/1363189.1363199
Rubao Lee, Bihui Duan, Taoying Liu
As XQuery rapidly emerges as the standard for querying XML documents, it is important to understand the architectural characteristics and behaviors of such workloads. Many efforts have focused on the implementation, optimization, and evaluation of XQuery tools. However, little prior work has studied the architectural and memory-system behavior of XQuery workloads on modern hardware platforms. This makes it unclear whether modern CPU techniques, such as multi-level caches and hardware branch predictors, support such workloads well. This paper presents a detailed characterization of the architectural behavior of XQuery workloads. We examine four XQuery tools on three hardware platforms (AMD, Intel, and Sun) using well-designed XQuery queries. We report measured architectural data, including L1/L2 cache misses, TLB misses, and branch mispredictions. We believe that this information will be useful in understanding XQuery workloads and in analyzing potential architectural optimization opportunities for improving XQuery performance.
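The architectural data described here come from hardware performance counters. As a rough illustration of how such counters can be read around a piece of code, the following minimal sketch uses the PAPI library; the chosen events and the placeholder workload are assumptions for illustration, not the instrumentation used in the paper.

```cpp
// Illustrative sketch: read L1/L2 cache misses, data-TLB misses, and branch
// mispredictions around a workload using PAPI preset events (assumed available
// on the target CPU). Compile with -lpapi.
#include <papi.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the code under measurement (e.g., an XQuery
// evaluation loop): a strided scan large enough to cause cache and TLB misses.
static long run_workload() {
    std::vector<long> data(1 << 22, 1);
    long sum = 0;
    for (size_t i = 0; i < data.size(); i += 64) sum += data[i];
    return sum;
}

int main() {
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI initialization failed\n");
        return EXIT_FAILURE;
    }
    int events[4] = { PAPI_L1_DCM, PAPI_L2_TCM, PAPI_TLB_DM, PAPI_BR_MSP };
    long long counts[4] = {0};

    int eventset = PAPI_NULL;
    PAPI_create_eventset(&eventset);
    for (int e : events)
        PAPI_add_event(eventset, e);   // may fail if the CPU lacks a preset event

    PAPI_start(eventset);
    long result = run_workload();      // region of interest
    PAPI_stop(eventset, counts);

    printf("workload result      : %ld\n", result);
    printf("L1 D-cache misses    : %lld\n", counts[0]);
    printf("L2 cache misses      : %lld\n", counts[1]);
    printf("data TLB misses      : %lld\n", counts[2]);
    printf("branch mispredictions: %lld\n", counts[3]);
    return 0;
}
```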
Citations: 2
Pipelined hash-join on multithreaded architectures
Pub Date : 2007-06-15 DOI: 10.1145/1363189.1363191
Philip C. Garcia, H. F. Korth
Multi-core and multithreaded processors present both opportunities and challenges in the design of database query processing algorithms. Previous work has shown the potential for performance gains, but also that, in adverse circumstances, multithreading can actually reduce performance. This paper examines the performance of a pipeline of hash-join operations when executing on multithreaded and multicore processors. We examine the optimal number of threads to execute and the partitioning of the workload across those threads. We then describe a buffer-management scheme that minimizes cache conflicts among the threads. Additionally we compare the performance of full materialization of the output at each stage in the pipeline versus passing pointers between stages.
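As a rough sketch of the thread-level parallelism involved, the following C++ fragment implements one hash-join stage: a single thread builds the hash table on the smaller input, and the probe input is split into contiguous chunks across worker threads that share the read-only table. The tuple layout, data, and chunking are illustrative assumptions; the paper's buffer-management scheme and the choice between materializing results and passing pointers between pipeline stages are not modeled here.

```cpp
// Minimal sketch of a build-then-probe hash join with a parallel probe phase.
#include <algorithm>
#include <atomic>
#include <cstdio>
#include <thread>
#include <unordered_map>
#include <vector>

struct Tuple { int key; int payload; };

int main() {
    std::vector<Tuple> build_side, probe_side;
    for (int i = 0; i < 100000; ++i) build_side.push_back({i, i * 10});
    for (int i = 0; i < 400000; ++i) probe_side.push_back({i % 100000, i});

    // Build phase: single-threaded construction on the smaller input.
    std::unordered_multimap<int, int> ht;
    ht.reserve(build_side.size());
    for (const Tuple& t : build_side) ht.emplace(t.key, t.payload);

    // Probe phase: contiguous chunks, one per thread; the hash table is
    // read-only from here on, so the threads never synchronize on it.
    const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
    const size_t chunk = (probe_side.size() + nthreads - 1) / nthreads;
    std::atomic<long> matches{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            long local = 0;                       // thread-local count, merged once
            const size_t begin = t * chunk;
            const size_t end = std::min(probe_side.size(), begin + chunk);
            for (size_t i = begin; i < end; ++i) {
                auto range = ht.equal_range(probe_side[i].key);
                for (auto it = range.first; it != range.second; ++it) ++local;
            }
            matches += local;
        });
    }
    for (auto& w : workers) w.join();
    printf("join produced %ld matches\n", matches.load());
    return 0;
}
```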
Citations: 12
Large scale Itanium® 2 processor OLTP workload characterization and optimization
Pub Date : 2006-06-25 DOI: 10.1145/1140402.1140406
Gerrit Saylor, Badriddine M. Khessib
Large-scale OLTP workloads on modern database servers are well understood across the industry. Their runtime performance characterizations serve to drive both server-side software features and processor-specific design decisions, but they are not understood outside of the primary industry stakeholders. We provide a rare glimpse into the performance characterizations of processor- and platform-targeted software optimizations running on a large-scale 32-processor, Intel® Itanium® 2 based ccNUMA platform.
Citations: 8
Processing-in-memory technology for knowledge discovery algorithms
Pub Date : 2006-06-25 DOI: 10.1145/1140402.1140405
Jafar Adibi, T. Barrett, Spundun Bhatt, Hans Chalupsky, Jacqueline Chame, Mary W. Hall
The goal of this work is to gain insight into whether processing-in-memory (PIM) technology can be used to accelerate the performance of link discovery (LD) algorithms, which represent an important class of emerging knowledge discovery techniques. PIM chips that integrate processor logic into memory devices offer a new opportunity for bridging the growing gap between processor and memory speeds, especially for applications with high memory-bandwidth requirements. As LD algorithms are data-intensive and highly parallel, involving read-only queries over large data sets, parallel computing power extremely close (physically) to the data has the potential of providing dramatic computing speedups. For this reason, we evaluated the mapping of LD algorithms to a PIM workstation-class architecture, the DIVA/Godiva hardware testbeds developed by USC/ISI. Accounting for differences in clock speed and data scaling, our analysis shows a performance gain on a single PIM, with the potential for greater improvement when multiple PIMs are used. Measured speedups of 8x are shown on two additional bandwidth benchmarks, even though the Itanium-2 has a 6x faster clock rate.
Citations: 7
Realizing parallelism in database operations: insights from a massively multithreaded architecture
Pub Date : 2006-06-25 DOI: 10.1145/1140402.1140408
J. Cieslewicz, Jonathan W. Berry, B. Hendrickson, K. A. Ross
A new trend in processor design is increased on-chip support for multithreading in the form of both chip multiprocessors and simultaneous multithreading. Recent research in database systems has begun to explore increased thread-level parallelism made possible by these new multicore and multithreaded processors. The question of how best to use this new technology remains open, particularly as the number of cores per chip and threads per core increase. In this paper we use an existing massively multithreaded architecture, the Cray MTA-2, to explore the implications that a larger degree of on-chip multithreading may have for parallelism in database operations. We find that parallelism in database operations is easy to achieve on the MTA-2 and that, with little effort, parallelism can be made to scale linearly with the number of available processor cores.
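As a rough, commodity-hardware analogue of this kind of operator-level parallelism (the paper itself targets the Cray MTA-2's hardware threads rather than std::thread), the sketch below runs a group-by count with one private hash table per thread and a single merge at the end; the data set and thread mapping are assumptions for illustration only.

```cpp
// Minimal sketch of a parallel group-by count using per-thread partial results.
#include <algorithm>
#include <cstdio>
#include <thread>
#include <unordered_map>
#include <vector>

int main() {
    // Synthetic input: 4M rows spread over 1024 groups.
    std::vector<int> group_keys(1 << 22);
    for (size_t i = 0; i < group_keys.size(); ++i) group_keys[i] = static_cast<int>(i % 1024);

    const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
    const size_t chunk = (group_keys.size() + nthreads - 1) / nthreads;
    std::vector<std::unordered_map<int, long>> partials(nthreads);
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            const size_t begin = t * chunk;
            const size_t end = std::min(group_keys.size(), begin + chunk);
            for (size_t i = begin; i < end; ++i)
                ++partials[t][group_keys[i]];        // private table: no locks, no sharing
        });
    }
    for (auto& w : workers) w.join();

    std::unordered_map<int, long> result;            // cheap single-threaded merge
    for (const auto& p : partials)
        for (const auto& kv : p) result[kv.first] += kv.second;

    printf("%zu groups; group 0 has %ld rows\n", result.size(), result[0]);
    return 0;
}
```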
Citations: 29
B-tree indexes, interpolation search, and skew
Pub Date : 2006-06-25 DOI: 10.1145/1140402.1140409
G. Graefe
Recent performance improvements in storage hardware have benefited bandwidth much more than latency. Among other implications, this trend favors large B-tree pages. Recent performance improvements in processor hardware also have benefited processing bandwidth much more than memory latency. Among other implications, this trend favors adding calculations if they save cache faults. With small calculations guiding the search directly to the desired key, interpolation search complements these trends much better than binary search. It performs well if the distribution of key values is perfectly uniform, but it can be useless and even wasteful otherwise. This paper collects and describes more than a dozen techniques for interpolation search in B-tree indexes. Most of them attempt to avoid skew or to detect skew very early and then to avoid its bad effects. Some of these methods are part of the folklore of B-tree search, whereas other techniques are new. The purpose of this survey is to encourage research into such techniques and their performance on modern hardware.
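For reference, the following is a minimal sketch of plain interpolation search over a sorted array of integer keys, such as the key slots of one B-tree page; it shows the baseline calculation the paper builds on, not any of its skew-detection or skew-avoidance variants. On uniformly distributed keys it converges in very few probes, while on skewed keys it can degenerate, which is the problem the surveyed techniques address.

```cpp
// Minimal sketch of interpolation search on a sorted array of integer keys.
#include <cstdint>
#include <cstdio>
#include <vector>

// Returns the index of `target` in the sorted vector `keys`, or -1 if absent.
int interpolation_search(const std::vector<int64_t>& keys, int64_t target) {
    int lo = 0, hi = static_cast<int>(keys.size()) - 1;
    while (lo <= hi && target >= keys[lo] && target <= keys[hi]) {
        if (keys[hi] == keys[lo])                     // all remaining keys are equal
            return keys[lo] == target ? lo : -1;
        // The "small calculation": estimate the slot from the key's relative
        // offset within the remaining key range, instead of halving the range.
        const double frac = static_cast<double>(target - keys[lo]) / (keys[hi] - keys[lo]);
        const int pos = lo + static_cast<int>(frac * (hi - lo));
        if (keys[pos] == target) return pos;
        if (keys[pos] < target)  lo = pos + 1;
        else                     hi = pos - 1;
    }
    return -1;
}

int main() {
    std::vector<int64_t> page_keys;
    for (int64_t k = 0; k < 1000; ++k) page_keys.push_back(k * 8);   // uniform keys
    printf("4096 found at slot %d\n", interpolation_search(page_keys, 4096));
    printf("4097 (absent) returns %d\n", interpolation_search(page_keys, 4097));
    return 0;
}
```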
Citations: 41
Using secure coprocessors for privacy preserving collaborative data mining and analysis
Pub Date : 2006-06-25 DOI: 10.1145/1140402.1140404
Bishwaranjan Bhattacharjee, N. Abe, Kenneth A. Goldman, B. Zadrozny, Vamsavardhana R. Chillakuru, Marysabel del Carpio, C. Apté
Secure coprocessors have traditionally been used as a keystone of a security subsystem, eliminating the need to protect the rest of the subsystem with physical security measures. With technological advances and hardware miniaturization they have become increasingly powerful. This opens up the possibility of using them for non-traditional purposes. This paper describes a solution for privacy-preserving data sharing and mining using cryptographically secure but resource-limited coprocessors. It uses memory-light data mining methodologies along with a lightweight database engine with federation capability, running on a coprocessor. The data to be shared resides with the enterprises that want to collaborate. This system will allow multiple enterprises, which are generally not allowed to share data, to do so solely for the purpose of detecting particular types of anomalies and for generating alerts. We also present results from experiments which demonstrate the value of such collaborations.
Citations: 31
Architecture-conscious hashing
Pub Date : 2006-06-25 DOI: 10.1145/1140402.1140410
M. Zukowski, S. Héman, P. Boncz
Hashing is one of the fundamental techniques used to implement query processing operators such as grouping, aggregation and join. This paper studies the interaction between modern computer architecture and hash-based query processing techniques. First, we focus on extracting maximum hashing performance from super-scalar CPUs. In particular, we discuss fast hash functions and ways to efficiently handle multi-column keys, and we propose the use of a recently introduced hashing scheme called Cuckoo Hashing over the commonly used bucket-chained hashing. In the second part of the paper, we focus on CPU cache usage, dynamically partitioning data streams such that the partial hash tables fit in the CPU cache. Conventional partitioning works as a separate preparatory phase, forcing materialization, which may require I/O if the stream does not fit in RAM. We introduce best-effort partitioning, a technique that interleaves partitioning with the execution of hash-based query processing operators and avoids I/O. In the process, we show how to prevent cache-line alignment issues in partitioning that can strongly decrease throughput. We also demonstrate overall query processing performance when both CPU-efficient hashing and best-effort partitioning are combined.
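As a rough illustration of the cuckoo hashing scheme mentioned above, the sketch below keeps two tables and two hash functions, so every lookup touches at most two slots at predictable locations, unlike walking a bucket chain; the hash functions, table sizes, and bounded-eviction policy are illustrative assumptions, not the paper's implementation.

```cpp
// Minimal sketch of a two-table cuckoo hash map (key -> value), C++17.
#include <cstdint>
#include <cstdio>
#include <optional>
#include <utility>
#include <vector>

class CuckooTable {
public:
    explicit CuckooTable(size_t slots) : t1_(slots), t2_(slots) {}

    bool insert(uint64_t key, uint64_t value) {
        for (int kicks = 0; kicks < 32; ++kicks) {    // bounded displacement chain
            Slot& s1 = t1_[h1(key)];
            if (!s1.used) { s1 = {true, key, value}; return true; }
            std::swap(key, s1.key);                   // evict the occupant of table 1 ...
            std::swap(value, s1.value);
            Slot& s2 = t2_[h2(key)];                  // ... and try its alternate slot
            if (!s2.used) { s2 = {true, key, value}; return true; }
            std::swap(key, s2.key);                   // evict from table 2 and loop
            std::swap(value, s2.value);
        }
        return false;                                 // a real table would rehash or grow
    }

    std::optional<uint64_t> lookup(uint64_t key) const {
        const Slot& s1 = t1_[h1(key)];                // at most two probes,
        if (s1.used && s1.key == key) return s1.value;
        const Slot& s2 = t2_[h2(key)];                // each to a predictable slot
        if (s2.used && s2.key == key) return s2.value;
        return std::nullopt;
    }

private:
    struct Slot { bool used = false; uint64_t key = 0; uint64_t value = 0; };
    std::vector<Slot> t1_, t2_;
    // Two simple multiplicative hash functions (illustrative, not from the paper).
    size_t h1(uint64_t k) const { return (k * 0x9E3779B97F4A7C15ULL) % t1_.size(); }
    size_t h2(uint64_t k) const { return ((k ^ (k >> 31)) * 0xC2B2AE3D27D4EB4FULL) % t2_.size(); }
};

int main() {
    CuckooTable ht(1 << 16);
    for (uint64_t k = 0; k < 40000; ++k)
        if (!ht.insert(k, k * 2)) { printf("insert failed at key %llu\n", (unsigned long long)k); break; }
    if (auto v = ht.lookup(12345)) printf("12345 -> %llu\n", (unsigned long long)*v);
    return 0;
}
```

Keeping each lookup to two slot probes is the cache-friendly property that makes the scheme attractive on modern CPUs; the bounded eviction loop here stands in for the rehash-or-grow step a production table would need.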
Citations: 53
Proceedings of the Ninth International Workshop on Data Management on New Hardware, DaMoN 2013, New York, NY, USA, June 24, 2013
Pub Date : 2013-06-24 DOI: 10.1145/2485278
{"title":"Proceedings of the Ninth International Workshop on Data Management on New Hardware, DaMoN 2013, New York, NY, USA, June 24, 2013","authors":"","doi":"10.1145/2485278","DOIUrl":"https://doi.org/10.1145/2485278","url":null,"abstract":"","PeriodicalId":298901,"journal":{"name":"International Workshop on Data Management on New Hardware","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133053763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0