
Proceedings of the 16th International Workshop on Data Management on New Hardware: Latest Publications

Variable word length word-aligned hybrid compression
Florian Grieskamp, Roland Kühn, J. Teubner
The Word-Aligned Hybrid (WAH) compression is a prominent example of a lightweight compression scheme for bitmap indices that considers the word size of the underlying architecture. This is a compromise toward commodity CPUs, where operations below the word granularity perform poorly. With the emergence of novel hardware classes, such compromises may no longer be appropriate. Field-programmable gate arrays (FPGAs) do not even have any meaningful "word size". In this work, we reconsider strategies for bitmap compression in the light of modern hardware architectures. Rather than tuning compression toward a fixed word size, we propose to tune the word size toward optimal compression. The resulting compression scheme, Variable Word Length Word-Aligned Hybrid (VWLWAH), improves compression rates by almost 75% while maintaining line rate performance on FPGAs.
DOI: 10.1145/3399666.3399935 · Published: 2020-06-14
Citations: 0
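The paper generalizes the classic WAH scheme summarized in the abstract above. As background, here is a minimal Python sketch of plain 32-bit WAH encoding (fixed word size, i.e. not the VWLWAH variant proposed in the paper); the function name and representation are illustrative:

```python
WORD = 32
GROUP = WORD - 1  # 31 payload bits per 32-bit word

def wah_encode(bits):
    """Encode a bit string of '0'/'1' characters into 32-bit WAH words.

    A run of identical all-zero or all-one 31-bit groups becomes one
    fill word (MSB = 1, next bit = fill value, low 30 bits = run length);
    any mixed group becomes a literal word (MSB = 0, 31 payload bits).
    """
    bits = bits + "0" * ((-len(bits)) % GROUP)  # pad to a 31-bit multiple
    groups = [bits[i:i + GROUP] for i in range(0, len(bits), GROUP)]
    words, i = [], 0
    while i < len(groups):
        g = groups[i]
        if g == "0" * GROUP or g == "1" * GROUP:
            run = 1
            while i + run < len(groups) and groups[i + run] == g:
                run += 1
            fill_bit = 1 if g[0] == "1" else 0
            words.append((1 << 31) | (fill_bit << 30) | run)  # fill word
            i += run
        else:
            words.append(int(g, 2))  # literal word, MSB stays 0
            i += 1
    return words
```

Fill words collapse long runs of identical 31-bit groups into a single word; the idea behind VWLWAH, per the abstract, is to tune the word size itself so that such runs align more often.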
Empirical evaluation across multiple GPU-accelerated DBMSes
Hawon Chu, Seounghyun Kim, Joo-Young Lee, Young-Kyoon Suh
In this paper, we conduct an empirical study across modern GPU-accelerated DBMSes with TPC-H workloads. Our rigorous experiments demonstrate that the studied DBMSes appear to utilize GPU resources effectively, but they neither scale well with growing databases nor have the full capability to process some complex analytical queries. Thus, we claim that GPU DBMSes still need to be further engineered to achieve better analytical performance.
DOI: 10.1145/3399666.3399907 · Published: 2020-06-14
Citations: 5
FPGA-Accelerated compression of integer vectors
Mahmoud Mohsen, Norman May, Christian Färber, David Broneske
An efficient compression of integer vectors is critical in dictionary-encoded column stores like SAP HANA to keep more data in the limited and precious main memory. Past research focused on lightweight compression techniques that trade low latency of data accesses for lower compression ratios. Consequently, only a few columns in a wide table benefit from lightweight and effective compression schemes like run-length encoding, prefix compression, or sparse encoding. Besides bit-packing, other columns remained uncompressed, which clearly misses opportunities for a better compression ratio for many columns. Furthermore, the main executor for compression was the CPU, as compression involves heavy data transfer. Especially when used with co-processors, the data transfer overhead wipes out performance gains from co-processor usage. In this paper, we investigate whether we can achieve good compression ratios even for previously uncompressed columns by using binary packing and prefix suppression offloaded to an FPGA. As a streaming processor, an FPGA is the perfect candidate to outsource the compression task. As a result of our OpenCL-based implementation, we achieve a saturation of the available PCIe bus during compression on the FPGA, using less than a third of the FPGA's resources. Furthermore, our real-world experiments against CPU-based SAP HANA show a performance improvement of around a factor of 2 in compression throughput while compressing the data down to 60% of the size achieved by the best SAP HANA compression technique.
DOI: 10.1145/3399666.3399932 · Published: 2020-06-14
Citations: 2
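The two techniques the paper offloads, binary packing and prefix suppression, both amount to storing each value with just enough bits, suppressing the common leading zeros. A minimal Python sketch assuming unsigned integers and a single per-block bit width (the paper's OpenCL/FPGA pipeline is far more involved); function names are illustrative:

```python
def pack(values):
    """Binary packing: encode all values of a block with the bit width
    of the block's largest value, dropping common leading-zero bits."""
    width = max(v.bit_length() for v in values) or 1
    buf = 0
    for i, v in enumerate(values):
        buf |= v << (i * width)          # place value i at bit offset i*width
    n_bytes = (len(values) * width + 7) // 8
    return width, buf.to_bytes(n_bytes, "little")

def unpack(width, data, count):
    """Inverse of pack(): extract `count` fixed-width values."""
    buf = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(buf >> (i * width)) & mask for i in range(count)]
```

For example, the block `[3, 7, 1, 5]` needs only 3 bits per value, so four values fit into two bytes instead of sixteen.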
Let's add transactions to FPGA-based key-value stores!
Z. István
In recent years we have seen a proliferation of FPGA-based key value stores (KVSs) [1--3, 5--7, 10] driven by the need for more efficient large-scale data management and storage solutions. In this context, FPGAs are useful because they offer network-bound performance even with small key-value pairs and near-data processing in a fraction of the energy budget of regular servers. Even though the first FPGA-based key-value stores started appearing already in 2013 and have evolved significantly in the meantime, almost no attention has been paid to offering transactions. Today, however, as such systems are becoming increasingly practical, we need to ensure consistency guarantees for concurrent clients (transactions). This position paper makes the case that adding transaction support is not particularly expensive compared to other parts of these systems, and that in the future all FPGA-based KVSs should provide some form of transactional guarantees. In the remainder of this paper we present a high-level view of the typical pipelined architecture of FPGA-based KVSs that most existing designs follow, and show three different ways of implementing transactions, with increasing sophistication: from operation batching, through two-phase locking (2PL), to a simplified snapshot isolation model.
DOI: 10.1145/3399666.3399909 · Published: 2020-06-14
Citations: 4
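Of the three approaches the abstract lists (operation batching, 2PL, snapshot isolation), two-phase locking is the easiest to sketch in software. Below is a toy strict-2PL key-value store in Python; the class and method names are invented for illustration, and deadlock handling is deliberately omitted:

```python
import threading

class TxnKVS:
    """Toy strict two-phase locking (2PL) key-value store: locks are only
    acquired during the growing phase and released all at once at commit.
    Illustration only; no deadlock detection or abort path."""

    def __init__(self):
        self.data = {}
        self.locks = {}                  # key -> threading.Lock
        self.meta = threading.Lock()     # protects the lock table itself

    def _lock_for(self, key):
        with self.meta:
            return self.locks.setdefault(key, threading.Lock())

    def begin(self):
        return {"held": [], "writes": {}}

    def _acquire(self, txn, key):
        lk = self._lock_for(key)
        if lk not in txn["held"]:        # growing phase: take lock once
            lk.acquire()
            txn["held"].append(lk)

    def read(self, txn, key):
        self._acquire(txn, key)
        return txn["writes"].get(key, self.data.get(key))

    def write(self, txn, key, val):
        self._acquire(txn, key)
        txn["writes"][key] = val         # buffer writes until commit

    def commit(self, txn):
        self.data.update(txn["writes"])  # install buffered writes
        for lk in txn["held"]:           # shrinking phase: release all locks
            lk.release()
```

Because no lock is released before commit, concurrent transactions serialize on conflicting keys, which is exactly the consistency guarantee the position paper argues FPGA KVSs should offer.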
Analyzing memory accesses with modern processors
Stefan Noll, J. Teubner, Norman May, Alexander Böhm
Debugging and tuning database systems is very challenging. Using common profiling tools is often not sufficient because they identify the machine instruction rather than the instance of a data structure that causes a performance problem. This leaves a problem's root cause such as memory hotspots or poor data layouts hidden. The state-of-the-art solution is to augment classical profiling with a memory trace. However, current approaches for collecting memory traces are not usable in practice due to their large runtime overhead. In this work, we leverage a mechanism available in modern processors to collect memory traces via hardware-based sampling. We evaluate our approach using a commercial and an open-source database system running the JCC-H benchmark. In particular, we demonstrate that our approach is practical due to its low runtime overhead and we illustrate how memory traces uncover new insights into the memory access characteristics of database systems.
DOI: 10.1145/3399666.3399896 · Published: 2020-06-14
Citations: 4
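The hardware mechanism the authors leverage corresponds to what Linux exposes through `perf`'s memory-sampling mode (PEBS on recent Intel CPUs). A hedged sketch of how such a memory trace can be collected; `./dbms_benchmark` is a placeholder binary, and exact event names and support depend on the CPU and kernel:

```shell
# Sample loads and stores via hardware-based sampling (perf mem mode).
perf mem record -- ./dbms_benchmark   # ./dbms_benchmark is a placeholder
perf mem report                       # attribute samples to symbols and memory levels

# Lower-level variant: sample load events above a latency threshold and
# record the sampled data addresses (-d) for per-address analysis.
perf record -e cpu/mem-loads,ldlat=30/P -d -- ./dbms_benchmark
perf report
```

Mapping the sampled addresses back to instances of DBMS data structures, the paper's key step, additionally requires the system's own allocation metadata.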
To share or not to share vector registers?
Published: 2020-06-14 · DOI: 10.1007/s00778-022-00744-2
Johannes Pietrzyk, Dirk Habich, Wolfgang Lehner
Citations: 6
Large-scale in-memory analytics on Intel® Optane™ DC persistent memory
Anil Shanbhag, Nesime Tatbul, David Cohen, S. Madden
New data storage technologies such as the recently introduced Intel® Optane™ DC Persistent Memory Module (PMM) offer exciting opportunities for optimizing the query processing performance of database workloads. In particular, the unique combination of low latency, byte-addressability, persistence, and large capacity make persistent memory (PMem) an attractive alternative along with DRAM and SSDs. Exploring the performance characteristics of this new medium is the first critical step in understanding how it will impact the design and performance of database systems. In this paper, we present one of the first experimental studies on characterizing Intel® Optane™ DC PMM's performance behavior in the context of analytical database workloads. First, we analyze basic access patterns common in such workloads, such as sequential, selective, and random reads as well as the complete Star Schema Benchmark, comparing standalone DRAM- and PMem-based implementations. Then we extend our analysis to join algorithms over larger datasets, which require using DRAM and PMem in a hybrid fashion while paying special attention to the read-write asymmetry of PMem. Our study reveals interesting performance tradeoffs that can help guide the design of next-generation OLAP systems in presence of persistent memory in the storage hierarchy.
DOI: 10.1145/3399666.3399933 · Published: 2020-06-14
Citations: 26
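The basic access patterns analyzed above (sequential vs. random reads) can be contrasted with a simple microbenchmark. The pure-Python sketch below illustrates the methodology only: it runs against DRAM, where the gap is modest, whereas on PMem the asymmetries the paper studies are much larger. All names are illustrative:

```python
import random
import time

def scan_vs_random(n=1_000_000):
    """Time a sequential scan vs. randomly ordered accesses over the same
    data. Methodology sketch only; absolute numbers depend entirely on the
    machine, and a PMem experiment would map the buffer onto the device."""
    data = list(range(n))

    t0 = time.perf_counter()
    total = 0
    for x in data:               # sequential scan
        total += x
    seq = time.perf_counter() - t0

    order = list(range(n))
    random.shuffle(order)        # randomized access order
    t0 = time.perf_counter()
    total = 0
    for i in order:              # random reads over the same data
        total += data[i]
    rnd = time.perf_counter() - t0
    return seq, rnd
```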
The tale of 1000 Cores: an evaluation of concurrency control on real(ly) large multi-socket hardware
Tiemo Bang, Norman May, Ilia Petrov, Carsten Binnig
In this paper, we set out the goal to revisit the results of "Starring into the Abyss [...] of Concurrency Control with [1000] Cores" [27] and analyse in-memory DBMSs on today's large hardware. Despite the original assumption of the authors, today we do not see single-socket CPUs with 1000 cores. Instead multi-socket hardware made its way into production data centres. Hence, we follow up on this prior work with an evaluation of the characteristics of concurrency control schemes on real production multi-socket hardware with 1568 cores. To our surprise, we made several interesting findings which we report on in this paper.
DOI: 10.1145/3399666.3399910 · Published: 2020-06-14
Citations: 13
Lessons learned from the early performance evaluation of Intel optane DC persistent memory in DBMS
Yinjun Wu, Kwanghyun Park, Rathijit Sen, Brian Kroth, Jaeyoung Do
Non-volatile memory (NVM) is an emerging technology, which has the persistence characteristics of large capacity storage devices, while providing the low access latency and byte-addressability of traditional DRAM memory. In this paper, we provide extensive performance evaluations on a recently released NVM device, Intel Optane DC Persistent Memory (PMem), under different configurations with several micro-benchmark tools. Further, we evaluate OLTP and OLAP database workloads with Microsoft SQL Server 2019 when using PMem as buffer pool or persistent storage. From the lessons learned we share some recommendations for future DBMS design with PMem, e.g. simple hardware or software changes are not enough for the best use of PMem in DBMSs.
DOI: 10.1145/3399666.3399898 · Published: 2020-05-15
Citations: 28
The ReProVide query-sequence optimization in a hardware-accelerated DBMS
G. LekshmiB., Andreas Becher, K. Meyer-Wegener
Hardware acceleration of database query processing can be done with the help of FPGAs. In particular, they are partially reconfigurable at runtime, which allows for the adaptation to a variety of queries. Reconfiguration itself, however, takes some time. This paper presents optimizations based on query sequences, which reduce the impact of the reconfigurations. Knowledge of upcoming queries is used to avoid reconfiguration overhead. We evaluate our optimizations with a calibrated model. Improvements in execution time of up to 28% can be obtained even with sequences of only two queries.
DOI: 10.1145/3399666.3399926 · Published: 2020-05-01
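The trade-off optimized here, paying the FPGA reconfiguration time only when it is amortized over upcoming queries, can be captured by a toy cost model. This is an invented illustration, not the paper's calibrated model; all parameter names are assumptions:

```python
def plan_reconfig(seq, t_reconf, t_hw, t_sw):
    """Greedy toy model: for each query needing operator `op`, either reuse
    the currently loaded FPGA operator, reconfigure (if the cost is amortized
    over the operator's remaining uses), or fall back to software execution.
    `seq` is a list of (op, remaining_uses) pairs for the known query sequence."""
    total = 0.0
    loaded = None
    for op, remaining_uses in seq:
        if loaded == op:
            total += t_hw                                      # operator already on the FPGA
        elif t_reconf + remaining_uses * t_hw < remaining_uses * t_sw:
            total += t_reconf + t_hw                           # reconfigure, then run in hardware
            loaded = op
        else:
            total += t_sw                                      # not worth reconfiguring
    return total
```

With `t_reconf=5`, `t_hw=1`, `t_sw=4`, a sequence of two identical queries costs 7 instead of 8 in pure software, mirroring the paper's observation that even a two-query sequence can make reconfiguration pay off.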
Citations: 3