
Proceedings of the 16th International Workshop on Data Management on New Hardware: Latest Publications

Scalable and robust latches for database systems
Jan Böttcher, Viktor Leis, Jana Giceva, Thomas Neumann, A. Kemper
Multi-core scalability is one of the most important features for database systems running on today's hardware. Not surprisingly, the implementation of locks is paramount to achieving efficient and scalable synchronization. In this work, we identify the key database-specific requirements for lock implementations and evaluate them using both micro-benchmarks and full-fledged database workloads. The results indicate that optimistic locking has superior performance in most workloads due to its minimal overhead and latency. By complementing optimistic locking with a pessimistic shared mode lock, we demonstrate that we can also process HTAP workloads efficiently. Finally, we show how lock contention can be handled gracefully without slowing down the uncontended fast path or increasing space requirements by using a lightweight parking lot infrastructure.
DOI: 10.1145/3399666.3399908 · Published: 2020-06-15
Cited by: 13
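The version-based optimistic latching that the abstract above describes can be illustrated with a minimal sketch. Python stands in for the authors' C++; the class name, the even/odd version convention, and the retry loop are illustrative assumptions, not the paper's implementation:

```python
import threading

class OptimisticLatch:
    """Sketch of an optimistic latch: readers validate a version counter
    instead of acquiring the lock, so the uncontended read path writes
    nothing to shared memory."""

    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0  # even = unlocked, odd = writer active (assumed convention)

    def write_lock(self):
        self._lock.acquire()
        self.version += 1  # becomes odd: readers must retry

    def write_unlock(self):
        self.version += 1  # becomes even again with a new version
        self._lock.release()

    def read_optimistically(self, read_fn):
        """Run read_fn, retrying until no concurrent writer interfered."""
        while True:
            v = self.version
            if v % 2 == 1:            # writer active: spin and retry
                continue
            result = read_fn()
            if self.version == v:     # version unchanged: read was consistent
                return result
```

A pessimistic shared mode and a parking lot for contended waiters, as the paper proposes, would layer on top of this fast path.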
Accelerating re-pair compression using FPGAs
Robert Lasch, S. Demirsoy, Norman May, V. Ramamurthy, Christian Färber, K. Sattler
Re-Pair is a compression algorithm well-suited for applications that require random accesses to compressed data, but has not found widespread use in the data management community due to its prohibitively high compression times. As Re-Pair is a computationally expensive algorithm and FPGAs are becoming more and more common to accelerate such problems in data centers, we devise an FPGA system that performs Re-Pair compression. The system is implemented in OpenCL, aside from a hash table and sorting component realized in RTL for more control over the synthesized hardware. Our experiments demonstrate that an Intel Arria® 10 GX FPGA with our system compresses an order of magnitude faster than a highly-optimized CPU version of Re-Pair. We discuss further optimization opportunities and argue that our system can scale to being deployed on a more resourceful FPGA for even better performance.
DOI: 10.1145/3399666.3399931 · Published: 2020-06-15
Cited by: 2
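Re-Pair itself is a simple grammar-based scheme: repeatedly replace the most frequent pair of adjacent symbols with a fresh nonterminal until no pair occurs twice. A naive, quadratic Python sketch of the algorithm (nothing like the paper's FPGA pipeline, and far from the optimized linear-time original):

```python
from collections import Counter

def repair_compress(seq):
    """Re-Pair sketch: replace the most frequent adjacent pair with a new
    symbol, record the rule, and repeat. Returns (compressed, rules)."""
    seq = list(seq)
    rules = {}                             # nonterminal -> (left, right)
    next_symbol = max(seq, default=0) + 1  # fresh symbols above the alphabet
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:                      # every pair unique: done
            break
        rules[next_symbol] = pair
        out, i = [], 0
        while i < len(seq):                # left-to-right, non-overlapping
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_symbol += 1
    return seq, rules
```

Random access into the compressed form only needs the rule dictionary, which is why Re-Pair suits the access pattern the abstract mentions; the compression loop above is the part the paper offloads to the FPGA.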
Variable word length word-aligned hybrid compression
Florian Grieskamp, Roland Kühn, J. Teubner
The Word-Aligned Hybrid (WAH) compression is a prominent example of a lightweight compression scheme for bitmap indices that considers the word size of the underlying architecture. This is a compromise toward commodity CPUs, where operations below the word granularity perform poorly. With the emergence of novel hardware classes, such compromises may no longer be appropriate. Field-programmable gate arrays (FPGAs) do not even have any meaningful "word size". In this work, we reconsider strategies for bitmap compression in the light of modern hardware architectures. Rather than tuning compression toward a fixed word size, we propose to tune the word size toward optimal compression. The resulting compression scheme, Variable Word Length Word-Aligned Hybrid (VWLWAH), improves compression rates by almost 75% while maintaining line rate performance on FPGAs.
DOI: 10.1145/3399666.3399935 · Published: 2020-06-14
Cited by: 0
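The underlying WAH idea that the paper generalizes can be sketched as follows: split the bitmap into (word size − 1)-bit groups; a run of all-zero or all-one groups collapses into one fill word, anything else becomes a literal word. The tuple-based output here is an assumption for readability, standing in for real packed machine words:

```python
def wah_compress(bits, word_size=32):
    """Word-Aligned Hybrid sketch. Returns a list of
    ('fill', bit, run_length) and ('literal', group) entries."""
    g = word_size - 1                       # payload bits per word
    bits = bits + [0] * (-len(bits) % g)    # pad to a whole number of groups
    groups = [bits[i:i + g] for i in range(0, len(bits), g)]
    out = []
    for grp in groups:
        if all(b == 0 for b in grp) or all(b == 1 for b in grp):
            fill_bit = grp[0]
            if out and out[-1][0] == 'fill' and out[-1][1] == fill_bit:
                out[-1] = ('fill', fill_bit, out[-1][2] + 1)  # extend run
            else:
                out.append(('fill', fill_bit, 1))
        else:
            out.append(('literal', grp))
    return out
```

VWLWAH's contribution is precisely that `word_size` need not match any CPU word on an FPGA, so it can be tuned per bitmap for the best compression.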
Empirical evaluation across multiple GPU-accelerated DBMSes
Hawon Chu, Seounghyun Kim, Joo-Young Lee, Young-Kyoon Suh
In this paper we conduct an empirical study across modern GPU-accelerated DBMSes with TPC-H workloads. Our rigorous experiments demonstrate that the studied DBMSes appear to utilize GPU resources effectively, but they neither scale well with growing databases nor fully support some complex analytical queries. Thus, we claim that GPU DBMSes still need further engineering to achieve better analytical performance.
DOI: 10.1145/3399666.3399907 · Published: 2020-06-14
Cited by: 5
FPGA-Accelerated compression of integer vectors
Mahmoud Mohsen, Norman May, Christian Färber, David Broneske
An efficient compression of integer vectors is critical in dictionary-encoded column stores like SAP HANA to keep more data in the limited and precious main memory. Past research focused on lightweight compression techniques that trade low latency of data accesses for lower compression ratios. Consequently, only a few columns in a wide table benefit from lightweight and effective compression schemes like run-length encoding, prefix compression, or sparse encoding. Apart from bit-packing, the other columns remained uncompressed, which clearly misses opportunities for a better compression ratio for many columns. Furthermore, the main executor for compression was the CPU, as compression involves heavy data transfer. Especially when used with co-processors, the data transfer overhead wipes out performance gains from co-processor usage. In this paper, we investigate whether we can achieve good compression ratios even for previously uncompressed columns by offloading binary packing and prefix suppression to an FPGA. As a streaming processor, an FPGA is the perfect candidate to outsource the compression task to. With our OpenCL-based implementation, we saturate the available PCIe bus during compression on the FPGA while using less than a third of the FPGA's resources. Furthermore, our real-world experiments against CPU-based SAP HANA show a performance improvement of around a factor of 2 in compression throughput while compressing the data down to 60% of the size achieved by the best SAP HANA compression technique.
DOI: 10.1145/3399666.3399932 · Published: 2020-06-14
Cited by: 2
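Binary packing, one of the two schemes the paper offloads, stores every value of a vector with the minimal uniform bit width instead of a full machine word each. A software-only sketch of the idea (the function names and the single-big-integer representation are illustrative assumptions, not the paper's layout):

```python
def binary_pack(values):
    """Pack non-negative ints at the minimal uniform bit width.
    Returns (packed_bits, width, count) for later unpacking."""
    width = max(values).bit_length() or 1   # width 1 even for all-zero input
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * width)          # place value i at its bit offset
    return packed, width, len(values)

def binary_unpack(packed, width, count):
    """Inverse of binary_pack: slice each width-bit field back out."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]
```

Prefix suppression, the other offloaded scheme, works analogously by dropping leading bits shared by all values; on the FPGA both become streaming pipelines rather than loops.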
Efficient generation of machine code for query compilers
Henning Funke, J. Mühlig, J. Teubner
Query compilation can make query execution extremely efficient, but it introduces additional compilation time. The compilation time causes a relatively high overhead especially for short-running and high-complexity queries. We propose Flounder IR as a lightweight intermediate representation for query compilation to reduce compilation times. Flounder IR is close to machine assembly and adds just that set of features that is necessary for efficient query compilation: virtual registers and function calls ease the construction of the compiler front-end; database-specific extensions enable efficient pipelining in query plans; more elaborate IR features are intentionally left out to maximize compilation speed. In this paper, we present the Flounder IR language and motivate its design; we show how the language makes query compilation intuitive and efficient; and we demonstrate with benchmarks how our Flounder library can significantly reduce query compilation times.
DOI: 10.1145/3399666.3399925 · Published: 2020-06-14
Cited by: 9
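The trade-off the abstract describes, spending compilation time once to get a specialized fast path for every row, can be illustrated with a toy query compiler that generates Python source instead of machine code. `compile_filter` and its parameters are invented for illustration; Flounder IR itself emits near-assembly code, which this sketch does not attempt:

```python
def compile_filter(column, predicate_src):
    """Toy query compilation: generate source specialized to one predicate,
    compile it once, and return the resulting function for reuse."""
    src = (
        "def _q(rows):\n"
        "    out = []\n"
        "    for r in rows:\n"
        f"        if r[{column!r}] {predicate_src}:\n"   # predicate inlined into the loop
        "            out.append(r)\n"
        "    return out\n"
    )
    namespace = {}
    exec(compile(src, "<generated query>", "exec"), namespace)
    return namespace["_q"]
```

The generated function avoids interpreting the predicate per row, but the `compile` call is pure overhead for a short-running query; shrinking exactly that overhead is Flounder IR's goal.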
Comparative analysis of OpenCL and RTL for sort-merge primitives on FPGA
Mehdi Moghaddamfar, Christian Färber, Wolfgang Lehner, Norman May
As a result of recent improvements in FPGA technology, the benefits of FPGAs for highly efficient data processing pipelines are becoming more and more apparent. However, traditional RTL methods for programming FPGAs require knowledge of digital design and hardware description languages. OpenCL™ provides software developers with a C-based platform for implementing their applications without deep knowledge of digital design. In this paper, we conduct a comparative analysis of OpenCL- and RTL-based implementations of a novel heapsort with merging of sorted runs. In particular, we quantitatively compare their performance, FPGA resource utilization, and development effort. Our results show that, while requiring comparable development effort, RTL implementations of the critical primitives used in the algorithm achieve 4X better performance while using half the FPGA resources.
DOI: 10.1145/3399666.3399897 · Published: 2020-06-14
Cited by: 6
nKV
Tobias Vinçon, Arthur Bernhardt, Ilia Petrov, Lukas Weber, Andreas Koch
Massive data transfers in modern key/value stores, resulting from low data-locality and data-to-code system design, hurt their performance and scalability. Near-data processing (NDP) designs represent a feasible solution which, although not new, has yet to see widespread use. In this paper we introduce nKV, a key/value store utilizing native computational storage and near-data processing. On the one hand, nKV can directly control the data and computation placement on the underlying storage hardware. On the other hand, nKV propagates the data formats and layouts to the storage device, where software and hardware parsers and accessors are implemented. Both allow NDP operations to execute in a host-intervention-free manner, directly on physical addresses, and thus better utilize the underlying hardware. Our performance evaluation is based on executing traditional KV operations (GET, SCAN) and complex graph-processing algorithms (Betweenness Centrality) in-situ, with 1.4X-2.7X better performance on real hardware, the COSMOS+ platform [22].
DOI: 10.1145/3399666.3399934 · Published: 2020-06-14
Cited by: 16
Let's add transactions to FPGA-based key-value stores!
Z. István
In recent years we have seen a proliferation of FPGA-based key-value stores (KVSs) [1--3, 5--7, 10], driven by the need for more efficient large-scale data management and storage solutions. In this context, FPGAs are useful because they offer network-bound performance even with small key-value pairs, and near-data processing at a fraction of the energy budget of regular servers. Even though the first FPGA-based key-value stores started appearing already in 2013 and have evolved significantly in the meantime, almost no attention has been paid to offering transactions. Now that such systems are becoming increasingly practical, however, we need to ensure consistency guarantees for concurrent clients (transactions). This position paper makes the case that adding transaction support is not particularly expensive compared to other parts of these systems, and that in the future all FPGA-based KVSs should provide some form of transactional guarantees. In the remainder of this paper we present a high-level view of the typical pipelined architecture of FPGA-based KVSs that most existing designs follow, and show three different ways of implementing transactions with increasing sophistication: from operation batching, through two-phase locking (2PL), to a simplified snapshot isolation model.
DOI: 10.1145/3399666.3399909 · Published: 2020-06-14
Cited by: 4
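Of the three approaches the paper names, two-phase locking is the easiest to sketch in software. A minimal, illustrative strict-2PL lock manager (class and method names are assumptions, not the paper's FPGA design; deadlock detection and shared locks are omitted):

```python
import threading
from collections import defaultdict

class TwoPhaseLocking:
    """Strict 2PL sketch: a transaction only acquires locks while running
    (growing phase) and releases them all at commit (shrinking phase)."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)  # one lock per key
        self._held = defaultdict(list)             # txn id -> acquired locks

    def lock(self, txn, key):
        lk = self._locks[key]
        lk.acquire()                 # blocks if another txn holds the key
        self._held[txn].append(lk)

    def commit(self, txn):
        # Shrinking phase: release everything at once, so no other txn can
        # observe a partially committed state.
        for lk in reversed(self._held.pop(txn, [])):
            lk.release()
```

In an FPGA KVS pipeline this state would live in on-chip tables keyed by hashed keys rather than a Python dict, but the growing/shrinking discipline is the same.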
Analyzing memory accesses with modern processors
Stefan Noll, J. Teubner, Norman May, Alexander Böhm
Debugging and tuning database systems is very challenging. Using common profiling tools is often not sufficient because they identify the machine instruction rather than the instance of a data structure that causes a performance problem. This leaves a problem's root cause such as memory hotspots or poor data layouts hidden. The state-of-the-art solution is to augment classical profiling with a memory trace. However, current approaches for collecting memory traces are not usable in practice due to their large runtime overhead. In this work, we leverage a mechanism available in modern processors to collect memory traces via hardware-based sampling. We evaluate our approach using a commercial and an open-source database system running the JCC-H benchmark. In particular, we demonstrate that our approach is practical due to its low runtime overhead and we illustrate how memory traces uncover new insights into the memory access characteristics of database systems.
DOI: 10.1145/3399666.3399896 · Published: 2020-06-14
Cited by: 4
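The core attribution step that instruction-level profilers miss, mapping sampled memory addresses back to the data-structure instance that owns them, can be sketched as a range lookup over allocation metadata. This is an illustrative helper under assumed inputs, not the paper's tooling:

```python
import bisect

def attribute_samples(regions, samples):
    """Build a hotspot histogram per data-structure instance.
    regions: list of (start, end, name), sorted by start, non-overlapping.
    samples: sampled memory addresses (e.g. from hardware-based sampling)."""
    starts = [r[0] for r in regions]
    hist = {}
    for addr in samples:
        i = bisect.bisect_right(starts, addr) - 1  # candidate region
        if i >= 0 and addr < regions[i][1]:        # address inside the region?
            name = regions[i][2]
            hist[name] = hist.get(name, 0) + 1
        # addresses outside every region (stack, code, ...) are dropped
    return hist
```

With hardware-sampled traces as input, a histogram like this points at the structure causing the traffic, rather than at the load instruction that happened to touch it.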