Proceedings of the 16th International Workshop on Data Management on New Hardware

"To share or not to share vector registers?"
Johannes Pietrzyk, Dirk Habich, Wolfgang Lehner
DOI: 10.1007/s00778-022-00744-2 (publication date listed as 2020-06-14)
(No abstract available.)
"Concurrent online sampling for all, for free"
Altan Birler, Bernhard Radke, Thomas Neumann
DOI: 10.1145/3399666.3399924 (published 2020-06-14)
Abstract: Database systems rely upon statistical synopses for cardinality estimation. A very versatile and powerful method for estimation purposes is to maintain a random sample of the data. However, drawing a random sample of an existing data set is quite expensive due to the resulting random access pattern, and the sample will get stale over time. It is much more attractive to use online sampling, such that a fresh sample is available at all times, without additional data accesses. While clearly superior from a theoretical perspective, it was not clear how to efficiently integrate online sampling into a database system with high concurrent update and query load. We introduce a novel, highly scalable online sampling strategy that allows for sample maintenance with minimal overhead. We can trade off strict freshness guarantees for a significant boost in performance in many-core shared-memory scenarios, which is ideal for estimation purposes. We show that by replacing the traditional periodical sample reconstruction in a database system with our online sampling strategy, we get virtually zero overhead in insert performance and completely eliminate the slow random I/O needed for sample construction.
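The abstract above rests on the idea of maintaining a uniform random sample online, in a single pass, as data arrives. The paper's contribution is a concurrent, scalable variant of this; for context only, here is a minimal sketch of the classic single-threaded building block, reservoir sampling (Algorithm R). The function name and parameters are illustrative, not taken from the paper:

```python
import random

def reservoir_sample(stream, k, seed=42):
    """Single-threaded reservoir sampling (Algorithm R): after consuming the
    whole stream, every element has been kept with probability k/n."""
    rng = random.Random(seed)
    sample = []
    for n, item in enumerate(stream):
        if n < k:
            sample.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, n)  # uniform over positions 0..n (inclusive)
            if j < k:
                sample[j] = item  # evict a random resident slot
    return sample

# One sequential pass over the data; the sample is fresh at all times.
fresh = reservoir_sample(range(10_000), k=100)
```

Note that this touches each tuple exactly once and needs no random I/O against the base data; the hard part the paper addresses is keeping such maintenance cheap under high concurrent update load.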
Anil Shanbhag, Nesime Tatbul, David Cohen, S. Madden
New data storage technologies such as the recently introduced Intel® Optane™ DC Persistent Memory Module (PMM) offer exciting opportunities for optimizing the query processing performance of database workloads. In particular, the unique combination of low latency, byte-addressability, persistence, and large capacity make persistent memory (PMem) an attractive alternative along with DRAM and SSDs. Exploring the performance characteristics of this new medium is the first critical step in understanding how it will impact the design and performance of database systems. In this paper, we present one of the first experimental studies on characterizing Intel® Optane™ DC PMM's performance behavior in the context of analytical database workloads. First, we analyze basic access patterns common in such workloads, such as sequential, selective, and random reads as well as the complete Star Schema Benchmark, comparing standalone DRAM- and PMem-based implementations. Then we extend our analysis to join algorithms over larger datasets, which require using DRAM and PMem in a hybrid fashion while paying special attention to the read-write asymmetry of PMem. Our study reveals interesting performance tradeoffs that can help guide the design of next-generation OLAP systems in presence of persistent memory in the storage hierarchy.
{"title":"Large-scale in-memory analytics on Intel® Optane™ DC persistent memory","authors":"Anil Shanbhag, Nesime Tatbul, David Cohen, S. Madden","doi":"10.1145/3399666.3399933","DOIUrl":"https://doi.org/10.1145/3399666.3399933","url":null,"abstract":"New data storage technologies such as the recently introduced Intel® Optane™ DC Persistent Memory Module (PMM) offer exciting opportunities for optimizing the query processing performance of database workloads. In particular, the unique combination of low latency, byte-addressability, persistence, and large capacity make persistent memory (PMem) an attractive alternative along with DRAM and SSDs. Exploring the performance characteristics of this new medium is the first critical step in understanding how it will impact the design and performance of database systems. In this paper, we present one of the first experimental studies on characterizing Intel® Optane™ DC PMM's performance behavior in the context of analytical database workloads. First, we analyze basic access patterns common in such workloads, such as sequential, selective, and random reads as well as the complete Star Schema Benchmark, comparing standalone DRAM- and PMem-based implementations. Then we extend our analysis to join algorithms over larger datasets, which require using DRAM and PMem in a hybrid fashion while paying special attention to the read-write asymmetry of PMem. 
Our study reveals interesting performance tradeoffs that can help guide the design of next-generation OLAP systems in presence of persistent memory in the storage hierarchy.","PeriodicalId":256784,"journal":{"name":"Proceedings of the 16th International Workshop on Data Management on New Hardware","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123027125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
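The access patterns this study characterizes (sequential vs. random reads) can be mimicked in spirit with an ordinary in-memory micro-benchmark: fix the data, vary only the index order. The Python sketch below is purely illustrative; it measures interpreter-bound DRAM access on whatever machine runs it, not PMem, and all names are made up:

```python
import array
import random
import time

def scan(buf, order):
    """Sum buf in the given index order; the order models the access pattern."""
    total = 0
    for i in order:
        total += buf[i]
    return total

n = 1_000_000
buf = array.array("q", range(n))  # ~8 MB of 64-bit integers
sequential = list(range(n))
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)  # same indices, randomized order

for name, order in [("sequential", sequential), ("random", shuffled)]:
    start = time.perf_counter()
    checksum = scan(buf, order)
    elapsed = time.perf_counter() - start
    print(f"{name:10s} read: {elapsed * 1e3:7.1f} ms (checksum {checksum})")
```

Both scans compute the same sum; any timing gap comes solely from the access pattern, which is the isolation the paper's (far more rigorous, hardware-level) experiments aim for.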
"The tale of 1000 Cores: an evaluation of concurrency control on real(ly) large multi-socket hardware"
Tiemo Bang, Norman May, Ilia Petrov, Carsten Binnig
DOI: 10.1145/3399666.3399910 (published 2020-06-14)
Abstract: In this paper, we set out to revisit the results of "Staring into the Abyss [...] of Concurrency Control with [1000] Cores" [27] and analyse in-memory DBMSs on today's large hardware. Contrary to the original authors' assumption, today we do not see single-socket CPUs with 1000 cores; instead, multi-socket hardware has made its way into production data centres. Hence, we follow up on this prior work with an evaluation of the characteristics of concurrency control schemes on real production multi-socket hardware with 1568 cores. To our surprise, we made several interesting findings, which we report on in this paper.
"Lessons learned from the early performance evaluation of Intel Optane DC persistent memory in DBMS"
Yinjun Wu, Kwanghyun Park, Rathijit Sen, Brian Kroth, Jaeyoung Do
DOI: 10.1145/3399666.3399898 (published 2020-05-15)
Abstract: Non-volatile memory (NVM) is an emerging technology that offers the persistence characteristics of large-capacity storage devices while providing the low access latency and byte-addressability of traditional DRAM. In this paper, we provide extensive performance evaluations of a recently released NVM device, Intel Optane DC Persistent Memory (PMem), under different configurations with several micro-benchmark tools. Further, we evaluate OLTP and OLAP database workloads with Microsoft SQL Server 2019 when using PMem as a buffer pool or as persistent storage. From the lessons learned, we share some recommendations for future DBMS design with PMem, e.g., simple hardware or software changes alone are not enough to make the best use of PMem in DBMSs.
"The ReProVide query-sequence optimization in a hardware-accelerated DBMS"
Lekshmi B. G., Andreas Becher, K. Meyer-Wegener
DOI: 10.1145/3399666.3399926 (published 2020-05-01)
Abstract: Hardware acceleration of database query processing can be done with the help of FPGAs. In particular, they are partially reconfigurable at runtime, which allows adaptation to a variety of queries. Reconfiguration itself, however, takes some time. This paper presents optimizations based on query sequences that reduce the impact of the reconfigurations: knowledge of upcoming queries is used to avoid reconfiguration overhead. We evaluate our optimizations with a calibrated model. Improvements in execution time of up to 28% can be obtained even with sequences of only two queries.
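Why knowing upcoming queries helps can be seen with a toy cost model: a reconfiguration penalty is paid only when consecutive queries need different accelerator configurations, so grouping queries that share a configuration saves time. Everything below (names, cost values) is hypothetical and is not the paper's calibrated model:

```python
RECONFIG_COST = 5.0  # hypothetical time units to load an FPGA configuration

def plan_cost(query_seq, exec_cost, loaded=None):
    """Total execution time plus reconfiguration penalties for a sequence of
    (query, required_configuration) pairs; a reconfiguration is paid only when
    the required configuration differs from the currently loaded one."""
    total = 0.0
    for query, config in query_seq:
        if config != loaded:
            total += RECONFIG_COST
            loaded = config
        total += exec_cost[query]
    return total

costs = {"q1": 1.0, "q2": 1.0, "q3": 1.0}
naive = [("q1", "A"), ("q2", "B"), ("q3", "A")]    # A -> B -> A: 3 reconfigs
grouped = [("q1", "A"), ("q3", "A"), ("q2", "B")]  # A -> B: 2 reconfigs
```

With these numbers, `plan_cost(naive, costs)` is 18.0 while `plan_cost(grouped, costs)` is 13.0: the same queries, one fewer reconfiguration. The paper's optimizations exploit exactly this kind of knowledge of upcoming queries, against a calibrated rather than invented cost model.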
"The collection Virtual Machine: an abstraction for multi-frontend multi-backend data analysis"
Ingo Müller, Renato Marroquín, D. Koutsoukos, Mike Wawrzoniak, G. Alonso, Sabir Akhadov
DOI: 10.1145/3399666.3399911 (published 2020-04-04)
Abstract: Getting the best performance from the ever-increasing number of hardware platforms has been a recurring challenge for data processing systems. In recent years, the advent of data science with its increasingly numerous and complex types of analytics has made this challenge even more difficult. In practice, system designers are overwhelmed by the number of combinations and typically implement a single analytics type on one platform, leading to repeated implementation effort and a plethora of semi-compatible tools for data scientists. In this paper, we propose the "Collection Virtual Machine" (or CVM), an extensible compiler framework designed to keep the specialization process of data analytics systems tractable. It can capture at the same time the essence of a large span of low-level, hardware-specific implementation techniques as well as high-level operations of different types of analyses. At its core lies a language for defining nested, collection-oriented intermediate representations (IRs). Frontends produce programs in their IR flavors defined in that language, which get optimized through a series of rewritings (possibly changing the IR flavor multiple times) until the program is finally expressed in an IR of platform-specific operators. While reducing the overall implementation effort, this also improves the interoperability of both analyses and hardware platforms. We have used CVM successfully to build specialized backends for platforms as diverse as multi-core CPUs, RDMA clusters, and serverless computing infrastructure in the cloud and expect similar results for many more frontends and hardware platforms in the near future.
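The core pipeline the abstract describes (programs expressed over collection-oriented IRs, optimized by a series of rewritings) can be miniaturized. The sketch below implements a single classic rewrite rule, map fusion, over a two-operator toy IR; it illustrates IR rewriting in general and is not CVM's actual IR language or API:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Source:
    """Leaf operator: a materialized input collection."""
    data: list

@dataclass
class Map:
    """Apply a function element-wise to the child's output."""
    f: Callable[[Any], Any]
    child: Any  # Source or Map

def rewrite_fuse_maps(op):
    """One rewrite rule, applied bottom-up: Map(f, Map(g, c)) -> Map(f.g, c)."""
    if isinstance(op, Map):
        child = rewrite_fuse_maps(op.child)
        if isinstance(child, Map):
            f, g = op.f, child.f
            return Map(lambda x, f=f, g=g: f(g(x)), child.child)
        return Map(op.f, child)
    return op

def execute(op):
    """Trivial interpreter for the toy IR."""
    if isinstance(op, Source):
        return list(op.data)
    return [op.f(x) for x in execute(op.child)]

plan = Map(str, Map(lambda x: x + 1, Source([1, 2, 3])))
fused = rewrite_fuse_maps(plan)  # one Map over the Source, same result
```

A real framework of this kind applies many such rules, possibly switching IR flavor along the way, until the plan consists only of platform-specific operators; the invariant in the sketch (rewriting preserves `execute`'s result) is what makes that process safe.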
"Data structure primitives on persistent memory: an evaluation"
Philipp Götze, Arun Kumar Tharanatha, K. Sattler
DOI: 10.1145/3399666.3399900 (published 2020-01-07)
Abstract: Persistent Memory (PM) represents a very promising, next-generation memory solution with a significant impact on database architectures. Several data structures for this new technology have already been proposed. However, primarily only complete structures are presented and evaluated. Thus, the implications of the individual ideas and PM features are concealed. Therefore, in this paper, we disassemble the structures presented so far, identify their underlying design primitives, and assign them to appropriate design goals. As a result of our comprehensive experiments on real PM hardware, we can reveal the trade-offs of the primitives for various access patterns and pinpoint their best use cases.