Proceedings of the 16th International Workshop on Data Management on New Hardware

"To share or not to share vector registers?"
Johannes Pietrzyk, Dirk Habich, Wolfgang Lehner
DOI: 10.1007/s00778-022-00744-2 (publication date listed as 2020-06-14)
(No abstract available.)
"Concurrent online sampling for all, for free"
Altan Birler, Bernhard Radke, Thomas Neumann
DOI: 10.1145/3399666.3399924 (published 2020-06-14)
Abstract: Database systems rely upon statistical synopses for cardinality estimation. A very versatile and powerful method for estimation purposes is to maintain a random sample of the data. However, drawing a random sample of an existing data set is quite expensive due to the resulting random access pattern, and the sample will get stale over time. It is much more attractive to use online sampling, such that a fresh sample is available at all times, without additional data accesses. While clearly superior from a theoretical perspective, it was not clear how to efficiently integrate online sampling into a database system with high concurrent update and query load. We introduce a novel, highly scalable online sampling strategy that allows for sample maintenance with minimal overhead. We can trade off strict freshness guarantees for a significant boost in performance in many-core shared-memory scenarios, which is ideal for estimation purposes. We show that by replacing the traditional periodical sample reconstruction in a database system with our online sampling strategy, we get virtually zero overhead in insert performance and completely eliminate the slow random I/O needed for sample construction.
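The abstract above rests on the idea of maintaining a uniform random sample online, in a single pass, as data arrives. The paper's contribution is a concurrent, scalable variant of this; for context only, here is a minimal sketch of the classic single-threaded building block, reservoir sampling (Algorithm R). The function name and parameters are illustrative, not taken from the paper:

```python
import random

def reservoir_sample(stream, k, seed=42):
    """Single-threaded reservoir sampling (Algorithm R): after consuming the
    whole stream, every element has been kept with probability k/n."""
    rng = random.Random(seed)
    sample = []
    for n, item in enumerate(stream):
        if n < k:
            sample.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, n)  # uniform over positions 0..n (inclusive)
            if j < k:
                sample[j] = item  # evict a random resident slot
    return sample

# One sequential pass over the data; the sample is fresh at all times.
fresh = reservoir_sample(range(10_000), k=100)
```

Note that this touches each tuple exactly once and needs no random I/O against the base data; the hard part the paper addresses is keeping such maintenance cheap under high concurrent update load.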
Anil Shanbhag, Nesime Tatbul, David Cohen, S. Madden
New data storage technologies such as the recently introduced Intel® Optane™ DC Persistent Memory Module (PMM) offer exciting opportunities for optimizing the query processing performance of database workloads. In particular, the unique combination of low latency, byte-addressability, persistence, and large capacity make persistent memory (PMem) an attractive alternative along with DRAM and SSDs. Exploring the performance characteristics of this new medium is the first critical step in understanding how it will impact the design and performance of database systems. In this paper, we present one of the first experimental studies on characterizing Intel® Optane™ DC PMM's performance behavior in the context of analytical database workloads. First, we analyze basic access patterns common in such workloads, such as sequential, selective, and random reads as well as the complete Star Schema Benchmark, comparing standalone DRAM- and PMem-based implementations. Then we extend our analysis to join algorithms over larger datasets, which require using DRAM and PMem in a hybrid fashion while paying special attention to the read-write asymmetry of PMem. Our study reveals interesting performance tradeoffs that can help guide the design of next-generation OLAP systems in presence of persistent memory in the storage hierarchy.
{"title":"Large-scale in-memory analytics on Intel® Optane™ DC persistent memory","authors":"Anil Shanbhag, Nesime Tatbul, David Cohen, S. Madden","doi":"10.1145/3399666.3399933","DOIUrl":"https://doi.org/10.1145/3399666.3399933","url":null,"abstract":"New data storage technologies such as the recently introduced Intel® Optane™ DC Persistent Memory Module (PMM) offer exciting opportunities for optimizing the query processing performance of database workloads. In particular, the unique combination of low latency, byte-addressability, persistence, and large capacity make persistent memory (PMem) an attractive alternative along with DRAM and SSDs. Exploring the performance characteristics of this new medium is the first critical step in understanding how it will impact the design and performance of database systems. In this paper, we present one of the first experimental studies on characterizing Intel® Optane™ DC PMM's performance behavior in the context of analytical database workloads. First, we analyze basic access patterns common in such workloads, such as sequential, selective, and random reads as well as the complete Star Schema Benchmark, comparing standalone DRAM- and PMem-based implementations. Then we extend our analysis to join algorithms over larger datasets, which require using DRAM and PMem in a hybrid fashion while paying special attention to the read-write asymmetry of PMem. 
Our study reveals interesting performance tradeoffs that can help guide the design of next-generation OLAP systems in presence of persistent memory in the storage hierarchy.","PeriodicalId":256784,"journal":{"name":"Proceedings of the 16th International Workshop on Data Management on New Hardware","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123027125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
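The access patterns this study characterizes (sequential vs. random reads) can be mimicked in spirit with an ordinary in-memory micro-benchmark: fix the data, vary only the index order. The Python sketch below is purely illustrative; it measures interpreter-bound DRAM access on whatever machine runs it, not PMem, and all names are made up:

```python
import array
import random
import time

def scan(buf, order):
    """Sum buf in the given index order; the order models the access pattern."""
    total = 0
    for i in order:
        total += buf[i]
    return total

n = 1_000_000
buf = array.array("q", range(n))  # ~8 MB of 64-bit integers
sequential = list(range(n))
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)  # same indices, randomized order

for name, order in [("sequential", sequential), ("random", shuffled)]:
    start = time.perf_counter()
    checksum = scan(buf, order)
    elapsed = time.perf_counter() - start
    print(f"{name:10s} read: {elapsed * 1e3:7.1f} ms (checksum {checksum})")
```

Both scans compute the same sum; any timing gap comes solely from the access pattern, which is the isolation the paper's (far more rigorous, hardware-level) experiments aim for.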
"The tale of 1000 Cores: an evaluation of concurrency control on real(ly) large multi-socket hardware"
Tiemo Bang, Norman May, Ilia Petrov, Carsten Binnig
DOI: 10.1145/3399666.3399910 (published 2020-06-14)
Abstract: In this paper, we set out to revisit the results of "Staring into the Abyss [...] of Concurrency Control with [1000] Cores" [27] and analyse in-memory DBMSs on today's large hardware. Contrary to the original authors' assumption, today we do not see single-socket CPUs with 1000 cores; instead, multi-socket hardware has made its way into production data centres. Hence, we follow up on this prior work with an evaluation of the characteristics of concurrency control schemes on real production multi-socket hardware with 1568 cores. To our surprise, we made several interesting findings, which we report on in this paper.
"Lessons learned from the early performance evaluation of Intel Optane DC persistent memory in DBMS"
Yinjun Wu, Kwanghyun Park, Rathijit Sen, Brian Kroth, Jaeyoung Do
DOI: 10.1145/3399666.3399898 (published 2020-05-15)
Abstract: Non-volatile memory (NVM) is an emerging technology that offers the persistence characteristics of large-capacity storage devices while providing the low access latency and byte-addressability of traditional DRAM. In this paper, we provide extensive performance evaluations of a recently released NVM device, Intel Optane DC Persistent Memory (PMem), under different configurations with several micro-benchmark tools. Further, we evaluate OLTP and OLAP database workloads with Microsoft SQL Server 2019 when using PMem as a buffer pool or as persistent storage. From the lessons learned, we share some recommendations for future DBMS design with PMem, e.g., simple hardware or software changes alone are not enough to make the best use of PMem in DBMSs.
"The ReProVide query-sequence optimization in a hardware-accelerated DBMS"
Lekshmi B. G., Andreas Becher, K. Meyer-Wegener
DOI: 10.1145/3399666.3399926 (published 2020-05-01)
Abstract: Hardware acceleration of database query processing can be done with the help of FPGAs. In particular, they are partially reconfigurable at runtime, which allows adaptation to a variety of queries. Reconfiguration itself, however, takes some time. This paper presents optimizations based on query sequences that reduce the impact of the reconfigurations: knowledge of upcoming queries is used to avoid reconfiguration overhead. We evaluate our optimizations with a calibrated model. Improvements in execution time of up to 28% can be obtained even with sequences of only two queries.
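Why knowing upcoming queries helps can be seen with a toy cost model: a reconfiguration penalty is paid only when consecutive queries need different accelerator configurations, so grouping queries that share a configuration saves time. Everything below (names, cost values) is hypothetical and is not the paper's calibrated model:

```python
RECONFIG_COST = 5.0  # hypothetical time units to load an FPGA configuration

def plan_cost(query_seq, exec_cost, loaded=None):
    """Total execution time plus reconfiguration penalties for a sequence of
    (query, required_configuration) pairs; a reconfiguration is paid only when
    the required configuration differs from the currently loaded one."""
    total = 0.0
    for query, config in query_seq:
        if config != loaded:
            total += RECONFIG_COST
            loaded = config
        total += exec_cost[query]
    return total

costs = {"q1": 1.0, "q2": 1.0, "q3": 1.0}
naive = [("q1", "A"), ("q2", "B"), ("q3", "A")]    # A -> B -> A: 3 reconfigs
grouped = [("q1", "A"), ("q3", "A"), ("q2", "B")]  # A -> B: 2 reconfigs
```

With these numbers, `plan_cost(naive, costs)` is 18.0 while `plan_cost(grouped, costs)` is 13.0: the same queries, one fewer reconfiguration. The paper's optimizations exploit exactly this kind of knowledge of upcoming queries, against a calibrated rather than invented cost model.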
"The collection Virtual Machine: an abstraction for multi-frontend multi-backend data analysis"
Ingo Müller, Renato Marroquín, D. Koutsoukos, Mike Wawrzoniak, G. Alonso, Sabir Akhadov
DOI: 10.1145/3399666.3399911 (published 2020-04-04)
Abstract: Getting the best performance from the ever-increasing number of hardware platforms has been a recurring challenge for data processing systems. In recent years, the advent of data science with its increasingly numerous and complex types of analytics has made this challenge even more difficult. In practice, system designers are overwhelmed by the number of combinations and typically implement a single analytics type on one platform, leading to repeated implementation effort and a plethora of semi-compatible tools for data scientists. In this paper, we propose the "Collection Virtual Machine" (or CVM), an extensible compiler framework designed to keep the specialization process of data analytics systems tractable. It can capture at the same time the essence of a large span of low-level, hardware-specific implementation techniques as well as high-level operations of different types of analyses. At its core lies a language for defining nested, collection-oriented intermediate representations (IRs). Frontends produce programs in their IR flavors defined in that language, which get optimized through a series of rewritings (possibly changing the IR flavor multiple times) until the program is finally expressed in an IR of platform-specific operators. While reducing the overall implementation effort, this also improves the interoperability of both analyses and hardware platforms. We have used CVM successfully to build specialized backends for platforms as diverse as multi-core CPUs, RDMA clusters, and serverless computing infrastructure in the cloud and expect similar results for many more frontends and hardware platforms in the near future.
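The core pipeline the abstract describes (programs expressed over collection-oriented IRs, optimized by a series of rewritings) can be miniaturized. The sketch below implements a single classic rewrite rule, map fusion, over a two-operator toy IR; it illustrates IR rewriting in general and is not CVM's actual IR language or API:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Source:
    """Leaf operator: a materialized input collection."""
    data: list

@dataclass
class Map:
    """Apply a function element-wise to the child's output."""
    f: Callable[[Any], Any]
    child: Any  # Source or Map

def rewrite_fuse_maps(op):
    """One rewrite rule, applied bottom-up: Map(f, Map(g, c)) -> Map(f.g, c)."""
    if isinstance(op, Map):
        child = rewrite_fuse_maps(op.child)
        if isinstance(child, Map):
            f, g = op.f, child.f
            return Map(lambda x, f=f, g=g: f(g(x)), child.child)
        return Map(op.f, child)
    return op

def execute(op):
    """Trivial interpreter for the toy IR."""
    if isinstance(op, Source):
        return list(op.data)
    return [op.f(x) for x in execute(op.child)]

plan = Map(str, Map(lambda x: x + 1, Source([1, 2, 3])))
fused = rewrite_fuse_maps(plan)  # one Map over the Source, same result
```

A real framework of this kind applies many such rules, possibly switching IR flavor along the way, until the plan consists only of platform-specific operators; the invariant in the sketch (rewriting preserves `execute`'s result) is what makes that process safe.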
"Data structure primitives on persistent memory: an evaluation"
Philipp Götze, Arun Kumar Tharanatha, K. Sattler
DOI: 10.1145/3399666.3399900 (published 2020-01-07)
Abstract: Persistent Memory (PM) represents a very promising, next-generation memory solution with a significant impact on database architectures. Several data structures for this new technology have already been proposed. However, primarily only complete structures are presented and evaluated. Thus, the implications of the individual ideas and PM features are concealed. Therefore, in this paper, we disassemble the structures presented so far, identify their underlying design primitives, and assign them to appropriate design goals. As a result of our comprehensive experiments on real PM hardware, we can reveal the trade-offs of the primitives for various access patterns and pinpoint their best use cases.