Multi-core column-store parallelization under concurrent workload
M. Gawade, M. Kersten, A. Simitsis
International Workshop on Data Management on New Hardware, 2016. doi:10.1145/2933349.2933350

Columnar database systems, designed for optimal OLAP workload performance, strive for maximum multi-core utilization under concurrent query execution. However, a multi-core parallel plan generated for isolated execution leads to suboptimal performance during concurrent query execution. In this paper, we analyze the effects of concurrent-workload resource contention on multi-core plans under three intra-query parallelization techniques: static, adaptive, and cost-model-based parallelization. We focus on a plan-level comparison of selected TPC-H queries on in-memory multi-core columnar systems. Excessive partitions in statically parallelized plans result in heavy L3 cache misses that cause memory contention and severely degrade query performance. Overall, adaptive plans show more robustness, lower scheduling overhead, and an average 50% execution-time improvement over statically parallelized and cost-model-based plans.
Dynamic fine-grained scheduling for energy-efficient main-memory queries
Iraklis Psaroudakis, T. Kissinger, Danica Porobic, T. Ilsche, Erietta Liarou, Pınar Tözün, A. Ailamaki, Wolfgang Lehner
International Workshop on Data Management on New Hardware, 2014. doi:10.1145/2619228.2619229
Power and cooling costs are among the highest costs in data centers today, which makes improving energy efficiency crucial. Energy efficiency is also a major design point for chips that power whole ranges of computing devices. One important goal in this area is energy proportionality: the system's power consumption should be proportional to its performance. Currently, a major trend among server processors, which stems from the design of chips for mobile devices, is the inclusion of advanced power management techniques such as dynamic voltage-frequency scaling, clock gating, and turbo modes. Much recent work on the energy efficiency of database management systems focuses on coarse-grained power management at the granularity of multiple machines and whole queries. These techniques, however, cannot efficiently adapt to the frequently fluctuating behavior of contemporary workloads. In this paper, we argue that databases should employ a fine-grained approach by dynamically scheduling tasks using precise hardware models. These models can be produced by calibrating operators under different combinations of scheduling policies, parallelism, and memory access strategies, and can then be employed at run time for dynamic scheduling and power management to improve overall energy efficiency. We experimentally show that energy efficiency can be improved by up to 4x for fundamental memory-intensive database operations such as scans.
{"title":"Dynamic fine-grained scheduling for energy-efficient main-memory queries","authors":"Iraklis Psaroudakis, T. Kissinger, Danica Porobic, T. Ilsche, Erietta Liarou, Pınar Tözün, A. Ailamaki, Wolfgang Lehner","doi":"10.1145/2619228.2619229","DOIUrl":"https://doi.org/10.1145/2619228.2619229","url":null,"abstract":"Power and cooling costs are some of the highest costs in data centers today, which make improvement in energy efficiency crucial. Energy efficiency is also a major design point for chips that power whole ranges of computing devices. One important goal in this area is energy proportionality, arguing that the system's power consumption should be proportional to its performance. Currently, a major trend among server processors, which stems from the design of chips for mobile devices, is the inclusion of advanced power management techniques, such as dynamic voltage-frequency scaling, clock gating, and turbo modes.\u0000 A lot of recent work on energy efficiency of database management systems is focused on coarse-grained power management at the granularity of multiple machines and whole queries. These techniques, however, cannot efficiently adapt to the frequently fluctuating behavior of contemporary workloads. In this paper, we argue that databases should employ a fine-grained approach by dynamically scheduling tasks using precise hardware models. These models can be produced by calibrating operators under different combinations of scheduling policies, parallelism, and memory access strategies. The models can be employed at run-time for dynamic scheduling and power management in order to improve the overall energy efficiency. We experimentally show that energy efficiency can be improved by up to 4x for fundamental memory-intensive database operations, such as scans.","PeriodicalId":298901,"journal":{"name":"International Workshop on Data Management on New Hardware","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125217795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SOFORT: a hybrid SCM-DRAM storage engine for fast data recovery
Ismail Oukid, Daniel Booss, Wolfgang Lehner, P. Bumbulis, Thomas Willhalm
International Workshop on Data Management on New Hardware, 2014. doi:10.1145/2619228.2619236
Storage Class Memory (SCM) has the potential to significantly improve database performance. This potential has been well documented for throughput [4] and response time [25, 22]. In this paper we show that SCM also has the potential to significantly improve restart performance, a shortcoming of traditional main-memory database systems. We present SOFORT, a hybrid SCM-DRAM storage engine that leverages the full capabilities of SCM by doing away with the traditional log and updating the persisted data in place in small increments. We show that we can achieve restart times of a few seconds, independent of instance size and transaction volume, without significantly impacting transaction throughput.
{"title":"SOFORT: a hybrid SCM-DRAM storage engine for fast data recovery","authors":"Ismail Oukid, Daniel Booss, Wolfgang Lehner, P. Bumbulis, Thomas Willhalm","doi":"10.1145/2619228.2619236","DOIUrl":"https://doi.org/10.1145/2619228.2619236","url":null,"abstract":"Storage Class Memory (SCM) has the potential to significantly improve database performance. This potential has been well documented for throughput [4] and response time [25, 22]. In this paper we show that SCM has also the potential to significantly improve restart performance, a shortcoming of traditional main memory database systems. We present SOFORT, a hybrid SCM-DRAM storage engine that leverages full capabilities of SCM by doing away with a traditional log and updating the persisted data in place in small increments. We show that we can achieve restart times of a few seconds independent of instance size and transaction volume without significantly impacting transaction throughput.","PeriodicalId":298901,"journal":{"name":"International Workshop on Data Management on New Hardware","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131718559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Online bit flip detection for in-memory B-trees on unreliable hardware
Till Kolditz, T. Kissinger, B. Schlegel, Dirk Habich, Wolfgang Lehner
International Workshop on Data Management on New Hardware, 2014. doi:10.1145/2619228.2619233
Hardware vendors constantly decrease the feature sizes of integrated circuits to obtain better performance and energy efficiency. Due to cosmic rays, low voltage, or heat dissipation, hardware -- both processors and memory -- becomes more and more unreliable as error rates increase. From a database perspective, bit flip errors in main memory will become a major challenge for modern in-memory database systems, which keep all their enterprise data in volatile, unreliable main memory. Although existing hardware error control techniques like ECC-DRAM can detect and correct memory errors, their detection and correction capabilities are limited. Moreover, hardware error correction faces major drawbacks in terms of acquisition cost, additional memory utilization, and latency. In this paper, we argue that slightly increasing data redundancy in the right places, by incorporating context knowledge, already improves error detection significantly. We use the B-tree -- a widespread index structure -- as an example and propose various techniques for online error detection that increase its overall reliability. In our experiments, our techniques detect more errors in less time on commodity hardware than non-resilient B-trees running in an ECC-DRAM environment. Our techniques can easily be adapted to other data structures and are a first step toward resilient database systems that can cope with unreliable hardware.
{"title":"Online bit flip detection for in-memory B-trees on unreliable hardware","authors":"Till Kolditz, T. Kissinger, B. Schlegel, Dirk Habich, Wolfgang Lehner","doi":"10.1145/2619228.2619233","DOIUrl":"https://doi.org/10.1145/2619228.2619233","url":null,"abstract":"Hardware vendors constantly decrease the feature sizes of integrated circuits to obtain better performance and energy efficiency. Due to cosmic rays, low voltage or heat dissipation, hardware -- both processors and memory -- becomes more and more unreliable as the error rate increases. From a database perspective bit flip errors in main memory will become a major challenge for modern in-memory database systems, which keep all their enterprise data in volatile, unreliable main memory. Although existing hardware error control techniques like ECC-DRAM are able to detect and correct memory errors, their detection and correction capabilities are limited. Moreover, hardware error correction faces major drawbacks in terms of acquisition costs, additional memory utilization, and latency. In this paper, we argue that slightly increasing data redundancy at the right places by incorporating context knowledge already increases error detection significantly. We use the B-Tree -- as a widespread index structure -- as an example and propose various techniques for online error detection and thus increase its overall reliability. In our experiments, we found that our techniques can detect more errors in less time on commodity hardware compared to non-resilient B-Trees running in an ECC-DRAM environment. Our techniques can further be easily adapted for other data structures and are a first step in the direction of resilient database systems which can cope with unreliable hardware.","PeriodicalId":298901,"journal":{"name":"International Workshop on Data Management on New Hardware","volume":"126 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123305429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HICAMP bitmap: space-efficient updatable bitmap index for in-memory databases
Bo Wang, Heiner Litz, D. Cheriton
International Workshop on Data Management on New Hardware, 2014. doi:10.1145/2619228.2619235

The bitmap is an efficient indexing structure for querying large amounts of data and is widely deployed in data-warehouse applications. While the size of a bitmap scales linearly with the number of rows in a table, its sparseness allows it to be greatly reduced via compression based on run-length encoding. However, updating a compressed bitmap is expensive due to the encoding and decoding overheads, in particular because re-compression can change the compressed sequence length and data layout. Due to this problem, bitmap indices only perform well for read-only workloads. In this paper, we propose a bitmap index structure that is both space-efficient and fast to update, built on top of a smart memory model called HICAMP. As a consequence, our approach enables bitmap indices for workloads that exhibit high update ratios, as in OLTP workloads. We also present a new multi-bit bitmap design that addresses the candidate checking problem. In our experiments, the HICAMP bitmap index demonstrates a 3-12x reduction in size over the B-tree and an 8-30x reduction over other commonly used indexing structures such as the red-black tree, while simultaneously supporting efficient updates.
Database cracking: fancy scan, not poor man's sort!
H. Pirk, E. Petraki, Stratos Idreos, S. Manegold, M. Kersten
International Workshop on Data Management on New Hardware, 2014. doi:10.1145/2619228.2619232
Database cracking is an appealing approach to adaptive indexing: on every range-selection query, the data is partitioned using the supplied predicates as pivots. The core of database cracking is, thus, pivoted partitioning. While pivoted partitioning, like scanning, requires a single pass through the data, it tends to have much higher costs due to lower CPU efficiency. In this paper, we conduct an in-depth study of the reasons for the low CPU efficiency of pivoted partitioning. Based on the findings, we develop an optimized version with significantly higher (single-threaded) CPU efficiency. We also develop a number of multi-threaded implementations that are effectively bound by memory bandwidth. Combining all of these optimizations, we achieve an implementation whose costs are close to or better than an ordinary scan on a variety of systems, ranging from low-end (cheaper than $300) desktop machines to high-end (above $60,000) servers.
{"title":"Database cracking: fancy scan, not poor man's sort!","authors":"H. Pirk, E. Petraki, Stratos Idreos, S. Manegold, M. Kersten","doi":"10.1145/2619228.2619232","DOIUrl":"https://doi.org/10.1145/2619228.2619232","url":null,"abstract":"Database Cracking is an appealing approach to adaptive indexing: on every range-selection query, the data is partitioned using the supplied predicates as pivots. The core of database cracking is, thus, pivoted partitioning. While pivoted partitioning, like scanning, requires a single pass through the data it tends to have much higher costs due to lower CPU efficiency. In this paper, we conduct an in-depth study of the reasons for the low CPU efficiency of pivoted partitioning. Based on the findings, we develop an optimized version with significantly higher (single-threaded) CPU efficiency. We also develop a number of multi-threaded implementations that are effectively bound by memory bandwidth. Combining all of these optimizations we achieve an implementation that has costs close to or better than an ordinary scan on a variety of systems ranging from low-end (cheaper than $300) desktop machines to high-end (above $60,000) servers.","PeriodicalId":298901,"journal":{"name":"International Workshop on Data Management on New Hardware","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116245309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vectorized Bloom filters for advanced SIMD processors
Orestis Polychroniou, K. A. Ross
International Workshop on Data Management on New Hardware, 2014. doi:10.1145/2619228.2619234

Analytics are at the core of many business intelligence tasks. Efficient query execution is facilitated by advanced hardware features, such as multi-core parallelism, shared-nothing low-latency caches, and SIMD vector instructions. Only recently, the SIMD capabilities of mainstream hardware have been augmented with wider vectors and non-contiguous loads termed gathers. While analytical DBMSs minimize the use of indexes in favor of scans based on sequential memory accesses, some data structures remain crucial. The Bloom filter, one such example, is the most efficient structure for filtering tuples based on their existence in a set and its performance is critical when joining tables with vastly different cardinalities. We introduce a vectorized implementation for probing Bloom filters based on gathers that eliminates conditional control flow and is independent of the SIMD length. Our techniques are generic and can be reused for accelerating other database operations. Our evaluation indicates a significant performance improvement over scalar code that can exceed 3X when the Bloom filter is cache-resident.
Heterogeneity-conscious parallel query execution: getting a better mileage while driving faster!
Tobias Mühlbauer, Wolf Rödiger, Robert Seilbeck, A. Kemper, Thomas Neumann
International Workshop on Data Management on New Hardware, 2014. doi:10.1145/2619228.2619230
Physical and thermal restrictions hinder commensurate performance gains from the ever-increasing transistor density. While multi-core scaling helped alleviate dimmed or dark silicon for some time, future processors will need to become more heterogeneous. To this end, single instruction set architecture (ISA) heterogeneous processors are a particularly interesting solution that combines multiple cores with the same ISA but asymmetric performance and power characteristics. These processors, however, are no free lunch for database systems. Mapping jobs to the core that fits best is notoriously hard for the operating system or a compiler. To achieve optimal performance and energy efficiency, heterogeneity needs to be exposed to the database system. In this paper, we provide a thorough study of parallelized core database operators and TPC-H query processing on a heterogeneous single-ISA multi-core architecture. Using these insights, we design a heterogeneity-conscious job-to-core mapping approach for our high-performance main-memory database system HyPer and show that it is indeed possible to get a better mileage while driving faster compared to static and operating-system-controlled mappings. Our approach improves the energy-delay product of a TPC-H power run by 31%, and by over 60% for specific TPC-H queries.
{"title":"Heterogeneity-conscious parallel query execution: getting a better mileage while driving faster!","authors":"Tobias Mühlbauer, Wolf Rödiger, Robert Seilbeck, A. Kemper, Thomas Neumann","doi":"10.1145/2619228.2619230","DOIUrl":"https://doi.org/10.1145/2619228.2619230","url":null,"abstract":"Physical and thermal restrictions hinder commensurate performance gains from the ever increasing transistor density. While multi-core scaling helped alleviate dimmed or dark silicon for some time, future processors will need to become more heterogeneous. To this end, single instruction set architecture (ISA) heterogeneous processors are a particularly interesting solution that combines multiple cores with the same ISA but asymmetric performance and power characteristics. These processors, however, are no free lunch for database systems. Mapping jobs to the core that fits best is notoriously hard for the operating system or a compiler. To achieve optimal performance and energy efficiency, heterogeneity needs to be exposed to the database system.\u0000 In this paper, we provide a thorough study of parallelized core database operators and TPC-H query processing on a heterogeneous single-ISA multi-core architecture. Using these insights we design a heterogeneity-conscious job-to-core mapping approach for our high-performance main memory database system HyPer and show that it is indeed possible to get a better mileage while driving faster compared to static and operating-system-controlled mappings. Our approach improves the energy delay product of a TPC-H power run by 31% and up to over 60% for specific TPC-H queries.","PeriodicalId":298901,"journal":{"name":"International Workshop on Data Management on New Hardware","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124282998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient GPU-based skyline computation
Kenneth S. Bøgh, I. Assent, Matteo Magnani
International Workshop on Data Management on New Hardware, 2013. doi:10.1145/2485278.2485283

The skyline operator for multi-criteria search returns the most interesting points of a data set with respect to any monotone preference function. Existing work has almost exclusively focused on efficiently computing skylines on one or more CPUs, ignoring the high degree of parallelism GPUs offer. In this paper, we investigate the challenges of efficient skyline algorithms that exploit the computational power of the GPU. We present a novel strategy for managing data transfer and memory for skylines using both CPU and GPU, introduce a new sorting-based data-parallel skyline algorithm, and discuss its properties. We demonstrate in a thorough experimental evaluation that this algorithm is faster than state-of-the-art sequential sorting-based skyline algorithms and that it shows superior scalability.
Peak performance: remote memory revisited
H. Mühleisen, R. Goncalves, M. Kersten
International Workshop on Data Management on New Hardware, 2013. doi:10.1145/2485278.2485287

Many database systems share a need for large amounts of fast storage. However, economies of scale limit the utility of extending a single machine with an arbitrary amount of memory. The recent broad availability of the zero-copy data transfer protocol RDMA over low-latency, high-throughput network connections such as InfiniBand prompts us to revisit the long-proposed use of memory provided by remote machines. In this paper, we present a solution that makes use of remote memory without modifying the operating system, and we investigate its impact on database performance.