Jasmina Malicevic, Subramanya R. Dulloor, N. Sundaram, N. Satish, Jeffrey R. Jackson, W. Zwaenepoel
Data center applications like graph analytics require servers with ever larger memory capacities. DRAM scaling, however, is not able to match the increasing demands for capacity. Emerging byte-addressable, non-volatile memory (NVM) technologies offer a more scalable alternative, with memory that is directly addressable by software, but at a higher latency and lower bandwidth. Using an NVM hardware emulator, we study the suitability of NVM in meeting the memory demands of four state-of-the-art graph analytics frameworks, namely Graphlab, Galois, X-Stream and Graphmat. We evaluate their performance with popular algorithms (Pagerank, BFS, Triangle Counting and Collaborative Filtering) by allocating memory exclusively from DRAM (DRAM-only) or emulated NVM (NVM-only). While all of these applications are sensitive to the higher latency or lower bandwidth of NVM, resulting in performance degradation of up to 4x with NVM-only (compared to DRAM-only), we show that the performance impact is somewhat mitigated in the frameworks that exploit CPU memory-level parallelism and hardware prefetchers. Further, we demonstrate that, in a hybrid memory system with NVM and DRAM, intelligent placement of application data based on their relative importance may help offset the overheads of the NVM-only solution in a cost-effective manner (i.e., using only a small amount of DRAM). Specifically, we show that, depending on the algorithm, Graphmat can achieve close to DRAM-only performance (within 1.2x) by placing only 6.7% to 31.5% of its total memory footprint in DRAM.
{"title":"Exploiting NVM in large-scale graph analytics","authors":"Jasmina Malicevic, Subramanya R. Dulloor, N. Sundaram, N. Satish, Jeffrey R. Jackson, W. Zwaenepoel","doi":"10.1145/2819001.2819005","DOIUrl":"https://doi.org/10.1145/2819001.2819005","url":null,"abstract":"Data center applications like graph analytics require servers with ever larger memory capacities. DRAM scaling, however, is not able to match the increasing demands for capacity. Emerging byte-addressable, non-volatile memory (NVM) technologies offer a more scalable alternative, with memory that is directly addressable by software, but at a higher latency and lower bandwidth. Using an NVM hardware emulator, we study the suitability of NVM in meeting the memory demands of four state-of-the-art graph analytics frameworks, namely Graphlab, Galois, X-Stream and Graphmat. We evaluate their performance with popular algorithms (Pagerank, BFS, Triangle Counting and Collaborative Filtering) by allocating memory exclusively from DRAM (DRAM-only) or emulated NVM (NVM-only). While all of these applications are sensitive to the higher latency or lower bandwidth of NVM, resulting in performance degradation of up to 4x with NVM-only (compared to DRAM-only), we show that the performance impact is somewhat mitigated in the frameworks that exploit CPU memory-level parallelism and hardware prefetchers. Further, we demonstrate that, in a hybrid memory system with NVM and DRAM, intelligent placement of application data based on their relative importance may help offset the overheads of the NVM-only solution in a cost-effective manner (i.e., using only a small amount of DRAM). Specifically, we show that, depending on the algorithm, Graphmat can achieve close to DRAM-only performance (within 1.2x) by placing only 6.7% to 31.5% of its total memory footprint in DRAM.","PeriodicalId":293142,"journal":{"name":"INFLOW '15","volume":"212 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114224800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
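The placement idea in the abstract above (put the most important data in a small amount of DRAM, leave the rest in NVM) can be illustrated with a minimal sketch. This is not the paper's method: the greedy accesses-per-byte heuristic, the `Region` type, the `place_in_dram` function, and all region names, sizes and access counts are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str        # hypothetical data-structure name
    size_bytes: int  # memory footprint of the structure
    accesses: int    # estimated access count (made-up profile data)


def place_in_dram(regions, dram_budget_bytes):
    """Greedily assign the most access-dense regions to DRAM; the rest go to NVM."""
    dram, nvm = [], []
    remaining = dram_budget_bytes
    # Highest accesses-per-byte first: these benefit most from the faster memory.
    for r in sorted(regions, key=lambda r: r.accesses / r.size_bytes, reverse=True):
        if r.size_bytes <= remaining:
            dram.append(r.name)
            remaining -= r.size_bytes
        else:
            nvm.append(r.name)
    return dram, nvm


# Made-up example: a large, rarely touched edge list and two small, hot structures.
regions = [
    Region("edge_list", 800, 1_000),
    Region("vertex_state", 100, 5_000),
    Region("frontier_bitmap", 20, 2_000),
]
dram, nvm = place_in_dram(regions, dram_budget_bytes=150)

# Fraction of the total footprint that ended up in DRAM.
total = sum(r.size_bytes for r in regions)
dram_fraction = sum(r.size_bytes for r in regions if r.name in dram) / total
```

With these invented numbers, the two small hot structures fit in the DRAM budget and the edge list stays in NVM, so roughly 13% of the footprint is placed in DRAM.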
Z. Weiss, S. Subramanian, S. Sundararaman, Vinay Sridhar, Nisha Talagala, A. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
As flash devices become ubiquitous in data centers and cost per gigabyte drops, flash systems need to provide data services similar to those of traditional storage. We present Mjölnir, a powerful and scalable engine that addresses the core problems that make efficient flash-based data services challenging: multi-reference management and garbage collection. Additionally, by providing powerful primitives for address remapping, Mjölnir enables redesign of the I/O stack for greater efficiency and performance with flash. Mjölnir uses techniques from language runtimes for reference management and garbage collection; we show via prototype and experimental evaluation that this design can deliver predictable performance even with varied user workloads across a range of capacity and reference-count scales.
{"title":"Mjölnir: collecting trash in a demanding new world","authors":"Z. Weiss, S. Subramanian, S. Sundararaman, Vinay Sridhar, Nisha Talagala, A. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau","doi":"10.1145/2819001.2819006","DOIUrl":"https://doi.org/10.1145/2819001.2819006","url":null,"abstract":"As flash devices become ubiquitous in data centers and cost per gigabyte drops, flash systems need to provide data services similar to those of traditional storage. We present Mjölnir, a powerful and scalable engine that addresses the core problems that make efficient flash-based data services challenging: multi-reference management and garbage collection. Additionally, by providing powerful primitives for address remapping, Mjölnir enables redesign of the I/O stack for greater efficiency and performance with flash. Mjölnir uses techniques from language runtimes for reference management and garbage collection; we show via prototype and experimental evaluation that this design can deliver predictable performance even with varied user workloads across a range of capacity and reference-count scales.","PeriodicalId":293142,"journal":{"name":"INFLOW '15","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122213104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
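To make the terms "multi-reference management" and "address remapping" in the abstract above concrete, here is a toy sketch of a logical-to-physical map with per-block reference counts. It does not reflect Mjölnir's actual design, which the abstract does not describe; the `RemapTable` class and its `write` and `clone` methods are hypothetical names chosen for illustration.

```python
class RemapTable:
    """Toy logical-to-physical map with per-block reference counts."""

    def __init__(self):
        self.map = {}        # logical address -> physical block id
        self.refs = {}       # physical block id -> number of logical references
        self.garbage = []    # blocks whose last reference has been dropped
        self._next_block = 0

    def _drop(self, block):
        # Release one reference; a block with no references becomes garbage.
        self.refs[block] -= 1
        if self.refs[block] == 0:
            del self.refs[block]
            self.garbage.append(block)

    def write(self, laddr):
        """Out-of-place write: allocate a fresh block and remap laddr to it."""
        old = self.map.get(laddr)
        block = self._next_block
        self._next_block += 1
        self.map[laddr] = block
        self.refs[block] = 1
        if old is not None:
            self._drop(old)
        return block

    def clone(self, src, dst):
        """Make dst share src's block, as a snapshot or copy operation might."""
        old = self.map.get(dst)
        block = self.map[src]
        self.map[dst] = block
        self.refs[block] += 1
        if old is not None:
            self._drop(old)


t = RemapTable()
t.write("a")                       # "a" -> block 0
t.clone("a", "b")                  # block 0 now has two references
t.write("a")                       # "a" -> block 1; block 0 stays live via "b"
garbage_while_shared = list(t.garbage)
t.write("b")                       # last reference to block 0 is dropped
```

In this sketch a block is reclaimable only once every logical address that shared it has been remapped, which is the bookkeeping problem that multiple references introduce.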
S. Sundararaman, Nisha Talagala, Dhananjoy Das, Amar Mudrankit, D. Arteaga
The emergence of persistent memories promises a sea-change in application and data center architectures, with efficiencies and performance not possible with today's volatile DRAM and persistent slow storage. We present Software Defined Persistent Memory, an approach that enables applications to use persistent memory in a variety of local and remote configurations. The heterogeneity is handled by middleware that manages hardware-specific needs and optimizations. We present the first ever design and implementation of such an architecture, and illustrate the key abstractions that are needed to hide hardware-specific details from applications while exposing necessary characteristics for performance optimization. We evaluate the performance of our implementation on a set of microbenchmarks and database workloads using the MySQL database. Through our evaluation, we show that it is possible to apply Software Defined concepts to persistent memory, to improve performance while retaining functionality and optimizing for different hardware architectures.
{"title":"Towards software defined persistent memory: rethinking software support for heterogenous memory architectures","authors":"S. Sundararaman, Nisha Talagala, Dhananjoy Das, Amar Mudrankit, D. Arteaga","doi":"10.1145/2819001.2819004","DOIUrl":"https://doi.org/10.1145/2819001.2819004","url":null,"abstract":"The emergence of persistent memories promises a sea-change in application and data center architectures, with efficiencies and performance not possible with today's volatile DRAM and persistent slow storage. We present Software Defined Persistent Memory, an approach that enables applications to use persistent memory in a variety of local and remote configurations. The heterogeneity is handled by middleware that manages hardware-specific needs and optimizations. We present the first ever design and implementation of such an architecture, and illustrate the key abstractions that are needed to hide hardware-specific details from applications while exposing necessary characteristics for performance optimization. We evaluate the performance of our implementation on a set of microbenchmarks and database workloads using the MySQL database. Through our evaluation, we show that it is possible to apply Software Defined concepts to persistent memory, to improve performance while retaining functionality and optimizing for different hardware architectures.","PeriodicalId":293142,"journal":{"name":"INFLOW '15","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127971704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biplob K. Debnath, Alireza Haghdoost, Asim Kadav, Mohammed G. Khatib, C. Ungureanu
Phase Change Memory (PCM) is emerging as an attractive alternative to Dynamic Random Access Memory (DRAM) in building data-intensive computing systems. PCM exhibits a read/write performance asymmetry that makes it necessary to revisit the design of in-memory applications. In this paper, we focus on in-memory hash tables, a family of data structures with wide applicability. We evaluate several popular hash-table designs to understand their performance under PCM. We find that for write-heavy workloads the designs that achieve the best performance for PCM differ from the ones that are best for DRAM, and that designs achieving a high load factor also cause a high number of memory writes. Finally, we propose PFHT, a PCM-Friendly Hash Table, a cuckoo hashing variant that is tailored to PCM characteristics and offers a better trade-off between performance, the amount of writes generated, and the expected load factor than any of the existing DRAM-based implementations.
{"title":"Revisiting hash table design for phase change memory","authors":"Biplob K. Debnath, Alireza Haghdoost, Asim Kadav, Mohammed G. Khatib, C. Ungureanu","doi":"10.1145/2819001.2819002","DOIUrl":"https://doi.org/10.1145/2819001.2819002","url":null,"abstract":"Phase Change Memory (PCM) is emerging as an attractive alternative to Dynamic Random Access Memory (DRAM) in building data-intensive computing systems. PCM exhibits a read/write performance asymmetry that makes it necessary to revisit the design of in-memory applications. In this paper, we focus on in-memory hash tables, a family of data structures with wide applicability. We evaluate several popular hash-table designs to understand their performance under PCM. We find that for write-heavy workloads the designs that achieve the best performance for PCM differ from the ones that are best for DRAM, and that designs achieving a high load factor also cause a high number of memory writes. Finally, we propose PFHT, a PCM-Friendly Hash Table, a cuckoo hashing variant that is tailored to PCM characteristics and offers a better trade-off between performance, the amount of writes generated, and the expected load factor than any of the existing DRAM-based implementations.","PeriodicalId":293142,"journal":{"name":"INFLOW '15","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123461518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
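The observation above that hash-table designs can generate many memory writes is easy to see with plain cuckoo hashing, where one insert may displace a chain of existing entries. The sketch below is textbook two-table cuckoo hashing with a write counter, not PFHT, whose details the abstract does not give. The `CuckooTable` class, its toy hash functions for non-negative integer keys, and the `slot_writes` counter are assumptions for illustration.

```python
class CuckooTable:
    """Two-table cuckoo hash that counts slot writes (a stand-in for PCM writes)."""

    def __init__(self, slots_per_table=4, max_kicks=16):
        self.n = slots_per_table
        self.max_kicks = max_kicks
        self.tables = [[None] * self.n, [None] * self.n]
        self.slot_writes = 0

    def _index(self, which, key):
        # Toy hash functions for non-negative integer keys (illustrative only).
        return key % self.n if which == 0 else (key // self.n) % self.n

    def lookup(self, key):
        # A key can only live in one of its two candidate slots.
        for which in (0, 1):
            entry = self.tables[which][self._index(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def insert(self, key, value):
        # If the key is already present, update it in place: one slot write.
        for which in (0, 1):
            i = self._index(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = (key, value)
                self.slot_writes += 1
                return
        # Otherwise place it, evicting and relocating occupants as needed.
        item, which = (key, value), 0
        for _ in range(self.max_kicks):
            i = self._index(which, item[0])
            evicted = self.tables[which][i]
            self.tables[which][i] = item
            self.slot_writes += 1  # every placement, including relocations, is a write
            if evicted is None:
                return
            item, which = evicted, 1 - which
        # This sketch gives up here and the entry in hand is lost;
        # a real table would resize or rehash instead.
        raise RuntimeError("displacement chain too long")


table = CuckooTable()
for key in (1, 5, 9):  # all three keys collide in the first table
    table.insert(key, key * 10)
```

In this run the three inserts cost five slot writes, because the second and third inserts each displace an existing entry into the other table.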
Eunryoung Lim, Seongjin Lee, Y. Won
In this work, we develop an IO trace and analysis framework, Androtrace, which is specifically tailored for the Android platform. Unlike earlier works that required prolonged post-processing procedures, Androtrace not only traces with low overhead, but also provides an efficient solution for storage within mobile devices. Captured IO traces are temporarily stored in main memory and on the storage device, and they are transferred to the Androtrace server when the device is connected to WiFi. We use a server and client model to support and analyze multiple Android users. Using the framework, we find that write IOs are dominant in mobile workloads.
{"title":"Androtrace: framework for tracing and analyzing IOs on Android","authors":"Eunryoung Lim, Seongjin Lee, Y. Won","doi":"10.1145/2819001.2819007","DOIUrl":"https://doi.org/10.1145/2819001.2819007","url":null,"abstract":"In this work, we develop an IO trace and analysis framework, Androtrace, which is specifically tailored for the Android platform. Unlike earlier works that required prolonged post-processing procedures, Androtrace not only traces with low overhead, but also provides an efficient solution for storage within mobile devices. Captured IO traces are temporarily stored in main memory and on the storage device, and they are transferred to the Androtrace server when the device is connected to WiFi. We use a server and client model to support and analyze multiple Android users. Using the framework, we find that write IOs are dominant in mobile workloads.","PeriodicalId":293142,"journal":{"name":"INFLOW '15","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131410675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
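The kind of server-side analysis behind a statement such as "write IOs are dominant" can be sketched as a read/write breakdown over trace records. The record layout below (timestamp, operation, size) and the `io_mix` function are assumptions; the abstract does not specify Androtrace's trace format, and the sample records are invented.

```python
from collections import Counter

# Hypothetical trace records: (timestamp_s, operation, size_bytes).
trace = [
    (0.001, "W", 4096),
    (0.004, "W", 8192),
    (0.010, "R", 4096),
    (0.012, "W", 4096),
]


def io_mix(records):
    """Return {operation: (share of IO count, total bytes)} for a trace."""
    counts = Counter()
    volume = Counter()
    for _ts, op, size in records:
        counts[op] += 1
        volume[op] += size
    total = sum(counts.values())
    return {op: (counts[op] / total, volume[op]) for op in counts}


mix = io_mix(trace)
```

For the invented sample, writes account for three of the four IOs and 16384 of the 20480 bytes transferred.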