INFLOW '15: Latest Papers

Exploiting NVM in large-scale graph analytics

INFLOW '15

Pub Date : 2015-10-04 DOI: 10.1145/2819001.2819005

Jasmina Malicevic, Subramanya R. Dulloor, N. Sundaram, N. Satish, Jeffrey R. Jackson, W. Zwaenepoel

Data center applications like graph analytics require servers with ever larger memory capacities. DRAM scaling, however, cannot keep pace with the growing demand for capacity. Emerging byte-addressable non-volatile memory (NVM) technologies offer a more scalable alternative, with memory that is directly addressable by software, but at higher latency and lower bandwidth. Using an NVM hardware emulator, we study the suitability of NVM for meeting the memory demands of four state-of-the-art graph analytics frameworks: Graphlab, Galois, X-Stream and Graphmat. We evaluate their performance with popular algorithms (Pagerank, BFS, Triangle Counting and Collaborative Filtering) by allocating memory exclusively from DRAM (DRAM-only) or from emulated NVM (NVM-only). While all of these applications are sensitive to the higher latency or lower bandwidth of NVM, resulting in performance degradation of up to 4x with NVM-only (compared to DRAM-only), we show that the performance impact is somewhat mitigated in the frameworks that exploit CPU memory-level parallelism and hardware prefetchers. Further, we demonstrate that, in a hybrid memory system with NVM and DRAM, intelligent placement of application data based on its relative importance may help offset the overheads of the NVM-only solution in a cost-effective manner (i.e., using only a small amount of DRAM). Specifically, we show that, depending on the algorithm, Graphmat can achieve close to DRAM-only performance (within 1.2x) by placing only 6.7% to 31.5% of its total memory footprint in DRAM.
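
The idea of placing the most important application data in a small DRAM budget can be illustrated with a minimal sketch. The abstract does not describe how the authors rank or place data, so everything below is an assumption made for illustration: the `Region` type, the use of accesses per unit of size as the importance measure, the greedy policy, and the example sizes and access counts.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    size: int      # footprint in arbitrary units (hypothetical, must be > 0)
    accesses: int  # hypothetical access count, e.g. from a profiling run


def place_in_dram(regions, dram_budget):
    """Greedy placement: the regions with the most accesses per unit of
    size go to DRAM until the budget is used up; the rest go to NVM."""
    ranked = sorted(regions, key=lambda r: r.accesses / r.size, reverse=True)
    placement = {}
    remaining = dram_budget
    for region in ranked:
        if region.size <= remaining:
            placement[region.name] = "DRAM"
            remaining -= region.size
        else:
            placement[region.name] = "NVM"
    return placement


# Made-up example: 100 units of footprint, 20 units of DRAM.
regions = [
    Region("vertex_properties", size=10, accesses=900),
    Region("edge_list", size=80, accesses=400),
    Region("message_buffers", size=10, accesses=100),
]
placement = place_in_dram(regions, dram_budget=20)
for name, tier in placement.items():
    print(f"{name}: {tier}")
```

In this made-up example the small, frequently accessed regions fit in DRAM while the large edge list stays in NVM, which is the kind of outcome the abstract's "small amount of DRAM" result suggests, though the paper's actual placement criteria may differ.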

Citations: 29

Mjölnir: collecting trash in a demanding new world

INFLOW '15

Pub Date : 2015-10-04 DOI: 10.1145/2819001.2819006

Z. Weiss, S. Subramanian, S. Sundararaman, Vinay Sridhar, Nisha Talagala, A. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau

As flash devices become ubiquitous in data centers and cost per gigabyte drops, flash systems need to provide data services similar to those of traditional storage. We present Mjölnir, a powerful and scalable engine that addresses the core problems that make efficient flash-based data services challenging: multi-reference management and garbage collection. Additionally, by providing powerful primitives for address remapping, Mjölnir enables a redesign of the I/O stack for greater efficiency and performance with flash. Mjölnir uses techniques from language runtimes for reference management and garbage collection; we show via prototype and experimental evaluation that this design can deliver predictable performance even with varied user workloads across a range of capacity and reference-count scales.
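
To make the multi-reference problem concrete, here is a minimal sketch of reference-counted address remapping. It is a generic reference-counting scheme, not Mjölnir's design, which the abstract does not detail; the `BlockStore` class and its `write`, `clone`, and `unmap` methods are hypothetical names chosen for illustration.

```python
class BlockStore:
    """Toy model: logical addresses map to physical blocks, and a physical
    block may only be reclaimed once no logical address refers to it."""

    def __init__(self):
        self.mapping = {}    # logical address -> physical block id
        self.refcount = {}   # physical block id -> number of references
        self.free = []       # reclaimed physical blocks, ready for reuse
        self.next_block = 0  # next never-used physical block id

    def write(self, laddr):
        """Out-of-place write: point laddr at a fresh physical block."""
        self.unmap(laddr)
        if self.free:
            block = self.free.pop()
        else:
            block = self.next_block
            self.next_block += 1
        self.mapping[laddr] = block
        self.refcount[block] = 1
        return block

    def clone(self, src, dst):
        """Remap dst to share src's block (e.g. a snapshot or dedup hit)."""
        block = self.mapping[src]
        if self.mapping.get(dst) == block:
            return  # already shared, nothing to do
        self.unmap(dst)
        self.mapping[dst] = block
        self.refcount[block] += 1

    def unmap(self, laddr):
        """Drop one reference; reclaim the block when the count hits zero."""
        block = self.mapping.pop(laddr, None)
        if block is None:
            return
        self.refcount[block] -= 1
        if self.refcount[block] == 0:
            del self.refcount[block]
            self.free.append(block)


store = BlockStore()
store.write("A")         # A -> block 0
store.clone("A", "B")    # B shares block 0, refcount 2
store.unmap("A")         # block 0 still referenced by B
still_live = 0 in store.refcount
store.unmap("B")         # last reference gone, block 0 is reclaimed
```

The point of the sketch is only that a block with several references cannot be collected when one of them goes away, which is why reference management and garbage collection have to be designed together.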

Citations: 1

Towards software defined persistent memory: rethinking software support for heterogenous memory architectures

INFLOW '15

Pub Date : 2015-10-04 DOI: 10.1145/2819001.2819004

S. Sundararaman, Nisha Talagala, Dhananjoy Das, Amar Mudrankit, D. Arteaga

The emergence of persistent memories promises a sea change in application and data center architectures, with efficiencies and performance not possible with today's volatile DRAM and slow persistent storage. We present Software Defined Persistent Memory, an approach that enables applications to use persistent memory in a variety of local and remote configurations. The heterogeneity is handled by middleware that manages hardware-specific needs and optimizations. We present the first-ever design and implementation of such an architecture, and illustrate the key abstractions that are needed to hide hardware-specific details from applications while exposing the characteristics necessary for performance optimization. We evaluate the performance of our implementation on a set of microbenchmarks and on database workloads using the MySQL database. Through our evaluation, we show that it is possible to apply Software Defined concepts to persistent memory, improving performance while retaining functionality and optimizing for different hardware architectures.

Citations: 4

Revisiting hash table design for phase change memory

INFLOW '15

Pub Date : 2015-10-04 DOI: 10.1145/2819001.2819002

Biplob K. Debnath, Alireza Haghdoost, Asim Kadav, Mohammed G. Khatib, C. Ungureanu

Phase Change Memory (PCM) is emerging as an attractive alternative to Dynamic Random Access Memory (DRAM) in building data-intensive computing systems. PCM exhibits read/write performance asymmetry that makes it necessary to revisit the design of in-memory applications. In this paper, we focus on in-memory hash tables, a family of data structures with wide applicability. We evaluate several popular hash-table designs to understand their performance under PCM. We find that for write-heavy workloads the designs that achieve the best performance for PCM differ from the ones that are best for DRAM, and that designs achieving a high load factor also cause a high number of memory writes. Finally, we propose PFHT, a PCM-Friendly Hash Table: a cuckoo hashing variant that is tailored to PCM characteristics and offers a better trade-off between performance, the amount of writes generated, and the expected load factor than any of the existing DRAM-based implementations.
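
For readers unfamiliar with cuckoo hashing, the sketch below shows a textbook two-table variant with a counter for slot writes. It is not PFHT, whose design the abstract does not describe. The class name, the overflow `stash`, the `max_kicks` limit, and the hash function are illustrative assumptions. The counter makes visible why displacement chains matter on a write-asymmetric memory: a single insert can trigger several writes.

```python
class CuckooHash:
    """Textbook two-table cuckoo hashing with a slot-write counter."""

    def __init__(self, capacity=8, max_kicks=16):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.tables = [[None] * capacity, [None] * capacity]
        self.stash = {}   # overflow area for entries that could not be placed
        self.writes = 0   # number of slot (or stash) writes performed

    def _slot(self, which, key):
        # One candidate slot per table; mixing in the table index yields
        # two different hash functions from Python's built-in hash().
        return hash((which, key)) % self.capacity

    def lookup(self, key):
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return self.stash.get(key)  # None if the key is absent

    def insert(self, key, value):
        # Update in place if the key is already stored somewhere.
        if key in self.stash:
            self.stash[key] = value
            self.writes += 1
            return
        for which in (0, 1):
            i = self._slot(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = (key, value)
                self.writes += 1
                return
        # Otherwise place the item, evicting and relocating occupants.
        item = (key, value)
        which = 0
        for _ in range(self.max_kicks):
            i = self._slot(which, item[0])
            item, self.tables[which][i] = self.tables[which][i], item
            self.writes += 1          # every displacement is a memory write
            if item is None:
                return
            which = 1 - which         # evicted item tries its other table
        # Chain too long: park the last evicted entry in the stash.
        self.stash[item[0]] = item[1]
        self.writes += 1


table = CuckooHash()
for k in range(10):
    table.insert(k, k * k)
print("slot writes for 10 inserts:", table.writes)
```

The write count is at least the number of inserts and grows as the table fills, which matches the abstract's observation that designs achieving a high load factor also cause more memory writes.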

Citations: 14

Androtrace: framework for tracing and analyzing IOs on Android

INFLOW '15

Pub Date : 2015-10-04 DOI: 10.1145/2819001.2819007

Eunryoung Lim, Seongjin Lee, Y. Won

In this work, we develop an IO trace and analysis framework, Androtrace, which is specifically tailored for the Android platform. Unlike earlier works that required prolonged post-processing procedures, Androtrace not only traces with low overhead, but also provides an efficient solution for storing traces within mobile devices. Captured IO traces are temporarily stored in main memory and on the storage device, and are transferred to the Androtrace server when the device is connected to WiFi. We use a server and client model to support and analyze multiple Android users. Using the framework, we find that write IOs are dominant in mobile workloads.

Citations: 10
