Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories

ACM Transactions on Computer Systems (impact factor 2.0; JCR Q2, Computer Science, Theory & Methods; CAS Zone 4, Computer Science). Published: 2022-07-05. DOI: https://dl.acm.org/doi/full/10.1145/3511211

Lei Chen, Jiacheng Zhao, Chenxi Wang, Ting Cao, John Zigman, Haris Volos, Onur Mutlu, Fang Lv, Xiaobing Feng, Guoqing Harry Xu, Huimin Cui


Abstract

To process real-world datasets, modern data-parallel systems often require extremely large amounts of memory, which is both costly and energy-inefficient. Emerging non-volatile memory (NVM) technologies offer higher capacity than DRAM and lower energy consumption than SSDs. NVMs therefore have the potential to fundamentally change the dichotomy between DRAM and durable storage in Big Data processing. However, most Big Data applications are written in managed languages and executed on top of a managed runtime that already performs several kinds of memory management. Supporting hybrid physical memories adds a new dimension, creating unique challenges in data replacement. This article proposes Panthera, a semantics-aware, fully automated memory management technique for Big Data processing over hybrid memories. Panthera analyzes user programs on a Big Data system to infer their coarse-grained access patterns, which are then passed to the Panthera runtime for efficient data placement and migration. For Big Data applications, this coarse-grained data-division information is accurate enough to guide the garbage collector (GC) in laying out data, and it incurs almost no overhead for data monitoring and movement. We implemented Panthera in OpenJDK and Apache Spark. Based on Big Data applications' memory access patterns, we also implemented a new profiling-guided optimization strategy that is transparent to applications. With this optimization, our extensive evaluation demonstrates that Panthera reduces energy by 32–53% with less than 1% time overhead on average. To show Panthera's applicability, we extend it to QuickCached, a pure Java implementation of Memcached. Our evaluation results show that Panthera reduces energy by 28.7% with 5.2% time overhead on average.
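The abstract does not spell out Panthera's placement policy. As a rough illustration of the general idea it describes (coarse-grained, per-dataset hints deciding DRAM vs. NVM placement instead of per-object monitoring), the Java sketch below greedily assigns datasets to DRAM in order of assumed access density until a DRAM budget is used up, leaving the rest in NVM. The `Dataset` record, the `inferredAccesses` field, the accesses-per-byte heuristic, and the byte budget are all hypothetical; this is not Panthera's actual algorithm, and it omits the runtime migration and GC integration the abstract mentions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: place coarse-grained data units (e.g., cached datasets)
 * into DRAM or NVM using access hints assumed to be inferred ahead of time.
 * The names and the greedy policy are illustrative assumptions, not Panthera's code.
 */
public class HybridPlacementSketch {

    enum Tier { DRAM, NVM }

    /** One coarse-grained data unit with an (assumed) statically inferred access count. */
    record Dataset(String name, long sizeBytes, long inferredAccesses) {
        /** Accesses per byte: favors small, frequently used data when DRAM is scarce. */
        double accessDensity() {
            return (double) inferredAccesses / sizeBytes;
        }
    }

    /** Fill the DRAM budget with the densest datasets first; everything else goes to NVM. */
    static Map<String, Tier> place(List<Dataset> datasets, long dramBudgetBytes) {
        List<Dataset> sorted = new ArrayList<>(datasets);
        sorted.sort(Comparator.comparingDouble(Dataset::accessDensity).reversed());
        Map<String, Tier> placement = new LinkedHashMap<>(); // keeps decision order
        long used = 0;
        for (Dataset d : sorted) {
            if (used + d.sizeBytes() <= dramBudgetBytes) {
                placement.put(d.name(), Tier.DRAM);
                used += d.sizeBytes();
            } else {
                placement.put(d.name(), Tier.NVM);
            }
        }
        return placement;
    }

    public static void main(String[] args) {
        List<Dataset> ds = List.of(
            new Dataset("links", 4_000, 40),    // hypothetical: reused every iteration
            new Dataset("ranks", 1_000, 40),    // hypothetical: small and hot
            new Dataset("rawInput", 10_000, 1)  // hypothetical: read once
        );
        System.out.println(place(ds, 6_000));
    }
}
```

Running `main` prints `{ranks=DRAM, links=DRAM, rawInput=NVM}`: the two repeatedly accessed datasets fit in the 6,000-byte budget, while the read-once input is left in NVM. Ranking by accesses per byte rather than raw access count is just one simple way to prefer small, hot data when DRAM is limited.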
