Can Non-volatile Memory Benefit MapReduce Applications on HPC Clusters?

2016 1st Joint International Workshop on Parallel Data Storage and data Intensive Scalable Computing Systems (PDSW-DISCS). Pub Date: 2016-11-13. DOI: 10.1109/PDSW-DISCS.2016.7

Md. Wasi-ur-Rahman, Nusrat S. Islam, Xiaoyi Lu, D. Panda

Citations: 5

Abstract

Modern High-Performance Computing (HPC) clusters are equipped with advanced technological resources that must be properly utilized to achieve the best performance for end applications. One such resource, Non-Volatile Memory (NVM), offers the opportunity for fast, scalable performance through its DRAM-like performance characteristics. At the same time, distributed processing engines such as MapReduce are continuously being enhanced with features that exploit high-performance technologies. In this paper, we present a novel MapReduce framework with an NVRAM-assisted map output spill approach. We have designed our framework on top of the existing RDMA-enhanced Hadoop MapReduce so that end applications benefit from performance enhancements in both the map and reduce phases. Our proposed approach significantly enhances map phase performance, as shown by a wide variety of MapReduce benchmarks and workloads from the Intel HiBench [9] and PUMA [18] suites. Our performance evaluation shows that the NVRAM-based spill approach can improve map execution performance by 2.73x, which contributes to an overall execution improvement of 55% for Sort. Our design also delivers significant performance benefits for other workloads: 54% for TeraSort, 21% for PageRank, 58% for SelfJoin, etc. To the best of our knowledge, this is the first approach towards leveraging NVRAM in MapReduce execution frameworks for applications on HPC clusters.
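
The abstract does not describe how the map output spill is implemented. Purely as an illustration, and not as the authors' design, the sketch below shows the general idea of spilling sorted map output into a byte-addressable region instead of writing it through ordinary file I/O. It assumes that NVRAM is exposed to the process as a memory-mappable file; the function names `spill_to_mapped_file` and `read_spill`, the length-prefixed record layout, and the use of a temporary file as a stand-in for NVRAM are all hypothetical.

```python
import mmap
import os
import struct
import tempfile

HEADER = ">II"  # assumed layout: 4-byte key length, 4-byte value length
HEADER_SIZE = struct.calcsize(HEADER)


def spill_to_mapped_file(records, path):
    """Sort (key, value) byte pairs by key and copy them into a mapped file.

    The mapped file stands in for an NVRAM region (an assumption).
    Returns the number of bytes written.
    """
    ordered = sorted(records, key=lambda kv: kv[0])
    size = sum(HEADER_SIZE + len(k) + len(v) for k, v in ordered)
    with open(path, "wb") as f:
        f.truncate(size)  # reserve the region up front
    if size == 0:
        return 0  # an empty file cannot be memory-mapped
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), size) as region:
        offset = 0
        for k, v in ordered:
            region[offset:offset + HEADER_SIZE] = struct.pack(HEADER, len(k), len(v))
            offset += HEADER_SIZE
            region[offset:offset + len(k)] = k
            offset += len(k)
            region[offset:offset + len(v)] = v
            offset += len(v)
        region.flush()  # push the mapped pages to the backing store
    return size


def read_spill(path):
    """Read the length-prefixed records back in stored (sorted) order."""
    records = []
    if os.path.getsize(path) == 0:
        return records
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as region:
        offset = 0
        while offset < len(region):
            klen, vlen = struct.unpack_from(HEADER, region, offset)
            offset += HEADER_SIZE
            key = region[offset:offset + klen]
            offset += klen
            value = region[offset:offset + vlen]
            offset += vlen
            records.append((key, value))
    return records


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "spill.bin")
    spill_to_mapped_file([(b"b", b"2"), (b"a", b"1")], demo_path)
    print(read_spill(demo_path))
```

On real persistent memory the flush step would typically be replaced by platform-specific persistence primitives; that detail is outside what the abstract states.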

Source journal

2016 1st Joint International Workshop on Parallel Data Storage and data Intensive Scalable Computing Systems (PDSW-DISCS)

Self-citation rate

0.00%

Number of publications