HPC-Reuse: Efficient Process Creation for Running MPI and Hadoop MapReduce on Supercomputers

2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). Pub Date: 2016-05-16. DOI: 10.1109/CCGrid.2016.72

Thanh-Chung Dao, S. Chiba


Citations: 9

Abstract




Hadoop and Spark analytics are widely used for large-scale data processing on commodity clusters. In terms of productivity and maturity, running them on supercomputers is a better choice than developing new frameworks from scratch. YARN, a key component of Hadoop, is responsible for resource management and adopts dynamic management for job execution and scheduling. We identify three dynamic characteristics of YARN-like management, the three Ds (3D): on-Demand (processes are created during job execution), Diverse jobs, and Detailed (fine-grained allocation). This dynamic management does not fit typical resource managers on supercomputers, such as PBS, which we characterize by three static characteristics, the three Ss (3S): Stationary (no processes are newly created during execution), Single job, and Shallow (coarse-grained allocation). In this paper, we propose HPC-Reuse, which sits between YARN-like and PBS-like resource managers to better support dynamic management. HPC-Reuse helps avoid process creation, such as MPI-Spawn, and enables MPI communication over Hadoop processes. Our experimental results show that HPC-Reuse can reduce the execution time of iterative PageRank by 26%.
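The contrast between creating processes on demand and reusing already-created ones can be illustrated with a toy counting model. This is only a sketch under stated assumptions, not the authors' implementation: the function names `on_demand_creations` and `reuse_creations`, the worker and iteration counts, and the use of plain objects as stand-ins for processes are all hypothetical, and the model says nothing about the 26% figure reported above.

```python
def on_demand_creations(iterations: int, workers: int) -> int:
    """On-demand style: every iteration creates its own fresh workers."""
    created = 0
    for _ in range(iterations):
        created += workers  # hypothetical: one new process per worker, per iteration
    return created


def reuse_creations(iterations: int, workers: int) -> int:
    """Reuse style: workers are created once and handed out again each iteration."""
    created = 0
    pool = []  # stand-ins for processes that stay alive between iterations
    for _ in range(iterations):
        while len(pool) < workers:  # create only what the pool is still missing
            pool.append(object())
            created += 1
        # the iteration's tasks would run on the pooled workers here
    return created


if __name__ == "__main__":
    # Hypothetical workload: 10 iterations (e.g. of an iterative job), 4 workers.
    print("on-demand:", on_demand_creations(10, 4))
    print("reuse:    ", reuse_creations(10, 4))
```

In this model the on-demand count grows with the number of iterations while the reuse count stays at the pool size, which is why iterative workloads are the natural case to look at.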

