{"title":"Model to estimate the size of a Hadoop cluster - HCEm","authors":"J. Brito, Aleteia P. F. Araujo","doi":"10.1109/PADSW.2014.7097897","DOIUrl":null,"url":null,"abstract":"This paper describes a model which aims to estimate the size of a cluster running Hadoop framework for the processing of large datasets at a given timeframe. As main contributions it denes (i) a light layer of optimization for MapReduce jobs, (ii) presents a model to estimate the size cluster for a Hadoop framework and (iii) performs tests using a real environment - the Amazon Elastic MapReduce. The proposed approach works with the MapReduce to dene the main configuration parameters and determines computational resources of hosts in the cluster in order to meet the desired runtime for the requirements of a given workload requirement. Thus, the results show that the proposed model is able to avoid to over-allocation or sub-allocation of computing resources on a Hadoop cluster.","PeriodicalId":421740,"journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)","volume":"99 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PADSW.2014.7097897","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
This paper describes a model that aims to estimate the size of a cluster running the Hadoop framework for processing large datasets within a given timeframe. Its main contributions are that it (i) defines a light optimization layer for MapReduce jobs, (ii) presents a model to estimate the cluster size for the Hadoop framework, and (iii) performs tests in a real environment - Amazon Elastic MapReduce. The proposed approach works with MapReduce to define the main configuration parameters and determines the computational resources of the hosts in the cluster in order to meet the desired runtime for a given workload. The results show that the proposed model is able to avoid over-allocation or under-allocation of computing resources in a Hadoop cluster.
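The abstract does not reproduce the HCEm model itself, so the sketch below is only a rough, assumption-laden illustration of the general idea it describes: sizing a cluster so that a MapReduce workload finishes within a target runtime. All parameter names, the overhead factor, and the example values are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical back-of-the-envelope estimate of Hadoop cluster size.
# NOT the HCEm model from the paper; it only illustrates sizing a cluster
# so a MapReduce workload meets a desired runtime. All parameters are assumed.

def estimate_cluster_size(input_size_gb: float,
                          per_node_throughput_gb_per_h: float,
                          target_runtime_h: float,
                          overhead_factor: float = 1.2) -> int:
    """Return the minimum number of worker nodes needed to process
    input_size_gb within target_runtime_h, given an assumed sustained
    per-node MapReduce throughput and a flat overhead factor covering
    shuffle, replication, and job startup costs."""
    effective_work = input_size_gb * overhead_factor                   # total work incl. overhead
    work_per_node = per_node_throughput_gb_per_h * target_runtime_h    # capacity of one node in the deadline
    return max(1, math.ceil(effective_work / work_per_node))


if __name__ == "__main__":
    # Example (assumed values): 2 TB of input, 25 GB/h per node, 4-hour deadline.
    nodes = estimate_cluster_size(input_size_gb=2048,
                                  per_node_throughput_gb_per_h=25,
                                  target_runtime_h=4)
    print(f"Estimated worker nodes needed: {nodes}")
```

Rounding up with `math.ceil` and clamping to at least one node keeps the estimate on the side of meeting the deadline; the paper's actual model additionally tunes MapReduce configuration parameters, which this sketch does not attempt.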