Javier Conejero, María Blanca Caminero, C. Carrión
{"title":"在多用户IaaS云中分析Hadoop性能","authors":"Javier Conejero, María Blanca Caminero, C. Carrión","doi":"10.1109/HPCSIM.2014.6903713","DOIUrl":null,"url":null,"abstract":"Over the last few years, Big Data analysis (i.e., crunching enormous amounts of data from different sources to extract useful knowledge for improving business objectives) has attracted huge attention from enterprises and research institutions. One of the most successful paradigms that has gained popularity in order to analyse this huge amount of data, is MapReduce (and particularly Hadoop, its open source implementation). However, Hadoop-based applications require massive amounts of resources in order to conduct different analysis of large amounts of data. This growing requirements that research and enterprises demand from the actual computing infrastructures empowers the Cloud computing utilization, where there is an increasing demand of Hadoop as a Service. Since Hadoop requires a distributed environment in order to operate, a significant problem is where resources are located. Focusing in Cloud environments, this problem lays mainly on the criteria for Virtual Machine (VM) placement. The work presented in this paper focuses on the analysis of performance, power consumption and resource usage by Hadoop applications when deploying Hadoop on Virtual Clusters (VCs) within a private IaaS Cloud. More precisely, the impact of different VM placement strategies on Hadoop-based application performance, power consumption and resource usage is measured. As a result, some conclusions on the optimal criteria for VM deployment are provided.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"6 1","pages":"399-406"},"PeriodicalIF":0.0000,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Analysing Hadoop performance in a multi-user IaaS Cloud\",\"authors\":\"Javier Conejero, María Blanca Caminero, C. Carrión\",\"doi\":\"10.1109/HPCSIM.2014.6903713\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Over the last few years, Big Data analysis (i.e., crunching enormous amounts of data from different sources to extract useful knowledge for improving business objectives) has attracted huge attention from enterprises and research institutions. One of the most successful paradigms that has gained popularity in order to analyse this huge amount of data, is MapReduce (and particularly Hadoop, its open source implementation). However, Hadoop-based applications require massive amounts of resources in order to conduct different analysis of large amounts of data. This growing requirements that research and enterprises demand from the actual computing infrastructures empowers the Cloud computing utilization, where there is an increasing demand of Hadoop as a Service. Since Hadoop requires a distributed environment in order to operate, a significant problem is where resources are located. Focusing in Cloud environments, this problem lays mainly on the criteria for Virtual Machine (VM) placement. The work presented in this paper focuses on the analysis of performance, power consumption and resource usage by Hadoop applications when deploying Hadoop on Virtual Clusters (VCs) within a private IaaS Cloud. More precisely, the impact of different VM placement strategies on Hadoop-based application performance, power consumption and resource usage is measured. As a result, some conclusions on the optimal criteria for VM deployment are provided.\",\"PeriodicalId\":6469,\"journal\":{\"name\":\"2014 International Conference on High Performance Computing & Simulation (HPCS)\",\"volume\":\"6 1\",\"pages\":\"399-406\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-07-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 International Conference on High Performance Computing & Simulation (HPCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCSIM.2014.6903713\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on High Performance Computing & Simulation (HPCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCSIM.2014.6903713","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Analysing Hadoop performance in a multi-user IaaS Cloud
Over the last few years, Big Data analysis (i.e., crunching enormous amounts of data from different sources to extract useful knowledge for improving business objectives) has attracted huge attention from enterprises and research institutions. One of the most successful paradigms that has gained popularity in order to analyse this huge amount of data, is MapReduce (and particularly Hadoop, its open source implementation). However, Hadoop-based applications require massive amounts of resources in order to conduct different analysis of large amounts of data. This growing requirements that research and enterprises demand from the actual computing infrastructures empowers the Cloud computing utilization, where there is an increasing demand of Hadoop as a Service. Since Hadoop requires a distributed environment in order to operate, a significant problem is where resources are located. Focusing in Cloud environments, this problem lays mainly on the criteria for Virtual Machine (VM) placement. The work presented in this paper focuses on the analysis of performance, power consumption and resource usage by Hadoop applications when deploying Hadoop on Virtual Clusters (VCs) within a private IaaS Cloud. More precisely, the impact of different VM placement strategies on Hadoop-based application performance, power consumption and resource usage is measured. As a result, some conclusions on the optimal criteria for VM deployment are provided.