Enhancing Hadoop Performance in Homogenous and Heterogeneous Big Data Environments by Dynamic Slot Configuration
Author: E. Hamza
Journal: Advances in Systems Science and Applications, vol. 20, no. 1, pp. 13-26 (published 2020-03-31)
DOI: 10.25728/ASSA.2020.20.1.761
Citations: 0
Abstract
Hadoop is one of the best-known platforms for processing large volumes of data in parallel in cloud computing. A Hadoop system can be characterized by three main factors: the cluster, the workload and the user. Each factor can be either homogeneous or heterogeneous, and together they determine the degree of heterogeneity of the Hadoop system. The objective of this research is to investigate how strongly the heterogeneity of each factor affects Hadoop performance under different schedulers. Three schedulers are tested and analyzed at different levels of Hadoop heterogeneity: FIFO (First In, First Out), Fair Sharing, and COSHH (Classification and Optimization based Scheduler for Heterogeneous Hadoop). The study examines performance issues related to Hadoop schedulers and compares performance across different job-submission cases. The jobs are processed in homogeneous or heterogeneous data environments, with either a fixed or a reconfigurable allocation of slots between map and reduce tasks, using the Hadoop MapReduce Java programming model on a cluster. The results show that treating the map/reduce slot allocation as a tunable knob under certain schedulers, such as FIFO, improves performance significantly, especially in heterogeneous environments, where the workload decreased significantly and the increase in computational resource utilization was evident.
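To make the idea of a fixed versus reconfigurable map/reduce slot split concrete, here is a toy discrete-time simulation in Python. It is a minimal sketch under stated assumptions, not the paper's experimental setup and not COSHH: every task takes one time unit, a job's reducers start only after all of its maps have finished, the cluster is modeled as one aggregate pool of slots, and jobs are served in FIFO order. The function names (`makespan`, `_run`), the demand-driven policy used when `map_slots=None` (ready reducers get priority, maps get the remaining slots), and the job sizes are all hypothetical. For reference, in classic Hadoop 1.x MapReduce the per-TaskTracker split is fixed by `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    maps: int     # number of map tasks (each assumed to take 1 time unit)
    reduces: int  # number of reduce tasks (each assumed to take 1 time unit)


def _run(left, eligible, capacity):
    """Run up to `capacity` unit tasks, serving jobs in FIFO order; mutates `left`."""
    for i, ok in enumerate(eligible):
        if capacity == 0:
            break
        if ok and left[i] > 0:
            used = min(left[i], capacity)
            left[i] -= used
            capacity -= used


def makespan(jobs, total_slots, map_slots=None):
    """Time steps to finish all jobs under a simplified FIFO scheduler.

    map_slots=k  -> fixed split: k map slots, total_slots - k reduce slots.
    map_slots=None -> hypothetical dynamic split, recomputed every step.
    """
    if total_slots < 1:
        raise ValueError("need at least one slot")
    if map_slots is not None and not 0 < map_slots < total_slots:
        raise ValueError("fixed split needs at least one map and one reduce slot")
    maps_left = [j.maps for j in jobs]
    reduces_left = [j.reduces for j in jobs]
    t = 0
    while any(maps_left) or any(reduces_left):
        t += 1
        # Reducers of a job may run only if its maps finished in an earlier step.
        ready = [m == 0 for m in maps_left]
        if map_slots is None:
            # Dynamic: ready reducers take what they need, maps get the rest.
            reduce_demand = sum(r for r, ok in zip(reduces_left, ready) if ok)
            reduce_cap = min(reduce_demand, total_slots)
            map_cap = total_slots - reduce_cap
        else:
            map_cap, reduce_cap = map_slots, total_slots - map_slots
        _run(reduces_left, ready, reduce_cap)
        _run(maps_left, [True] * len(jobs), map_cap)
    return t


if __name__ == "__main__":
    # Hypothetical heterogeneous workload: one map-heavy, one reduce-heavy job.
    jobs = [Job("map-heavy", maps=8, reduces=1), Job("reduce-heavy", maps=2, reduces=6)]
    for k in range(1, 4):
        print(f"fixed {k} map / {4 - k} reduce slots:", makespan(jobs, 4, k))
    print("dynamic split:", makespan(jobs, 4))
```

With this invented two-job workload on 4 slots, the fixed splits (1, 2 or 3 map slots) finish in 12, 8 and 10 steps, while the demand-driven split finishes in 5, because no slots sit idle waiting for a task type that has no runnable work. The numbers only illustrate the mechanism in this toy model; they are not results from the paper.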
About the journal:
Advances in Systems Science and Applications (ASSA) is an international peer-reviewed open-source online academic journal. Its scope covers all major aspects of systems (and processes) analysis, modeling, simulation, and control, ranging from theoretical and methodological developments to a wide variety of application areas. Survey articles and innovative results are also welcome. ASSA is aimed at scientists, engineers and researchers working on these problems. It aims to be a platform where researchers can communicate and discuss both their specialized issues and the interdisciplinary problems of systems analysis and its applications in science and industry, including data science, artificial intelligence, materials science, manufacturing, transportation, power and energy, ecology, corporate management, public governance, finance, and many others.