In-Memory Caching Orchestration for Hadoop
J. Kwak, Eunji Hwang, Tae-kyung Yoo, Beomseok Nam, Young-ri Choi
2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 16, 2016. DOI: 10.1109/CCGrid.2016.73 (https://doi.org/10.1109/CCGrid.2016.73)
Abstract: In this paper, we investigate techniques to effectively orchestrate HDFS in-memory caching for Hadoop. We first evaluate the degree of benefit that each of various MapReduce applications gains from in-memory caching, i.e., its cache affinity. We then propose an adaptive cache-local scheduling algorithm that adaptively adjusts the time a MapReduce job waits in a queue for a cache-local node. We set the waiting time to be proportional to the percentage of the job's input data that is cached. We also develop a cache-affinity cache replacement algorithm that determines which blocks are cached and evicted based on the cache affinity of applications. Using various workloads consisting of multiple MapReduce applications, we conduct an experimental study to demonstrate the effects of the proposed in-memory caching orchestration techniques. Our experimental results show that our enhanced Hadoop in-memory caching scheme improves the performance of MapReduce workloads by up to 18% and 10% over Hadoop with HDFS in-memory caching disabled and enabled, respectively.
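As a concrete reading of the two policies the abstract describes, the Java sketch below shows one way they could look. This is our illustration reconstructed from the abstract alone, not the authors' implementation: the types Job and CachedBlock, the constant MAX_WAIT_MS, and the per-application affinity scores are all hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Minimal sketch of the two policies described in the abstract.
// All types, fields, and constants are illustrative, not Hadoop APIs.
public class CacheOrchestrationSketch {

    // Assumed upper bound on how long a job may wait for a cache-local slot.
    static final long MAX_WAIT_MS = 3000;

    static class Job {
        long cachedInputBytes; // bytes of this job's input held in the HDFS cache
        long totalInputBytes;  // total bytes of this job's input
    }

    static class CachedBlock {
        String blockId;
        double appCacheAffinity; // benefit the owning application gets from caching
    }

    // Adaptive cache-local scheduling: the time a job waits in the queue for a
    // cache-local node is proportional to the fraction of its input that is cached.
    static long cacheLocalWaitTime(Job job) {
        if (job.totalInputBytes == 0) {
            return 0;
        }
        double cachedFraction = (double) job.cachedInputBytes / job.totalInputBytes;
        return (long) (MAX_WAIT_MS * cachedFraction);
    }

    // Cache-affinity replacement: when the cache is full, evict the block whose
    // owning application benefits least from in-memory caching.
    static Optional<CachedBlock> chooseEvictionVictim(List<CachedBlock> cache) {
        return cache.stream()
                    .min(Comparator.comparingDouble((CachedBlock b) -> b.appCacheAffinity));
    }
}
```

Under this sketch, a job whose input is entirely cached waits up to the full MAX_WAIT_MS for a cache-local slot, a job with no cached input is scheduled immediately, and blocks belonging to low-affinity applications are the first evicted.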