{"title":"基于内存计算特性的spark自适应调优策略","authors":"Yao Zhao, Fei Hu, Hao-peng Chen","doi":"10.1109/ICACT.2016.7423442","DOIUrl":null,"url":null,"abstract":"We present an adaptive tuning method to improve Spark performance, especially for its in-memory computation. This manner serves one purpose: making a better use of memory reasonably through adaptively adopting suitable category based on Spark application runtime statistics on different working sets. This solution works in two steps. Firstly, it collects run-time statistics dynamically and stores them in round-robin structures to save memory. Secondly, it can change system storage category based on these statistics. Additionally we focus on serialization strategy optimization. For this purpose we test Spark integrated serialization algorithms: Java and Kryo serialization algorithms, and make a comparison of their performance. In order to gain flexibility we change Spark serialization mechanism by setting the default serialization unit from one RDD to one block. In this way, for the case which RDD has huge amount of blocks our solution can use different serialization algorithms to serialize different blocks in one RDD. We show that our solution is expressive enough to obtain 2x speedup than original Spark when there is inadequate memory resource.","PeriodicalId":125854,"journal":{"name":"2016 18th International Conference on Advanced Communication Technology (ICACT)","volume":"34 4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"An adaptive tuning strategy on spark based on in-memory computation characteristics\",\"authors\":\"Yao Zhao, Fei Hu, Hao-peng Chen\",\"doi\":\"10.1109/ICACT.2016.7423442\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present an adaptive tuning method to improve Spark performance, especially for its in-memory computation. 
This manner serves one purpose: making a better use of memory reasonably through adaptively adopting suitable category based on Spark application runtime statistics on different working sets. This solution works in two steps. Firstly, it collects run-time statistics dynamically and stores them in round-robin structures to save memory. Secondly, it can change system storage category based on these statistics. Additionally we focus on serialization strategy optimization. For this purpose we test Spark integrated serialization algorithms: Java and Kryo serialization algorithms, and make a comparison of their performance. In order to gain flexibility we change Spark serialization mechanism by setting the default serialization unit from one RDD to one block. In this way, for the case which RDD has huge amount of blocks our solution can use different serialization algorithms to serialize different blocks in one RDD. We show that our solution is expressive enough to obtain 2x speedup than original Spark when there is inadequate memory resource.\",\"PeriodicalId\":125854,\"journal\":{\"name\":\"2016 18th International Conference on Advanced Communication Technology (ICACT)\",\"volume\":\"34 4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 18th International Conference on Advanced Communication Technology (ICACT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICACT.2016.7423442\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 18th International Conference on 
Advanced Communication Technology (ICACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICACT.2016.7423442","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An adaptive tuning strategy on Spark based on in-memory computation characteristics
We present an adaptive tuning method to improve Spark performance, particularly for its in-memory computation. The method serves one purpose: to use memory more effectively by adaptively selecting a suitable storage category based on runtime statistics of a Spark application's working sets. It works in two steps. First, it collects runtime statistics dynamically and stores them in round-robin structures to save memory. Second, it changes the system's storage category based on these statistics. We additionally focus on optimizing the serialization strategy. To this end, we evaluate the serialization algorithms integrated into Spark, Java and Kryo serialization, and compare their performance. To gain flexibility, we modify Spark's serialization mechanism, reducing the default serialization unit from one RDD to one block. In this way, when an RDD contains a large number of blocks, our solution can serialize different blocks of the same RDD with different serialization algorithms. We show that our solution achieves a 2x speedup over the original Spark when memory resources are inadequate.
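The two-step mechanism described above can be sketched in miniature. The snippet below is an illustrative reconstruction, not the authors' implementation or Spark API: `RoundRobinStats` stands in for the paper's bounded round-robin statistics structure, and `suggest_storage_level` (with its 0.7 threshold) is a hypothetical decision rule showing how recent memory-usage samples could drive the switch between storage categories.

```python
from collections import deque

class RoundRobinStats:
    """Bounded ring buffer of memory-usage samples (illustrative name).

    Keeping only the most recent `capacity` samples means the monitoring
    overhead stays constant no matter how long the application runs, which
    is the point of the paper's round-robin structures.
    """

    def __init__(self, capacity):
        # deque with maxlen evicts the oldest sample automatically on append
        self.samples = deque(maxlen=capacity)

    def record(self, mem_usage_ratio):
        """Record one sample of memory usage as a ratio in [0, 1]."""
        self.samples.append(mem_usage_ratio)

    def average(self):
        """Mean of the retained (most recent) samples; 0.0 if empty."""
        return sum(self.samples) / len(self.samples) if self.samples else 0.0


def suggest_storage_level(stats, threshold=0.7):
    """Toy decision rule: keep blocks deserialized while recent memory
    pressure is low, switch to a serialized storage level once the recent
    average crosses the threshold (0.7 is an arbitrary illustrative value)."""
    return "MEMORY_ONLY" if stats.average() < threshold else "MEMORY_ONLY_SER"
```

In actual Spark these strings would map to `org.apache.spark.storage.StorageLevel` constants passed to `persist()`, and the Kryo-vs-Java comparison in the paper corresponds to the standard `spark.serializer=org.apache.spark.serializer.KryoSerializer` configuration; the paper's contribution of per-block serializer choice requires modifying Spark internals and is not expressible through public configuration alone.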