{"title":"基于内存计算特性的spark自适应调优策略","authors":"Yao Zhao, Fei Hu, Hao-peng Chen","doi":"10.1109/ICACT.2016.7423442","DOIUrl":null,"url":null,"abstract":"We present an adaptive tuning method to improve Spark performance, especially for its in-memory computation. This manner serves one purpose: making a better use of memory reasonably through adaptively adopting suitable category based on Spark application runtime statistics on different working sets. This solution works in two steps. Firstly, it collects run-time statistics dynamically and stores them in round-robin structures to save memory. Secondly, it can change system storage category based on these statistics. Additionally we focus on serialization strategy optimization. For this purpose we test Spark integrated serialization algorithms: Java and Kryo serialization algorithms, and make a comparison of their performance. In order to gain flexibility we change Spark serialization mechanism by setting the default serialization unit from one RDD to one block. In this way, for the case which RDD has huge amount of blocks our solution can use different serialization algorithms to serialize different blocks in one RDD. We show that our solution is expressive enough to obtain 2x speedup than original Spark when there is inadequate memory resource.","PeriodicalId":125854,"journal":{"name":"2016 18th International Conference on Advanced Communication Technology (ICACT)","volume":"34 4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"An adaptive tuning strategy on spark based on in-memory computation characteristics\",\"authors\":\"Yao Zhao, Fei Hu, Hao-peng Chen\",\"doi\":\"10.1109/ICACT.2016.7423442\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present an adaptive tuning method to improve Spark performance, especially for its in-memory computation. 
This manner serves one purpose: making a better use of memory reasonably through adaptively adopting suitable category based on Spark application runtime statistics on different working sets. This solution works in two steps. Firstly, it collects run-time statistics dynamically and stores them in round-robin structures to save memory. Secondly, it can change system storage category based on these statistics. Additionally we focus on serialization strategy optimization. For this purpose we test Spark integrated serialization algorithms: Java and Kryo serialization algorithms, and make a comparison of their performance. In order to gain flexibility we change Spark serialization mechanism by setting the default serialization unit from one RDD to one block. In this way, for the case which RDD has huge amount of blocks our solution can use different serialization algorithms to serialize different blocks in one RDD. We show that our solution is expressive enough to obtain 2x speedup than original Spark when there is inadequate memory resource.\",\"PeriodicalId\":125854,\"journal\":{\"name\":\"2016 18th International Conference on Advanced Communication Technology (ICACT)\",\"volume\":\"34 4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 18th International Conference on Advanced Communication Technology (ICACT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICACT.2016.7423442\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 18th International Conference on 
Advanced Communication Technology (ICACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICACT.2016.7423442","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An adaptive tuning strategy on Spark based on in-memory computation characteristics
We present an adaptive tuning method to improve Spark performance, particularly for its in-memory computation. The method serves one purpose: to use memory more effectively by adaptively selecting a suitable storage category based on runtime statistics of a Spark application's working sets. It works in two steps. First, it collects runtime statistics dynamically and stores them in round-robin structures to save memory. Second, it changes the system's storage category based on these statistics. We additionally focus on optimizing the serialization strategy. To this end, we evaluate the serialization algorithms integrated into Spark, Java and Kryo serialization, and compare their performance. To gain flexibility, we modify Spark's serialization mechanism, reducing the default serialization unit from one RDD to one block. In this way, when an RDD contains a large number of blocks, our solution can serialize different blocks of the same RDD with different serialization algorithms. We show that our solution achieves a 2x speedup over the original Spark when memory resources are inadequate.
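The two-step mechanism described above can be sketched in miniature. The snippet below is an illustrative reconstruction, not the authors' implementation or Spark API: `RoundRobinStats` stands in for the paper's bounded round-robin statistics structure, and `suggest_storage_level` (with its 0.7 threshold) is a hypothetical decision rule showing how recent memory-usage samples could drive the switch between storage categories.

```python
from collections import deque

class RoundRobinStats:
    """Bounded ring buffer of memory-usage samples (illustrative name).

    Keeping only the most recent `capacity` samples means the monitoring
    overhead stays constant no matter how long the application runs, which
    is the point of the paper's round-robin structures.
    """

    def __init__(self, capacity):
        # deque with maxlen evicts the oldest sample automatically on append
        self.samples = deque(maxlen=capacity)

    def record(self, mem_usage_ratio):
        """Record one sample of memory usage as a ratio in [0, 1]."""
        self.samples.append(mem_usage_ratio)

    def average(self):
        """Mean of the retained (most recent) samples; 0.0 if empty."""
        return sum(self.samples) / len(self.samples) if self.samples else 0.0


def suggest_storage_level(stats, threshold=0.7):
    """Toy decision rule: keep blocks deserialized while recent memory
    pressure is low, switch to a serialized storage level once the recent
    average crosses the threshold (0.7 is an arbitrary illustrative value)."""
    return "MEMORY_ONLY" if stats.average() < threshold else "MEMORY_ONLY_SER"
```

In actual Spark these strings would map to `org.apache.spark.storage.StorageLevel` constants passed to `persist()`, and the Kryo-vs-Java comparison in the paper corresponds to the standard `spark.serializer=org.apache.spark.serializer.KryoSerializer` configuration; the paper's contribution of per-block serializer choice requires modifying Spark internals and is not expressible through public configuration alone.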