An adaptive tuning strategy on Spark based on in-memory computation characteristics

Yao Zhao, Fei Hu, Hao-peng Chen
2016 18th International Conference on Advanced Communication Technology (ICACT)
DOI: 10.1109/ICACT.2016.7423442
Citations: 20

Abstract

We present an adaptive tuning method that improves Spark performance, particularly for in-memory computation. Its goal is to use memory more effectively by adaptively choosing a suitable storage category based on the Spark application's runtime statistics for different working sets. The method works in two steps. First, it dynamically collects runtime statistics and stores them in round-robin structures to save memory. Second, it changes the system's storage category based on these statistics. We also focus on optimizing the serialization strategy. To this end, we test the two serialization algorithms integrated into Spark, Java and Kryo serialization, and compare their performance. For greater flexibility, we change Spark's serialization mechanism so that the default serialization unit is one block rather than one RDD. As a result, when an RDD has a large number of blocks, our solution can serialize different blocks of the same RDD with different serialization algorithms. We show that our solution achieves a 2x speedup over the original Spark when memory resources are inadequate.
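The two-step mechanism described above (bounded runtime statistics, then a storage decision) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the "storage category" corresponds to Spark's `StorageLevel` names, models the round-robin structure as a fixed-capacity ring buffer (`collections.deque` with `maxlen`), and uses hypothetical names (`RuntimeSample`, `RoundRobinStats`, `choose_storage_level`), a hypothetical memory-pressure metric, hypothetical thresholds, and an assumed serialized-to-deserialized size ratio `ser_ratio`.

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean

# Labels mirror names of Spark's StorageLevel constants; the policy that
# chooses among them below is hypothetical, not the paper's rule.
MEMORY_ONLY = "MEMORY_ONLY"
MEMORY_ONLY_SER = "MEMORY_ONLY_SER"
MEMORY_AND_DISK_SER = "MEMORY_AND_DISK_SER"


@dataclass
class RuntimeSample:
    """One runtime observation for a working set (hypothetical fields)."""
    working_set_bytes: int   # size of the cached data as deserialized objects
    free_storage_bytes: int  # storage memory still available for caching


class RoundRobinStats:
    """Step 1: round-robin store. A fixed-capacity ring buffer in which each
    new sample overwrites the oldest one, so memory for statistics is bounded."""

    def __init__(self, capacity: int) -> None:
        self.samples: deque = deque(maxlen=capacity)

    def record(self, sample: RuntimeSample) -> None:
        self.samples.append(sample)  # evicts the oldest sample when full

    def pressure(self) -> float:
        """Mean ratio of working-set size to free storage memory."""
        return mean(s.working_set_bytes / max(s.free_storage_bytes, 1)
                    for s in self.samples)


def choose_storage_level(stats: RoundRobinStats,
                         ser_ratio: float = 0.5) -> str:
    """Step 2: map recent statistics to a storage level.

    ser_ratio is an assumed serialized/deserialized size ratio; the
    thresholds are illustrative only.
    """
    if not stats.samples:
        return MEMORY_ONLY               # no evidence yet: keep Spark's cache default
    p = stats.pressure()
    if p <= 1.0:
        return MEMORY_ONLY               # fits as deserialized objects
    if p * ser_ratio <= 1.0:
        return MEMORY_ONLY_SER           # fits once serialized
    return MEMORY_AND_DISK_SER           # serialize and spill the rest


if __name__ == "__main__":
    stats = RoundRobinStats(capacity=2)
    stats.record(RuntimeSample(working_set_bytes=900, free_storage_bytes=100))
    print(choose_storage_level(stats))  # MEMORY_AND_DISK_SER
    stats.record(RuntimeSample(50, 100))
    stats.record(RuntimeSample(50, 100))  # the 900-byte sample is overwritten
    print(choose_storage_level(stats))  # MEMORY_ONLY
```

Because the buffer keeps only the most recent samples, the decision tracks the current working set: once the large sample is overwritten, the policy returns to `MEMORY_ONLY`. In a Scala driver, the returned name could be converted with `StorageLevel.fromString` and passed to `rdd.persist`; note that Spark does not let an RDD's storage level be changed in place, so the RDD would have to be unpersisted first.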