A configurable and executable model of Spark Streaming on Apache YARN

Jia-Chun Lin, Ming-Chang Lee, Ingrid Chieh Yu, E. Johnsen
Journal: Int. J. Grid Util. Comput.
Publication date: 2020-02-03
Publication type: Journal Article
DOI: https://doi.org/10.1504/ijguc.2020.10026548
Citations: 8

Abstract

Streams of data are produced today at an unprecedented scale. Efficient and stable processing of these streams requires a careful interplay between the parameters of the streaming application and those of the underlying stream processing framework. Today, these parameters are found by trial and error on the complex, deployed framework. This paper shows that high-level models can help to determine these parameters by predicting and comparing the performance of streaming applications running on stream processing frameworks with different configurations. To demonstrate this approach, this paper considers Spark Streaming, a widely used framework to leverage data streams on the fly and provide real-time stream processing. Technically, we develop a configurable and executable model to simulate both the streaming applications and the underlying Spark stream processing framework. Furthermore, we model the deployment of Spark Streaming on Apache YARN, which is a popular open-source distributed software framework for big data processing. We show by means of empirical validation that the developed model predicts performance with satisfactory accuracy.