Assisting developers of Big Data Analytics Applications when deploying on Hadoop clouds

Weiyi Shang, Z. Jiang, H. Hemmati, Bram Adams, A. Hassan, Patrick Martin
Published in: 2013 35th International Conference on Software Engineering (ICSE), 2013-05-18
DOI: https://doi.org/10.1109/ICSE.2013.6606586
Citations: 165

Abstract

Big data analytics is the process of examining large amounts of data (big data) in an effort to uncover hidden patterns or unknown correlations. Big Data Analytics Applications (BDA Apps) are a new type of software application that analyzes big data using massively parallel processing frameworks (e.g., Hadoop). Developers of such applications typically develop them using a small sample of data in a pseudo-cloud environment. Afterwards, they deploy the applications in a large-scale cloud environment with considerably more processing power and larger input data (reminiscent of the mainframe days). Working with BDA App developers in industry over the past three years, we noticed that the runtime analysis and debugging of such applications in the deployment phase cannot be easily addressed by traditional monitoring and debugging approaches. In this paper, as a first step in assisting developers of BDA Apps for cloud deployments, we propose a lightweight approach for uncovering differences between pseudo and large-scale cloud deployments. Our approach makes use of the readily available yet rarely used execution logs from these platforms. It abstracts the execution logs, recovers the execution sequences, and compares the sequences between the pseudo and cloud deployments. Through a case study on three representative Hadoop-based BDA Apps, we show that our approach can rapidly direct the attention of BDA App developers to the major differences between the two deployments. Knowledge of such differences is essential in verifying BDA Apps when analyzing big data in the cloud. Using injected deployment faults, we show that our approach not only significantly reduces the deployment verification effort, but also produces very few false positives when identifying deployment failures.
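
The three steps named in the abstract (abstract the logs, recover execution sequences, compare the two deployments) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes each log line carries a Hadoop-style task attempt ID, that dynamic values can be masked with two simple regular expressions, and that consecutive repeats of the same event should be collapsed so that larger inputs do not by themselves produce new sequences. The function names, patterns, and placeholder tokens are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed shape of a task attempt ID; real logs may need different rules.
_TASK_ID = re.compile(r"attempt_\d+_\d+_[mr]_\d+_\d+")
# Assumed masking rules: attempt IDs first, then any standalone number.
_DYNAMIC = [
    (_TASK_ID, "<ATTEMPT>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def abstract_line(line):
    """Step 1: mask dynamic values so a log line becomes an event template."""
    for pattern, token in _DYNAMIC:
        line = pattern.sub(token, line)
    return line.strip()


def recover_sequences(lines):
    """Step 2: group events by task attempt, in log order.

    Returns the set of distinct event sequences. Consecutive repeats of
    the same event are collapsed (an assumption of this sketch).
    """
    per_task = defaultdict(list)
    for line in lines:
        match = _TASK_ID.search(line)
        if match is None:
            continue  # lines without a task ID are ignored in this sketch
        event = abstract_line(line)
        events = per_task[match.group(0)]
        if not events or events[-1] != event:
            events.append(event)
    return {tuple(events) for events in per_task.values()}


def compare_deployments(pseudo_lines, cloud_lines):
    """Step 3: report sequences unique to each deployment."""
    pseudo = recover_sequences(pseudo_lines)
    cloud = recover_sequences(cloud_lines)
    return cloud - pseudo, pseudo - cloud


if __name__ == "__main__":
    pseudo_log = [
        "attempt_1_1_m_0_0 start",
        "attempt_1_1_m_0_0 read 10 bytes",
        "attempt_1_1_m_0_0 done",
    ]
    cloud_log = [
        "attempt_2_1_m_5_0 start",
        "attempt_2_1_m_5_0 read 2000 bytes",
        "attempt_2_1_m_5_0 done",
        "attempt_2_1_m_6_0 start",
        "attempt_2_1_m_6_0 failed to connect to node 7",
    ]
    cloud_only, pseudo_only = compare_deployments(pseudo_log, cloud_log)
    for sequence in cloud_only:
        print("only in cloud:", sequence)
```

In this toy input, the first cloud task follows the same abstracted sequence as the pseudo-cloud task and is filtered out, so only the task with the connection failure is reported. Comparing sets of sequences rather than raw lines is what keeps the output small enough for a developer to inspect by hand.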