BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark.

Proceedings - International Conference on Software Engineering. Pub Date: 2016-05-01. DOI: 10.1145/2884781.2884813

Muhammad Ali Gulzar, Matteo Interlandi, Seunghyun Yoo, Sai Deep Tetali, Tyson Condie, Todd Millstein, Miryung Kim

Volume 2016, pages 784-795.

Citations: 78

Abstract

Developers use cloud computing platforms to process a large quantity of data in parallel when developing big data analytics. Debugging the massively parallel computations that run in today's data centers is time-consuming and error-prone. To address this challenge, we design a set of interactive, real-time debugging primitives for big data processing in Apache Spark, the next-generation data-intensive scalable cloud computing platform. This requires re-thinking the notion of step-through debugging in a traditional debugger such as gdb, because pausing the entire computation across distributed worker nodes causes significant delay and naively inspecting millions of records using a watchpoint is too time-consuming for an end user. First, BIGDEBUG's simulated breakpoints and on-demand watchpoints allow users to selectively examine distributed, intermediate data on the cloud with little overhead. Second, a user can also pinpoint a crash-inducing record and selectively resume relevant sub-computations after a quick fix. Third, a user can determine the root causes of errors (or delays) at the level of individual records through a fine-grained data provenance capability. Our evaluation shows that BIGDEBUG scales to terabytes and its record-level tracing incurs less than 25% overhead on average. It determines crash culprits orders of magnitude more accurately and provides up to 100% time savings compared to the baseline replay debugger. The results show that BIGDEBUG supports debugging at interactive speeds with minimal performance impact.
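
The two ideas of examining only selected intermediate records and pinpointing a crash-inducing record without aborting the job can be illustrated with a small sketch. This is plain Python over an in-memory list standing in for one partition; it is not BigDebug's API and does not use Spark. The names `DebugTrace`, `debug_map`, and the guard predicate are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List, Tuple


@dataclass
class DebugTrace:
    """Hypothetical container for what a user would inspect after the run."""
    watched: List[Any] = field(default_factory=list)               # records matching the guard
    culprits: List[Tuple[Any, str]] = field(default_factory=list)  # (record, error text)


def debug_map(records: Iterable[Any],
              fn: Callable[[Any], Any],
              guard: Callable[[Any], bool],
              trace: DebugTrace) -> List[Any]:
    """Apply fn to every record without halting the whole computation.

    Records that satisfy the guard are copied into the trace (a guarded
    watchpoint), and a record that makes fn raise is logged as a crash
    culprit and skipped so the remaining records still get processed.
    """
    out = []
    for rec in records:
        if guard(rec):
            trace.watched.append(rec)
        try:
            out.append(fn(rec))
        except Exception as exc:  # capture the failing record instead of crashing
            trace.culprits.append((rec, repr(exc)))
    return out


# One "partition" of raw input; "oops" cannot be parsed as an integer.
trace = DebugTrace()
partition = ["10", "20", "oops", "-5", "40"]
result = debug_map(partition, int, lambda r: r.startswith("-"), trace)

print(result)                            # parsed values from the healthy records
print(trace.watched)                     # only the records the guard selected
print([rec for rec, _ in trace.culprits])  # the record(s) that caused a failure
```

In a distributed setting each worker would keep such a trace locally and ship only the selected records to the user, which is the intuition behind avoiding a full pause or a full dump of intermediate data; the sketch above makes no claim about how BIGDEBUG implements this.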
