Failure Transparency in Stateful Dataflow Systems (Technical Report)

Aleksey Veresov (KTH Royal Institute of Technology), Jonas Spenger (KTH Royal Institute of Technology), Paris Carbone (KTH Royal Institute of Technology; RISE Research Institutes of Sweden), Philipp Haller (KTH Royal Institute of Technology)
arXiv - CS - Programming Languages, published 2024-07-09. DOI: arxiv-2407.06738 (https://doi.org/arxiv-2407.06738)

Abstract

Failure transparency enables users to reason about distributed systems at a higher level of abstraction, where complex failure-handling logic is hidden. This is especially true for stateful dataflow systems, which are the backbone of many cloud applications. In particular, this paper focuses on proving failure transparency in Apache Flink, a popular stateful dataflow system. Even though failure transparency is a critical aspect of Apache Flink, to date it has not been formally proven. Showing that the failure-transparency mechanism is correct, however, is challenging due to the complexity of the mechanism itself. Nevertheless, this complexity can be effectively hidden behind a failure-transparent programming interface. To show that Apache Flink is failure transparent, we model it in small-step operational semantics. Next, we provide a novel definition of failure transparency based on observational explainability, a concept which relates executions according to their observations. Finally, we provide a formal proof of failure transparency for the implementation model; i.e., we prove that the failure-free model correctly abstracts from the failure-related details of the implementation model. We also show liveness of the implementation model under a fair execution assumption. These results are a first step towards a verified stack for stateful dataflow systems.
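
As an informal illustration of the idea that a failing execution should be explainable by a failure-free one, the following toy sketch may help. It is not the paper's formal model, not its definition of observational explainability, and not Apache Flink's API: `SummingTask`, `run`, `checkpoint_every`, and `fail_at` are hypothetical names, and treating "observations" as the outputs committed at checkpoints is an assumption made for this example only.

```python
from dataclasses import dataclass


@dataclass
class SummingTask:
    """Toy stateful operator (hypothetical): emits the running sum of its inputs."""
    total: int = 0            # operator state
    position: int = 0         # index of the next input to read
    snapshot: tuple = (0, 0)  # last checkpoint: (total, position)

    def step(self, inputs):
        # One small step: consume one input, update state, emit one output.
        self.total += inputs[self.position]
        self.position += 1
        return self.total

    def checkpoint(self):
        self.snapshot = (self.total, self.position)

    def recover(self):
        # Roll back to the last checkpoint; later inputs are replayed.
        self.total, self.position = self.snapshot


def run(inputs, checkpoint_every=2, fail_at=frozenset()):
    """Return the committed outputs (the 'observations' of this toy model).

    fail_at holds the global step numbers at which a failure strikes.
    """
    task = SummingTask()
    committed, pending = [], []
    steps = 0
    while task.position < len(inputs):
        steps += 1
        if steps in fail_at:
            task.recover()
            pending.clear()   # uncommitted outputs are discarded on failure
            continue
        pending.append(task.step(inputs))
        if task.position % checkpoint_every == 0:
            task.checkpoint()
            committed.extend(pending)  # outputs become observable at a checkpoint
            pending.clear()
    committed.extend(pending)  # end of input is treated as a final commit
    return committed


failure_free = run([1, 2, 3, 4, 5])
with_failures = run([1, 2, 3, 4, 5], fail_at={1, 3, 4, 7})
print(failure_free == with_failures)
```

In this sketch the execution with failures takes more steps, but its committed outputs coincide with those of the failure-free run, which is the intuition behind hiding failure handling from the user. The paper's actual result concerns a small-step semantic model of Flink and is considerably more general than this single-operator example.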