Amoeba: aligning stream processing operators with externally-managed state

Antonis Papaioannou, K. Magoutis
Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing
Published: 2021-12-06
DOI: 10.1145/3468737.3494096 (https://doi.org/10.1145/3468737.3494096)
Citations: 1

Abstract

Scalable stream processing systems (SPS) often require external storage systems for long-term storage of non-ephemeral state. Such state cannot be accommodated in the internal stores of SPSes, which are mainly geared toward fault tolerance of streaming jobs, lack externally visible APIs, and dispose of their state at the end of such jobs. Recent research has pointed to scalable in-memory key-value stores (KVS) as an efficient solution for managing external state. While such data stores have been interconnected with scalable streaming systems, the two are currently managed independently, missing opportunities for optimization such as exploiting locality between stream partitions and table shards and coordinating elasticity actions. Both processing and data management systems are typically designed for scalability; however, coordination between them poses a significant challenge. In this work we describe Amoeba, a system that dynamically adapts data-partitioning schemes and/or task or data placement across systems to eliminate unnecessary network communication across nodes. Our evaluation using state-of-the-art systems, such as the Flink SPS and Redis KVS, demonstrated a 2.6x performance improvement when aligning SPS tasks with KVS shards in AWS deployments of up to 64 nodes.
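
The locality idea in the abstract can be illustrated with a minimal sketch. It is not Amoeba's algorithm, which the abstract does not detail. It only compares a stream partitioner that hashes keys independently of the store with one that reuses the store's key-to-shard mapping. The slot computation follows the Redis Cluster scheme (CRC16 of the key modulo 16384, hash tags ignored here). Everything else is an assumption made for illustration: the equal contiguous slot ranges per shard, the one-task-and-one-shard-per-node layout, and the helper names `shard_for_slot`, `aligned_task`, `independent_task`, and `remote_fraction`.

```python
import binascii
import zlib

NUM_SLOTS = 16384  # number of hash slots in Redis Cluster


def redis_slot(key: str) -> int:
    """Redis Cluster hash slot: CRC16 (XMODEM variant) of the key, mod 16384.

    Hash tags ({...}) are ignored in this sketch.
    """
    return binascii.crc_hqx(key.encode("utf-8"), 0) % NUM_SLOTS


def shard_for_slot(slot: int, num_shards: int) -> int:
    """ASSUMPTION: slots are split into contiguous, near-equal ranges, one per shard."""
    return slot * num_shards // NUM_SLOTS


def aligned_task(key: str, num_nodes: int) -> int:
    """Hypothetical aligned partitioner: route the key to the task on its shard's node."""
    return shard_for_slot(redis_slot(key), num_nodes)


def independent_task(key: str, num_nodes: int) -> int:
    """Stand-in for a stream partitioner that is unaware of the store's sharding."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def remote_fraction(keys, num_nodes, partitioner) -> float:
    """Fraction of keys whose task and shard sit on different nodes.

    ASSUMPTION: task i and shard i are co-located on node i.
    """
    remote = sum(
        1
        for k in keys
        if partitioner(k, num_nodes) != shard_for_slot(redis_slot(k), num_nodes)
    )
    return remote / len(keys)


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    print("independent:", remote_fraction(keys, 4, independent_task))
    print("aligned:    ", remote_fraction(keys, 4, aligned_task))
```

Under these assumptions the aligned partitioner never sends a state access to another node, while the independent one does so for most keys once there are more than a few nodes. A real deployment would also have to handle shard migration and rescaling, which this sketch leaves out.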