Asgard: NoSQL数据库适用于无服务器工作负载中的临时数据吗?

Karthick Shankar, Ashraf Y. Mahgoub, Zihan Zhou, Utkarsh Priyam, S. Chaterji
{"title":"Asgard: NoSQL数据库适用于无服务器工作负载中的临时数据吗?","authors":"Karthick Shankar, Ashraf Y. Mahgoub, Zihan Zhou, Utkarsh Priyam, S. Chaterji","doi":"10.3389/fhpcp.2023.1127883","DOIUrl":null,"url":null,"abstract":"Serverless computing platforms are becoming increasingly popular for data analytics applications due to their low management overhead and granular billing strategies. Such analytics frameworks use a Directed Acyclic Graph (DAG) structure, in which serverless functions, which are fine-grained tasks, are represented as nodes and data-dependencies between the functions are represented as edges. Passing intermediate (ephemeral) data from one function to another has been receiving attention of late, with works proposing various storage systems and methods of optimization for them. The state-of-practice method is to pass the ephemeral data through remote storage, either disk-based (e.g., Amazon S3), which is slow, or memory-based (e.g., ElastiCache Redis), which is expensive. Despite the potential of some prominent NoSQL databases, like Apache Cassandra and ScyllaDB, which utilize both memory and disk, prevailing opinions suggest they are ill-suited for ephemeral data, being tailored more for long-term storage. In our study, titled Asgard, we rigorously examine this assumption. Using Amazon Web Services (AWS) as a testbed with two popular serverless applications, we explore scenarios like fanout and varying workloads, gauging the performance benefits of configuring NoSQL databases in a DAG-aware way. Surprisingly, we found that, per end-to-end latency normalized by $ cost, Apache Cassandra's default setup surpassed Redis by up to 326% and S3 by up to 189%. When optimized with Asgard, Cassandra outdid its own default configuration by up to 47%. This underscores specific instances where NoSQL databases can outshine the current state-of-practice.","PeriodicalId":399190,"journal":{"name":"Frontiers in High Performance Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Asgard: Are NoSQL databases suitable for ephemeral data in serverless workloads?\",\"authors\":\"Karthick Shankar, Ashraf Y. Mahgoub, Zihan Zhou, Utkarsh Priyam, S. Chaterji\",\"doi\":\"10.3389/fhpcp.2023.1127883\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Serverless computing platforms are becoming increasingly popular for data analytics applications due to their low management overhead and granular billing strategies. Such analytics frameworks use a Directed Acyclic Graph (DAG) structure, in which serverless functions, which are fine-grained tasks, are represented as nodes and data-dependencies between the functions are represented as edges. Passing intermediate (ephemeral) data from one function to another has been receiving attention of late, with works proposing various storage systems and methods of optimization for them. The state-of-practice method is to pass the ephemeral data through remote storage, either disk-based (e.g., Amazon S3), which is slow, or memory-based (e.g., ElastiCache Redis), which is expensive. Despite the potential of some prominent NoSQL databases, like Apache Cassandra and ScyllaDB, which utilize both memory and disk, prevailing opinions suggest they are ill-suited for ephemeral data, being tailored more for long-term storage. In our study, titled Asgard, we rigorously examine this assumption. Using Amazon Web Services (AWS) as a testbed with two popular serverless applications, we explore scenarios like fanout and varying workloads, gauging the performance benefits of configuring NoSQL databases in a DAG-aware way. Surprisingly, we found that, per end-to-end latency normalized by $ cost, Apache Cassandra's default setup surpassed Redis by up to 326% and S3 by up to 189%. When optimized with Asgard, Cassandra outdid its own default configuration by up to 47%. This underscores specific instances where NoSQL databases can outshine the current state-of-practice.\",\"PeriodicalId\":399190,\"journal\":{\"name\":\"Frontiers in High Performance Computing\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-09-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in High Performance Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/fhpcp.2023.1127883\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in High Performance Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fhpcp.2023.1127883","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

无服务器计算平台由于其低管理开销和细粒度计费策略,在数据分析应用程序中越来越受欢迎。这种分析框架使用有向无环图(DAG)结构,其中无服务器功能(细粒度任务)表示为节点,功能之间的数据依赖关系表示为边。将中间(短暂的)数据从一个函数传递到另一个函数最近受到了人们的关注,人们提出了各种存储系统和优化方法。实践状态的方法是通过远程存储传递临时数据,要么基于磁盘(例如,Amazon S3),这是缓慢的,要么基于内存(例如,ElastiCache Redis),这是昂贵的。尽管一些突出的NoSQL数据库(如Apache Cassandra和ScyllaDB)具有利用内存和磁盘的潜力,但普遍的观点认为它们不适合临时数据,更适合长期存储。在我们名为《阿斯加德》的研究中,我们严格检验了这一假设。我们使用Amazon Web Services (AWS)作为两个流行的无服务器应用程序的测试平台,探讨了fanout和不同工作负载等场景,以dag感知方式配置NoSQL数据库的性能优势。令人惊讶的是,我们发现,按$ cost标准化的端到端延迟,Apache Cassandra的默认设置比Redis高326%,比S3高189%。当使用Asgard进行优化时,Cassandra比自己的默认配置高出47%。这强调了NoSQL数据库可以超越当前实践状态的特定实例。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Asgard: Are NoSQL databases suitable for ephemeral data in serverless workloads?
Serverless computing platforms are becoming increasingly popular for data analytics applications due to their low management overhead and granular billing strategies. Such analytics frameworks use a Directed Acyclic Graph (DAG) structure, in which serverless functions, which are fine-grained tasks, are represented as nodes and data-dependencies between the functions are represented as edges. Passing intermediate (ephemeral) data from one function to another has been receiving attention of late, with works proposing various storage systems and methods of optimization for them. The state-of-practice method is to pass the ephemeral data through remote storage, either disk-based (e.g., Amazon S3), which is slow, or memory-based (e.g., ElastiCache Redis), which is expensive. Despite the potential of some prominent NoSQL databases, like Apache Cassandra and ScyllaDB, which utilize both memory and disk, prevailing opinions suggest they are ill-suited for ephemeral data, being tailored more for long-term storage. In our study, titled Asgard, we rigorously examine this assumption. Using Amazon Web Services (AWS) as a testbed with two popular serverless applications, we explore scenarios like fanout and varying workloads, gauging the performance benefits of configuring NoSQL databases in a DAG-aware way. Surprisingly, we found that, per end-to-end latency normalized by $ cost, Apache Cassandra's default setup surpassed Redis by up to 326% and S3 by up to 189%. When optimized with Asgard, Cassandra outdid its own default configuration by up to 47%. This underscores specific instances where NoSQL databases can outshine the current state-of-practice.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Runtime support for CPU-GPU high-performance computing on distributed memory platforms Using open-science workflow tools to produce SCEC CyberShake physics-based probabilistic seismic hazard models The fast and the capacious: memory-efficient multi-GPU accelerated explicit state space exploration with GPUexplore 3.0 Asgard: Are NoSQL databases suitable for ephemeral data in serverless workloads? SNDVI: a new scalable serverless framework to compute NDVI
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1