
Proceedings of the 2017 Symposium on Cloud Computing: Latest Publications

ALOHA-KV: high performance read-only and write-only distributed transactions
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3127487
Hua Fan, W. Golab, C. B. Morrey
There is a trend in recent database research to pursue coordination avoidance and weaker transaction isolation under a long-standing assumption: concurrent serializable transactions under read-write or write-write conflicts require costly synchronization, and thus may incur a steep price in terms of performance. In particular, distributed transactions, which access multiple data items atomically, are considered inherently costly. They require concurrency control for transaction isolation since both read-write and write-write conflicts are possible, and they rely on distributed commitment protocols to ensure atomicity in the presence of failures. This paper presents serializable read-only and write-only distributed transactions as a counterexample to show that concurrent transactions can be processed in parallel with low overhead despite conflicts. Inspired by the slotted ALOHA network protocol, we propose a simpler and leaner protocol for serializable read-only write-only transactions, which uses only one round trip to commit a transaction in the absence of failures, irrespective of contention. Our design is centered around an epoch-based concurrency control (ECC) mechanism that minimizes synchronization conflicts and uses a small number of additional messages whose cost is amortized across many transactions. We integrate this protocol into ALOHA-KV, a scalable distributed key-value store for read-only write-only transactions, and demonstrate that the system can process close to 15 million read/write operations per second per server when each transaction batches together thousands of such operations.
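A minimal Python sketch of the epoch idea behind ECC, assuming a single-node toy store (the class name EpochStore, its methods, and the last-writer-wins rule within an epoch are illustrative choices, not the ALOHA-KV implementation): write-only transactions are buffered during the open epoch, read-only transactions see the snapshot of the last sealed epoch, and epochs are serialized only at their boundaries.

import threading

class EpochStore:
    """Toy epoch-based concurrency control (ECC): write-only transactions are
    buffered in the current epoch; read-only transactions read the last sealed
    epoch's snapshot, so reads and writes never synchronize with each other."""

    def __init__(self):
        self._lock = threading.Lock()
        self._snapshot = {}      # state as of the last sealed epoch
        self._pending = []       # write-only transactions in the open epoch
        self.epoch = 0

    def write_txn(self, writes):
        """Buffer a write-only transaction (dict of key -> value). One enqueue,
        no per-key locking; ordering is imposed at the epoch boundary."""
        with self._lock:
            self._pending.append(dict(writes))

    def read_txn(self, keys):
        """Read-only transaction: a consistent snapshot of the last sealed epoch."""
        snap = self._snapshot    # replaced atomically in seal_epoch()
        return {k: snap.get(k) for k in keys}

    def seal_epoch(self):
        """Advance the epoch: apply all buffered writes as one serialized batch."""
        with self._lock:
            pending, self._pending = self._pending, []
        new_snapshot = dict(self._snapshot)
        for txn in pending:      # last-writer-wins within the epoch
            new_snapshot.update(txn)
        self._snapshot = new_snapshot
        self.epoch += 1

store = EpochStore()
store.write_txn({"x": 1, "y": 2})
store.write_txn({"y": 3})
store.seal_epoch()
print(store.read_txn(["x", "y"]))    # {'x': 1, 'y': 3}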
Citations: 7
Resilient cloud in dynamic resource environments
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3132571
Fan Yang, A. Chien, Haryadi S. Gunawi
Traditional cloud stacks are designed to tolerate random, small-scale failures, and can successfully deliver highly-available cloud services and interactive services to end users. However, they fail to survive large-scale disruptions caused by major power outages, cyber-attacks, or region/zone failures. Such changes trigger cascading failures and significant service outages. We propose to understand the reasons for these failures, and to create reliable data services that can efficiently and robustly tolerate such large-scale resource changes. We believe cloud services will need to survive frequent, large dynamic resource changes in the future to be highly available. (1) Significant new challenges to cloud reliability are emerging, including cyber-attacks, power/network outages, and so on. For example, human error disrupted the Amazon S3 service on 02/28/17 [2]. Recently, hackers have even been attacking electric utilities, which may lead to more outages [3, 6]. (2) Increased attention to resource cost optimization will increase usage dynamism, as with Amazon Spot Instances [1]. (3) Availability-focused cloud applications will increasingly practice continuous testing to ensure they have no hidden source of catastrophic failure. For example, the Netflix Simian Army can simulate the outages of individual servers, and even of an entire AWS region [4]. (4) Cloud applications with dynamic flexibility will reap numerous benefits, such as flexible deployments and managing cost arbitrage and reliability arbitrage across cloud providers and datacenters. Using Apache Cassandra [5] as the model system, we characterize its failure behavior under dynamic datacenter-scale resource changes. Each datacenter is volatile and randomly shut down with a given duty factor. We simulate a read-only workload on a quorum-based system deployed across multiple datacenters, varying (1) system scale, (2) the fraction of volatile datacenters, and (3) the duty factor of volatile datacenters. We explore the space of various configurations, including replication factors and consistency levels, and measure the service availability (% of succeeded requests) and replication overhead (number of total replicas). Our results show that, in a volatile resource environment, the current replication and quorum protocols in Cassandra-like systems cannot provide high availability and consistency with low replication overhead. Our contributions include: (1) a detailed characterization of failures under dynamic datacenter-scale resource changes, showing that the existing protocols in quorum-based systems cannot achieve high availability and consistency with low replication cost; and (2) a study of the best achievable availability of data services in a dynamic datacenter-scale resource environment.
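As a rough illustration of the kind of experiment described above, the following Python sketch estimates quorum-read availability when a fraction of datacenters is volatile with a given duty factor; the parameter values, the uniform replica placement, and the function name are assumptions made for illustration, not the paper's setup.

import random

def simulate_availability(n_dcs=5, volatile_fraction=0.6, duty_factor=0.5,
                          replication_factor=3, read_quorum=2,
                          trials=100_000, seed=1):
    """Estimate the fraction of quorum reads that succeed when some datacenters
    are volatile (each volatile DC is up with probability `duty_factor`)."""
    rng = random.Random(seed)
    n_volatile = int(n_dcs * volatile_fraction)
    ok = 0
    for _ in range(trials):
        # stable DCs are always up; volatile DCs are up with the duty factor
        up = [True] * (n_dcs - n_volatile) + \
             [rng.random() < duty_factor for _ in range(n_volatile)]
        # place the replicas of one key on `replication_factor` random DCs
        replicas = rng.sample(range(n_dcs), replication_factor)
        if sum(up[dc] for dc in replicas) >= read_quorum:
            ok += 1
    return ok / trials

print(f"estimated availability ~ {simulate_availability():.3f}")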
Citations: 2
PBSE: a robust path-based speculative execution for degraded-network tail tolerance in data-parallel frameworks
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3131622
Riza O. Suminto, Cesar A. Stuardo, Alexandra Clark, Huan Ke, Tanakorn Leesatapornwongsa, Bo Fu, D. Kurniawan, V. Martin, Maheswara Rao G. Uma, Haryadi S. Gunawi
We reveal loopholes of Speculative Execution (SE) implementations under a unique fault model: node-level network throughput degradation. This problem appears in many data-parallel frameworks such as Hadoop MapReduce and Spark. To address this, we present PBSE, a robust, path-based speculative execution that employs three key ingredients: path progress, path diversity, and path-straggler detection and speculation. We show how PBSE is superior to other approaches such as cloning and aggressive speculation under the aforementioned fault model. PBSE is a general solution, applicable to many data-parallel frameworks such as Hadoop/HDFS+QFS, Spark and Flume.
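A hedged Python sketch of the path-straggler idea, assuming the framework exposes per-path progress rates (the function names, the median-based threshold, and the example numbers are illustrative, not PBSE's actual detector): a path whose rate falls well below the median of its peers is flagged, and a speculative copy is issued through a node off the slow path.

from statistics import median

def find_path_stragglers(path_progress, slowdown_threshold=0.5):
    """Flag data-transfer paths whose progress rate falls far below the median.
    `path_progress` maps a path id, e.g. (task, source_node), to a progress
    rate such as bytes/sec reported by the framework's heartbeats."""
    rates = list(path_progress.values())
    if len(rates) < 2:
        return []
    med = median(rates)
    return [p for p, rate in path_progress.items()
            if rate < slowdown_threshold * med]

def pick_speculative_path(straggler, candidate_nodes, used_nodes):
    """Path diversity: re-issue the transfer through a node not on the slow path."""
    for node in candidate_nodes:
        if node not in used_nodes:
            return (straggler[0], node)
    return None

progress = {("map_3", "node_a"): 95.0, ("map_7", "node_b"): 90.0,
            ("map_9", "node_c"): 12.0}          # node_c's NIC is degraded
slow = find_path_stragglers(progress)
print(slow)                                      # [('map_9', 'node_c')]
print(pick_speculative_path(slow[0], ["node_a", "node_b", "node_d"], {"node_c"}))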
Citations: 20
Building smart memories and high-speed cloud services for the internet of things with derecho
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3134597
Sagar Jha, J. Behrens, Theo Gkountouvas, Mae Milano, Weijia Song, E. Tremel, Sydney Zink, K. Birman, R. V. Renesse
The coming generation of Internet-of-Things (IoT) applications will process massive amounts of incoming data while supporting data mining and online learning. In cases with demanding real-time requirements, such systems behave as smart memories: a high-bandwidth service that captures sensor input, processes it using machine-learning tools, replicates and stores "interesting" data (discarding uninteresting content), updates knowledge models, and triggers urgently-needed responses. Derecho is a high-throughput library for building smart memories and similar services. At its core Derecho implements atomic multicast (Vertical Paxos) and state machine replication (the classic durable Paxos). Derecho's replicated template defines a replicated type; the corresponding objects are associated with subgroups, which can be sharded into key-value structures. The persistent and volatile storage templates implement version vectors with optional NVM persistence. These support time-indexed access, offering lock-free snapshot isolation that blends temporal precision and causal consistency. Derecho automates application management, supporting multigroup structures and providing consistent knowledge of the current membership mapping. A query can access data from many shards or subgroups, and consistency is guaranteed without any form of distributed locking. Whereas many systems run consensus on the critical path, Derecho requires consensus only when updating membership. By leveraging an RDMA data plane and NVM storage, and adopting a novel receiver-side batching technique, Derecho can saturate a 12.5 GB/s RDMA network, sending millions of events per second in each subgroup or shard. In a single subgroup with 2--16 members, throughput peaks at 16 GB/s for large (100MB or more) objects. While key-value subgroups would typically use 2- or 3-member shards, unsharded subgroups could be large. In tests with a 128-member group, Derecho's multicast and Paxos protocols were just 3--5x slower than for a small group, depending on the traffic pattern. With network contention, slow members, or overlapping groups that generate concurrent traffic, Derecho's protocols remain stable and adapt to the available bandwidth.
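The following toy Python sketch illustrates only the time-indexed, versioned access pattern mentioned above; the class VersionedShard and its methods are hypothetical and single-node, whereas Derecho's replicated, RDMA-backed storage templates are far richer. Each update appends a timestamped version, and reads can target either the latest version or the state as of a given time.

import bisect
import time

class VersionedShard:
    """Toy versioned key-value shard: every update appends a (timestamp, version,
    value) record, and reads can be indexed by time, loosely mirroring the
    time-indexed, snapshot-style access described for Derecho's storage templates."""

    def __init__(self):
        self._log = {}       # key -> list of (timestamp, version, value)
        self._version = 0

    def put(self, key, value, timestamp=None):
        self._version += 1
        ts = time.time() if timestamp is None else timestamp
        self._log.setdefault(key, []).append((ts, self._version, value))
        return self._version

    def get(self, key):
        """Latest value for `key`, or None if never written."""
        records = self._log.get(key)
        return records[-1][2] if records else None

    def get_at(self, key, timestamp):
        """Value as of `timestamp`: the last update at or before that time."""
        records = self._log.get(key, [])
        i = bisect.bisect_right([r[0] for r in records], timestamp)
        return records[i - 1][2] if i else None

shard = VersionedShard()
shard.put("temperature", 21.5, timestamp=100.0)
shard.put("temperature", 23.0, timestamp=200.0)
print(shard.get_at("temperature", 150.0))   # 21.5
print(shard.get("temperature"))             # 23.0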
Citations: 4
Job scheduling for data-parallel frameworks with hybrid electrical/optical datacenter networks
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3132694
Zhuozhao Li, Haiying Shen
In spite of many advantages of hybrid electrical/optical datacenter networks (Hybrid-DCN), current job schedulers for data-parallel frameworks are not suitable for Hybrid-DCN, since the schedulers do not aggregate data traffic to facilitate the use of the optical circuit switch (OCS). We propose SchedOCS, a job scheduler for data-parallel frameworks in Hybrid-DCN that aims to take full advantage of the OCS to improve job performance.
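A small Python sketch of the traffic-aggregation intuition, under the assumption that shuffle flows can be grouped by rack pair and that pairs exceeding a byte threshold are worth setting up an optical circuit; the threshold and the routing rule are illustrative, not SchedOCS's actual scheduling algorithm.

from collections import defaultdict

def plan_transfers(transfers, ocs_threshold_bytes=10 * 2**30):
    """Aggregate per-rack-pair traffic and route heavy pairs over the OCS.
    `transfers` is a list of (src_rack, dst_rack, nbytes) shuffle flows."""
    per_pair = defaultdict(int)
    for src, dst, nbytes in transfers:
        per_pair[(src, dst)] += nbytes
    # rack pairs with enough aggregated traffic justify an optical circuit
    ocs_pairs = {pair for pair, total in per_pair.items()
                 if total >= ocs_threshold_bytes}
    optical = [t for t in transfers if (t[0], t[1]) in ocs_pairs]
    electrical = [t for t in transfers if (t[0], t[1]) not in ocs_pairs]
    return optical, electrical

flows = [("rack1", "rack2", 8 * 2**30), ("rack1", "rack2", 6 * 2**30),
         ("rack3", "rack4", 1 * 2**30)]
optical, electrical = plan_transfers(flows)
print(len(optical), len(electrical))   # 2 1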
Citations: 3
HyperNF: building a high performance, high utilization and fair NFV platform
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3127489
Kenichi Yasukata, Felipe Huici, Vincenzo Maffione, G. Lettieri, Michio Honda
Network Function Virtualization has been touted as the silver bullet for tackling a number of operator problems, including vendor lock-in, fast deployment of new functionality, converged management, and lower expenditure since packet processing runs on inexpensive commodity servers. The reality, however, is that, in practice, it has proved hard to achieve the stable, predictable performance provided by hardware middleboxes, and so operators have essentially resorted to throwing money at the problem, deploying highly underutilized servers (e.g., one NF per CPU core) in order to guarantee high performance during peak periods and meet SLAs. In this work we introduce HyperNF, a high performance NFV framework aimed at maximizing server performance when concurrently running large numbers of NFs. To achieve this, HyperNF implements hypercall-based virtual I/O, placing packet forwarding logic inside the hypervisor to significantly reduce I/O synchronization overheads. HyperNF improves throughput by 10%-73% depending on the NF, is able to closely match resource allocation specifications (with deviations of only 3.5%), and to efficiently cope with changing traffic loads.
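HyperNF's mechanism is hypercall-based virtual I/O inside the hypervisor, which is not easy to show in a few lines; as a loose illustration of the fairness goal only (matching per-NF resource allocation specifications), the Python sketch below computes a weighted max-min fair split of a shared packet-processing budget. The NF names, weights, and capacity are made-up inputs, not anything from the paper.

def max_min_fair_share(demands, weights, capacity):
    """Weighted max-min fair allocation of a shared capacity (e.g. CPU cycles or
    packets/sec) across network functions: NFs that need less than their fair
    share keep their demand, and the surplus is redistributed by weight."""
    alloc = {nf: 0.0 for nf in demands}
    active = set(demands)
    remaining = capacity
    while active and remaining > 1e-9:
        total_w = sum(weights[nf] for nf in active)
        share = {nf: remaining * weights[nf] / total_w for nf in active}
        satisfied = {nf for nf in active if demands[nf] - alloc[nf] <= share[nf]}
        if not satisfied:
            # nobody can be fully satisfied: split the rest by weight and stop
            for nf in active:
                alloc[nf] += share[nf]
            break
        for nf in satisfied:
            remaining -= demands[nf] - alloc[nf]
            alloc[nf] = demands[nf]
        active -= satisfied
    return alloc

demands = {"firewall": 3.0, "nat": 1.0, "dpi": 6.0}   # Mpps each NF could use
weights = {"firewall": 1, "nat": 1, "dpi": 2}
print(max_min_fair_share(demands, weights, capacity=8.0))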
Citations: 16
Selecting the best VM across multiple public clouds: a data-driven performance modeling approach
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3131614
N. Yadwadkar, Bharath Hariharan, Joseph E. Gonzalez, Burton J. Smith, R. Katz
Users of cloud services are presented with a bewildering choice of VM types and the choice of VM can have significant implications on performance and cost. In this paper we address the fundamental problem of accurately and economically choosing the best VM for a given workload and user goals. To address the problem of optimal VM selection, we present PARIS, a data-driven system that uses a novel hybrid offline and online data collection and modeling framework to provide accurate performance estimates with minimal data collection. PARIS is able to predict workload performance for different user-specified metrics, and resulting costs for a wide range of VM types and workloads across multiple cloud providers. When compared to sophisticated baselines, including collaborative filtering and a linear interpolation model using measured workload performance on two VM types, PARIS produces significantly better estimates of performance. For instance, it reduces runtime prediction error by a factor of 4 for some workloads on both AWS and Azure. The increased accuracy translates into a 45% reduction in user cost while maintaining performance.
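A hedged sketch of the general shape of such a data-driven model, not PARIS itself: profile a new workload on a few reference VM types, then use a regression model trained on historical runs to predict its performance and cost on every candidate VM type without running it there. The features, the random-forest choice, the synthetic training data, and the candidate VM list are all assumptions for illustration.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical training data: each row is [runtime on reference VM A,
# runtime on reference VM B, target vCPUs, target memory (GB)] for some past
# workload, labelled with its measured runtime on that target VM type.
X_train = rng.uniform([10, 10, 2, 4], [300, 300, 32, 128], size=(500, 4))
y_train = (20 + 0.5 * X_train[:, 0] + 0.4 * X_train[:, 1]
           - 0.5 * X_train[:, 2] + rng.normal(0, 5, 500))

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# New workload: profile it only on the two reference VM types, then predict
# runtime and cost on each candidate VM type.
profile = [120.0, 95.0]                                          # runtimes on A, B
candidates = {"small": (4, 16, 0.20), "large": (16, 64, 0.80)}   # vCPU, GB, $/h
for name, (vcpu, mem, price) in candidates.items():
    runtime = model.predict([profile + [vcpu, mem]])[0]
    print(name, round(runtime, 1), "s, est. cost $", round(runtime / 3600 * price, 4))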
Citations: 162
Distributed resource management across process boundaries
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3132020
L. Suresh, P. Bodík, Ishai Menache, M. Canini, F. Ciucu
Multi-tenant distributed systems composed of small services, such as Service-oriented Architectures (SOAs) and Micro-services, raise new challenges in attaining high performance and efficient resource utilization. In these systems, a request execution spans tens to thousands of processes, and the execution paths and resource demands on different services are generally not known when a request first enters the system. In this paper, we highlight the fundamental challenges of regulating load and scheduling in SOAs while meeting end-to-end performance objectives on metrics of concern to both tenants and operators. We design Wisp, a framework for building SOAs that transparently adapts rate limiters and request schedulers system-wide according to operator policies to satisfy end-to-end goals while responding to changing system conditions. In evaluations against production as well as synthetic workloads, Wisp successfully enforces a range of end-to-end performance objectives, such as reducing average latencies, meeting deadlines, providing fairness and isolation, and avoiding system overload.
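As a minimal illustration of one ingredient (an adaptive rate limiter driven by an end-to-end latency signal), the Python sketch below retunes a token bucket with a simple AIMD rule; the class, the policy constants, and the latency target are illustrative assumptions, not Wisp's actual controllers.

import time

class AdaptiveRateLimiter:
    """Toy token-bucket limiter whose rate is retuned from an observed latency
    signal (AIMD): below the target latency the rate creeps up additively,
    above it the rate is cut multiplicatively. Illustrative only."""

    def __init__(self, rate=100.0, burst=20.0):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, time.monotonic()

    def try_acquire(self):
        """Admit one request if a token is available; refill by elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def adapt(self, observed_latency_ms, target_latency_ms=50.0,
              additive_step=5.0, decrease_factor=0.7):
        """Retune the admission rate toward the operator's latency target."""
        if observed_latency_ms > target_latency_ms:
            self.rate = max(1.0, self.rate * decrease_factor)
        else:
            self.rate += additive_step

limiter = AdaptiveRateLimiter(rate=200.0)
limiter.adapt(observed_latency_ms=120.0)   # overload signal -> back off
print(round(limiter.rate, 1))              # 140.0
print(limiter.try_acquire())               # True (burst tokens available)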
Citations: 40
Towards an emergency edge supercloud
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3132253
Kolbeinn Karlsson, Zhiming Shen, Weijia Song, Hakim Weatherspoon, R. V. Renesse, S. Wicker
The "cloud paradigm" can provide a wealth of sophisticated emergency communication services that are gamechangers in emergency response, but its current implementation is not suitable to the challenging environments in which these responses often take place. The networking infrastructure may be all but unavailable, and access to centralized datacenters may be impossible.
"云范式"可以提供丰富的复杂应急通信服务,改变应急响应的规则,但目前的实施并不适合这些响应经常发生的具有挑战性的环境。网络基础设施可能几乎不可用,访问集中式数据中心可能是不可能的。
{"title":"Towards an emergency edge supercloud","authors":"Kolbeinn Karlsson, Zhiming Shen, Weijia Song, Hakim Weatherspoon, R. V. Renesse, S. Wicker","doi":"10.1145/3127479.3132253","DOIUrl":"https://doi.org/10.1145/3127479.3132253","url":null,"abstract":"The \"cloud paradigm\" can provide a wealth of sophisticated emergency communication services that are gamechangers in emergency response, but its current implementation is not suitable to the challenging environments in which these responses often take place. The networking infrastructure may be all but unavailable, and access to centralized datacenters may be impossible.","PeriodicalId":20679,"journal":{"name":"Proceedings of the 2017 Symposium on Cloud Computing","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81196875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Towards verifiable metering for database as a service providers
Pub Date : 2017-09-24 DOI: 10.1145/3127479.3134349
Min Du, Ravishankar Ramamurthy
Metering is an important component of cloud database services. We discuss potential problems in verifiability for existing DBaaS metering and initiate a discussion of how we can address this problem.
Citations: 0