
Proceedings of the Sixth ACM Symposium on Cloud Computing: Latest Publications

Tarcil: reconciling scheduling speed and quality in large shared clusters
Pub Date: 2015-08-27 DOI: 10.1145/2806777.2806779
Scheduling diverse applications in large, shared clusters is particularly challenging. Recent research on cluster scheduling focuses either on scheduling speed, using sampling to quickly assign resources to tasks, or on scheduling quality, using centralized algorithms that search for the resources that improve both task performance and cluster utilization. We present Tarcil, a distributed scheduler that targets both scheduling speed and quality. Tarcil uses an analytically derived sampling framework that adjusts the sample size based on load, and provides statistical guarantees on the quality of allocated resources. It also implements admission control when sampling is unlikely to find suitable resources. This makes it appropriate for large, shared clusters hosting short- and long-running jobs. We evaluate Tarcil on clusters with hundreds of servers on EC2. For highly-loaded clusters running short jobs, Tarcil improves task execution time by 41% over a distributed, sampling-based scheduler. For more general scenarios, Tarcil achieves near-optimal performance for 4× and 2× more jobs than sampling-based and centralized schedulers respectively.
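The load-adaptive sampling idea can be made concrete. Below is a minimal Python sketch, not Tarcil's implementation: it assumes the fraction of acceptable servers shrinks as (1 - load), picks the smallest sample size R whose statistical guarantee P(at least one good server) = 1 - (1 - p)^R meets a target, and falls back to admission control when no affordable sample size can. The (1 - load) proxy, the 95% target, and all names are our own illustrative assumptions.

```python
import math
import random

def sample_size(good_fraction, target_prob, max_samples):
    """Smallest R with 1 - (1 - good_fraction)^R >= target_prob,
    or None when even max_samples cannot meet the guarantee."""
    if good_fraction <= 0.0:
        return None
    r = math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - good_fraction))
    return r if r <= max_samples else None

def schedule(task, servers, load, target_prob=0.95, max_samples=64):
    # Assumed proxy: under load L, roughly a (1 - L) fraction of servers
    # still offers acceptable quality, so the sample must grow with load.
    r = sample_size(1.0 - load, target_prob, max_samples)
    if r is None:
        return None  # admission control: better to delay than to place badly
    candidates = random.sample(servers, min(r, len(servers)))
    return min(candidates, key=lambda s: s["queue_len"])  # best of the sample

servers = [{"id": i, "queue_len": random.randint(0, 10)} for i in range(100)]
print(schedule("task-1", servers, load=0.8))  # samples ~14 servers at 80% load
```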
Citations: 174
Achieving cost-efficient, data-intensive computing in the cloud
Pub Date: 2015-08-27 DOI: 10.1145/2806777.2806781
Michael Conley, Amin Vahdat, G. Porter
Cloud computing providers have recently begun to offer high-performance virtualized flash storage and virtualized network I/O capabilities, which have the potential to increase application performance. Since users pay for only the resources they use, these new resources have the potential to lower overall cost. Yet achieving low cost requires choosing the right mixture of resources, which is only possible if their performance and scaling behavior is known. In this paper, we present a systematic measurement of recently introduced virtualized storage and network I/O within Amazon Web Services (AWS). Our experience shows that there are scaling limitations in clusters relying on these new features. As a result, provisioning for a large-scale cluster differs substantially from small-scale deployments. We describe the implications of this observation for achieving efficiency in large-scale cloud deployments. To confirm the value of our methodology, we deploy cost-efficient, high-performance sorting of 100 TB as a large-scale evaluation.
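To make the "right mixture of resources" point concrete, here is a hedged sketch of the kind of provisioning search the paper motivates. The instance-type names are real AWS types of that era, but every price, per-node throughput, and scaling-efficiency number below is an invented placeholder, not a measurement from the paper.

```python
# All numbers are illustrative placeholders, not AWS prices or measured rates.
configs = [
    {"type": "i2.8xlarge", "usd_per_hr": 6.82, "mbps_per_node": 900, "scale_eff": 0.85},
    {"type": "c3.4xlarge", "usd_per_hr": 0.84, "mbps_per_node": 300, "scale_eff": 0.95},
]

def cheapest_cluster(target_mbps, max_nodes=512):
    """Lowest-cost (type, node count) meeting an aggregate throughput target,
    assuming per-node throughput degrades by a measured large-scale
    efficiency factor: the scaling limitation the paper highlights."""
    best = None
    for c in configs:
        for n in range(1, max_nodes + 1):
            if n * c["mbps_per_node"] * c["scale_eff"] >= target_mbps:
                cost = n * c["usd_per_hr"]
                if best is None or cost < best[2]:
                    best = (c["type"], n, cost)
                break  # smallest feasible n for this type found
    return best

print(cheapest_cluster(target_mbps=50_000))
```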
Citations: 21
vFair: latency-aware fair storage scheduling via per-IO cost-based differentiation
Pub Date: 2015-08-27 DOI: 10.1145/2806777.2806943
Hui Lu, Brendan Saltaformaggio, R. Kompella, Dongyan Xu
In virtualized data centers, multiple VMs are consolidated to access a shared storage system. Effective storage resource management, however, turns out to be challenging, as VM workloads exhibit various IO patterns and diverse loads. To multiplex the underlying hardware resources among VMs, providing fairness and isolation while maintaining high resource utilization becomes imperative for effective storage resource management. Existing schedulers such as Linux CFQ or SFQ can provide some fairness, but it has been observed that synchronous IO tends to lose fair shares significantly when competing with aggressive VMs. In this paper, we introduce vFair, a novel scheduling framework that achieves IO resource sharing fairness among VMs, regardless of their IO patterns and workloads. The design of vFair takes per-IO cost into consideration and strikes a balance between fairness and storage resource utilization. We have developed a Xen-based prototype of vFair and evaluated it with a wide range of storage workloads. Our results from both micro-benchmarks and real-world applications demonstrate the effectiveness of vFair, with significantly improved fairness and high resource utilization.
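As a hedged illustration of per-IO cost-based differentiation, the sketch below charges each VM a virtual-time tag proportional to the estimated device cost of its IOs (seek plus transfer) rather than a flat per-IO count, and always dispatches from the least-charged VM. The cost constants and class shape are our own assumptions, not vFair's model.

```python
from collections import deque

def io_cost(size_kb, is_random):
    # Assumed cost model: random IOs pay a seek, all IOs pay per-KB transfer.
    return (1.0 if is_random else 0.1) + 0.01 * size_kb

class CostFairScheduler:
    """Virtual-time fair queueing that charges each VM the estimated device
    cost of its IOs instead of a flat count (a sketch, not vFair's code)."""
    def __init__(self):
        self.vtime = {}    # vm -> accumulated cost-weighted service
        self.queues = {}   # vm -> pending IO requests
    def submit(self, vm, io, weight=1.0):
        self.queues.setdefault(vm, deque()).append((io, weight))
        self.vtime.setdefault(vm, 0.0)
    def dispatch(self):
        ready = [vm for vm, q in self.queues.items() if q]
        if not ready:
            return None
        vm = min(ready, key=lambda v: self.vtime[v])   # least-served VM first
        io, weight = self.queues[vm].popleft()
        self.vtime[vm] += io_cost(io["size_kb"], io["random"]) / weight
        return vm, io

sched = CostFairScheduler()
sched.submit("vm-a", {"size_kb": 4, "random": True})      # small synchronous IO
sched.submit("vm-b", {"size_kb": 1024, "random": False})  # large sequential IO
print(sched.dispatch())  # vm-a dispatches first (both idle; FIFO tie-break)
print(sched.dispatch())  # vm-b next; its vtime then reflects its large cost
```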
Citations: 20
Using data transformations for low-latency time series analysis
Pub Date: 2015-08-27 DOI: 10.1145/2806777.2806839
Henggang Cui, K. Keeton, Indrajit Roy, K. Viswanathan, G. Ganger
Time series analysis is commonly used when monitoring data centers, networks, weather, and even human patients. In most cases, the raw time series data is massive, from millions to billions of data points, and yet interactive analyses require low (e.g., sub-second) latency. Aperture transforms raw time series data, during ingest, into compact summarized representations that it can use to efficiently answer queries at runtime. Aperture handles a range of complex queries, from correlating hundreds of lengthy time series to predicting anomalies in the data. Aperture achieves much of its high performance by executing queries on data summaries, while providing a bound on the information lost when transforming data. By doing so, Aperture can reduce query latency as well as the data that needs to be stored and analyzed to answer a query. Our experiments on real data show that Aperture can provide one to four orders of magnitude lower query response time, while incurring only 10% ingest time overhead and less than 20% error in accuracy.
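A toy version of the ingest-time transformation: collapse each window of raw points into a small summary and answer aggregates from summaries alone. The (count, mean, min, max) representation and the spread-based error figure are stand-ins for Aperture's actual summaries and bounds, which the abstract does not detail.

```python
import statistics

def summarize(points, window):
    """Ingest-time transform: one compact record per window of raw points."""
    out = []
    for i in range(0, len(points), window):
        w = points[i:i + window]
        out.append({"n": len(w), "mean": statistics.fmean(w),
                    "min": min(w), "max": max(w)})
    return out

def query_mean(summaries):
    """Aggregate answered from summaries alone: the mean is exact, and the
    worst per-window spread crudely quantifies detail lost to summarization."""
    total = sum(s["n"] for s in summaries)
    mean = sum(s["mean"] * s["n"] for s in summaries) / total
    spread = max(s["max"] - s["min"] for s in summaries)
    return mean, spread

raw = [float(i % 60) for i in range(10_000)]   # synthetic series
summ = summarize(raw, window=100)
print(len(raw), "points ->", len(summ), "summaries:", query_mean(summ))
```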
Citations: 7
Reducing replication bandwidth for distributed document databases
Pub Date: 2015-08-27 DOI: 10.1145/2806777.2806840
Lianghong Xu, Andrew Pavlo, S. Sengupta, Jin Li, G. Ganger
With the rise of large-scale, Web-based applications, users are increasingly adopting a new class of document-oriented database management systems (DBMSs) that allow for rapid prototyping while also achieving scalable performance. As with other distributed storage systems, replication is important for document DBMSs in order to guarantee availability. The network bandwidth required to keep replicas synchronized is expensive and is often a performance bottleneck. As such, there is a strong need to reduce the replication bandwidth, especially for geo-replication scenarios where wide-area network (WAN) bandwidth is limited. This paper presents a deduplication system called sDedup that reduces the amount of data transferred over the network for replicated document DBMSs. sDedup uses similarity-based deduplication to remove redundancy in replication data by delta encoding against similar documents selected from the entire database. It exploits key characteristics of document-oriented workloads, including small item sizes, temporal locality, and the incremental nature of document edits. Our experimental evaluation of sDedup with three real-world datasets shows that it is able to achieve up to 38X reduction in data sent over the network, significantly outperforming traditional chunk-based deduplication techniques while incurring negligible performance overhead.
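The core loop, similarity selection followed by delta encoding, fits in a short sketch. For determinism this toy indexes every content shingle, whereas sDedup uses compact similarity sketches; the shingle size and the unified-diff delta format are our own stand-ins.

```python
import difflib
import hashlib
from collections import Counter

def shingles(text, k=8):
    """Hashes of overlapping k-byte shingles; a stand-in for sDedup's
    similarity features, which the abstract does not detail."""
    return {hashlib.md5(text[i:i + k].encode()).hexdigest()
            for i in range(max(1, len(text) - k + 1))}

class SimilarityDedup:
    def __init__(self):
        self.index = {}   # shingle hash -> ids of stored docs containing it
        self.store = {}   # doc id -> full text already on the replica
    def ingest(self, doc_id, text):
        feats = shingles(text)
        votes = Counter(d for f in feats for d in self.index.get(f, ()))
        for f in feats:
            self.index.setdefault(f, []).append(doc_id)
        self.store[doc_id] = text
        if not votes:
            return ("full", text)           # nothing similar: send the full doc
        base, _ = votes.most_common(1)[0]   # most similar stored document
        delta = "\n".join(difflib.unified_diff(
            self.store[base].splitlines(), text.splitlines(), lineterm=""))
        return ("delta", base, delta)       # send the much smaller delta

doc_v1 = "\n".join(f"record {i}: value {i * i}" for i in range(40))
doc_v2 = doc_v1.replace("value 100", "value 101")   # one edited line
d = SimilarityDedup()
d.ingest("v1", doc_v1)
kind, base, delta = d.ingest("v2", doc_v2)
print(kind, base, len(delta), "vs", len(doc_v2), "bytes for the full document")
```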
Citations: 17
GraM: scaling graph computation to the trillions
Pub Date: 2015-08-27 DOI: 10.1145/2806777.2806849
Ming Wu, Fan Yang, Jilong Xue, Wencong Xiao, Youshan Miao, Lan Wei, Haoxiang Lin, Yafei Dai, Lidong Zhou
GraM is an efficient and scalable graph engine for a large class of widely used graph algorithms. It is designed to scale up to multicores on a single server, as well as scale out to multiple servers in a cluster, offering significant, often over an order-of-magnitude, improvement over existing distributed graph engines on evaluated graph algorithms. GraM is also capable of processing graphs that are significantly larger than previously reported. In particular, using 64 servers (1,024 physical cores), it performs a PageRank iteration in 140 seconds on a synthetic graph with over one trillion edges, setting a new milestone for graph engines. GraM's efficiency and scalability come from a judicious architectural design that exploits the benefits of multi-core and RDMA. GraM uses a simple message-passing based scaling architecture for both scaling up and scaling out to expose inherent parallelism. It further benefits from a specially designed multi-core aware RDMA-based communication stack that preserves parallelism in a balanced way and allows overlapping of communication and computation. A high degree of parallelism often comes at the cost of lower efficiency due to resource fragmentation. GraM is equipped with an adaptive mechanism that evaluates the cost and benefit of parallelism to decide the appropriate configuration. Combined, these mechanisms allow GraM to scale up and out with high efficiency.
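The vertex-level logic behind such engines is compact; what GraM contributes is running it across cores and servers over a multi-core-aware RDMA stack. For orientation only, a single-process sketch of one message-passing PageRank step:

```python
def pagerank_step(ranks, out_edges, damping=0.85):
    """One synchronous message-passing PageRank step: each vertex sends
    rank/out_degree to its neighbors ("send"), then new ranks are formed
    from the received sums ("apply")."""
    n = len(ranks)
    incoming = {v: 0.0 for v in ranks}
    for u, targets in out_edges.items():          # send phase
        if targets:
            share = ranks[u] / len(targets)
            for v in targets:
                incoming[v] += share
    return {v: (1 - damping) / n + damping * incoming[v] for v in ranks}

edges = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {v: 1 / 3 for v in edges}
for _ in range(20):                               # iterate toward convergence
    ranks = pagerank_step(ranks, edges)
print(ranks)
```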
Citations: 133
ShardFS vs. IndexFS: replication vs. caching strategies for distributed metadata management in cloud storage systems
Pub Date: 2015-08-27 DOI: 10.1145/2806777.2806844
Lin Xiao, Kai Ren, Qing Zheng, Garth A. Gibson
The rapid growth of cloud storage systems calls for fast and scalable namespace processing. While few commercial file systems offer anything better than federating individually non-scalable namespace servers, a recent academic file system, IndexFS, demonstrates scalable namespace processing based on client caching of directory entries and permissions (directory lookup state) with no per-client state in servers. In this paper we explore explicit replication of directory lookup state in all servers as an alternative to caching this information in all clients. Both eliminate most repeated RPCs to different servers in order to resolve hierarchical permission tests. Our realization of server-replicated directory lookup state, ShardFS, employs a novel file-system-specific hybrid of optimistic and pessimistic concurrency control favoring single-object transactions over distributed transactions. Our experimentation suggests that if directory lookup state mutation is a fixed fraction of operations (strong scaling for metadata), server replication does not scale as well as client caching, but if directory lookup state mutation is proportional to the number of jobs, not the number of processes per job (weak scaling for metadata), then server replication can scale more linearly than client caching and provide lower 70th-percentile response times as well.
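A hedged sketch of the server-side replication idea: every metadata server keeps a full copy of directory lookup state, so the server owning a file's shard can resolve and permission-check the whole path locally in one RPC, while directory mutations must touch every replica (the distributed-transaction cost that the paper's concurrency control addresses). The class names and permission model below are illustrative, not ShardFS's code.

```python
class MetadataServer:
    """One metadata server: directory lookup state (names and modes) is
    replicated on every server; file entries are sharded."""
    def __init__(self):
        self.dirs = {"/": {"mode": 0o755}}   # full replica on every server
        self.files = {}                      # only this shard's files
    def lookup_ok(self, path):
        # Resolve and permission-check every ancestor locally: no extra RPCs.
        cur, parts = "/", path.strip("/").split("/")[:-1]
        for name in parts:
            cur = cur.rstrip("/") + "/" + name
            d = self.dirs.get(cur)
            if d is None or not (d["mode"] & 0o111):   # need execute to traverse
                return False
        return True

class ShardFS:
    def __init__(self, n_servers=3):
        self.servers = [MetadataServer() for _ in range(n_servers)]
    def mkdir(self, path, mode=0o755):
        for s in self.servers:               # mutation must reach every replica
            s.dirs[path] = {"mode": mode}
    def create(self, path):
        owner = self.servers[hash(path) % len(self.servers)]
        if owner.lookup_ok(path):            # one RPC does all path checks
            owner.files[path] = b""
            return True
        return False

fs = ShardFS()
fs.mkdir("/home"); fs.mkdir("/home/alice")
print(fs.create("/home/alice/notes.txt"))   # True: path resolved at one server
print(fs.create("/home/bob/x.txt"))         # False: /home/bob was never created
```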
Citations: 36
CoolProvision: underprovisioning datacenter cooling
Pub Date: 2015-08-27 DOI: 10.1145/2806777.2806938
I. Manousakis, Íñigo Goiri, S. Sankar, Thu D. Nguyen, R. Bianchini
Cloud providers have made significant strides in reducing the cooling capital and operational costs of their datacenters, for example, by leveraging outside air ("free") cooling where possible. Despite these advances, cooling costs still represent a significant expense mainly because cloud providers typically provision their cooling infrastructure for the worst-case scenario (i.e., very high load and outside temperature at the same time). Thus, in this paper, we propose to reduce cooling costs by underprovisioning the cooling infrastructure. When the cooling is underprovisioned, there might be (rare) periods when the cooling infrastructure cannot cool down the IT equipment enough. During these periods, we can either (1) reduce the processing capacity and potentially degrade the quality of service, or (2) let the IT equipment temperature increase in exchange for a controlled degradation in reliability. To determine the ideal amount of underprovisioning, we introduce CoolProvision, an optimization and simulation framework for selecting the cheapest provisioning within performance constraints defined by the provider. CoolProvision leverages an abstract trace of the expected workload, as well as cooling, performance, power, reliability, and cost models to explore the space of potential provisionings. Using data from a real small free-cooled datacenter, our results suggest that CoolProvision can reduce the cost of cooling by up to 55%. We extrapolate our experience and results to larger cloud datacenters as well.
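In the spirit of the optimization-plus-simulation loop described above, a toy version: sweep candidate cooling capacities, simulate the degraded-time fraction against a load and temperature trace, and keep the cheapest capacity that satisfies the constraint. The trace, the cooling-demand model, and the linear capital cost below are invented for illustration; CoolProvision's real framework couples far richer cooling, power, performance, reliability, and cost models.

```python
# Invented 24-hour trace: load fraction and outside temperature per hour.
load      = [0.5] * 8 + [0.8] * 8 + [0.9] * 4 + [1.0] * 3 + [1.0]
outside_c = [20] * 8 + [24] * 8 + [28] * 4 + [30] * 3 + [36]

def required_cooling(l, t):
    # Assumed demand model: grows with load and with outside heat above 25C.
    return l * (1 + max(0.0, t - 25) / 20)

def degraded_fraction(capacity):
    """Simulate: an hour is degraded (capped performance, or hotter and less
    reliable gear) whenever demand exceeds the provisioned capacity."""
    bad = sum(1 for l, t in zip(load, outside_c)
              if required_cooling(l, t) > capacity)
    return bad / len(load)

def cool_provision(max_degraded=0.05):
    """Cheapest capacity meeting the provider's degraded-time constraint;
    capital cost is taken as simply proportional to capacity."""
    feasible = [c / 100 for c in range(50, 201)
                if degraded_fraction(c / 100) <= max_degraded]
    return min(feasible) if feasible else None

print(cool_provision())  # 1.25 here, vs 1.55 to cover the single worst hour
```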
Citations: 35
Kubernetes and the path to cloud native
Pub Date: 2015-08-27 DOI: 10.1145/2806777.2809955
E. Brewer
We are in the midst of an important shift to higher levels of abstraction than virtual machines. Kubernetes aims to simplify the deployment and management of services, including the construction of applications as sets of interacting but independent services. We explain some of the key concepts in Kubernetes and show how they work together to simplify evolution and scaling.
Citations: 140
Proceedings of the Sixth ACM Symposium on Cloud Computing
Pub Date: 2015-08-27 DOI: 10.1145/2806777
Shahram Ghandeharizadeh, M. Balazinska, M. Freedman, Sumita Barahmand
The stated scope of SoCC is to be broad and encompass diverse data management and systems topics, and this year's 34 accepted papers are no exception. They touch on a wide range of data systems topics including new architectures, scheduling, performance modeling, high availability, replication, elasticity, migration, costs and performance trade-offs, complex analysis, and testing. The conference also includes 2 poster sessions (with 30 posters in addition to invited poster presentations for the accepted papers), keynotes by Eric Brewer of Google/UC Berkeley and Samuel Madden of MIT, and a social program that includes a banquet and a luncheon for students and senior systems and database researchers. The symposium is co-located with the 41st International Conference on Very Large Databases, VLDB 2015, highlighting the synergy between big data and the cloud.
Citations: 5