Full-Stack Architecting to Achieve a Billion-Requests-Per-Second Throughput on a Single Key-Value Store Server Platform

ACM Transactions on Computer Systems (TOCS) Pub Date : 2016-04-06 DOI:10.1145/2897393

Sheng Li, Hyeontaek Lim, V. Lee, Jung Ho Ahn, Anuj Kalia, M. Kaminsky, D. Andersen, S. O, Sukhan Lee, P. Dubey

Citations: 16

Abstract

Distributed in-memory key-value stores (KVSs), such as memcached, have become a critical data serving layer in modern Internet-oriented data center infrastructure. Their performance and efficiency directly affect the QoS of web services and the efficiency of data centers. Traditionally, these systems have had significant overheads from inefficient network processing, OS kernel involvement, and concurrency control. Two recent research thrusts have focused on improving key-value performance. Hardware-centric research has started to explore specialized platforms including FPGAs for KVSs; results demonstrated an order of magnitude increase in throughput and energy efficiency over stock memcached. Software-centric research revisited the KVS application to address fundamental software bottlenecks and to exploit the full potential of modern commodity hardware; these efforts also showed orders of magnitude improvement over stock memcached. We aim at architecting high-performance and efficient KVS platforms, and start with a rigorous architectural characterization across system stacks over a collection of representative KVS implementations. Our detailed full-system characterization not only identifies the critical hardware/software ingredients for high-performance KVS systems but also leads to guided optimizations atop a recent design to achieve a record-setting throughput of 120 million requests per second (MRPS) (167MRPS with client-side batching) on a single commodity server. Our system delivers the best performance and energy efficiency (RPS/watt) demonstrated to date over existing KVSs including the best-published FPGA-based and GPU-based claims. We craft a set of design principles for future platform architectures, and via detailed simulations demonstrate the capability of achieving a billion RPS with a single server constructed following our principles.
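To put the quoted throughput figures in proportion, the short sketch below computes the ratios between them and the server-wide average time between completed requests. It uses only the three numbers stated in the abstract (120 MRPS, 167 MRPS, and the one-billion-RPS target). The function names are illustrative and do not come from the paper, and the per-request interval is an aggregate average across the whole server, not a per-core latency.

```python
# Throughput figures quoted in the abstract, in requests per second.
MEASURED_RPS = 120e6   # 120 MRPS on a single commodity server
BATCHED_RPS = 167e6    # 167 MRPS with client-side batching
TARGET_RPS = 1e9       # billion-RPS goal demonstrated via simulation


def speedup(new_rps: float, base_rps: float) -> float:
    """Return the ratio of two throughputs (hypothetical helper)."""
    return new_rps / base_rps


def mean_interval_ns(rps: float) -> float:
    """Return the server-wide average time between completed requests, in ns."""
    return 1e9 / rps


if __name__ == "__main__":
    print(f"batching gain: {speedup(BATCHED_RPS, MEASURED_RPS):.2f}x")
    print(f"gap to target: {speedup(TARGET_RPS, MEASURED_RPS):.2f}x")
    for label, rps in [
        ("measured", MEASURED_RPS),
        ("batched", BATCHED_RPS),
        ("target", TARGET_RPS),
    ]:
        print(f"{label}: {mean_interval_ns(rps):.2f} ns between requests")
```

By this arithmetic, client-side batching adds roughly 1.4x over the unbatched figure, and the billion-RPS target is roughly 8.3x beyond the measured 120 MRPS, which corresponds to an average of one completed request per nanosecond across the server.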
