为天狼星设计未来的仓库级计算机，一个端到端的语音和视觉个人助理

IF 2 4区计算机科学 Q2 COMPUTER SCIENCE, THEORY & METHODS ACM Transactions on Computer Systems Pub Date : 2016-04-06 DOI:10.1145/2870631

Johann Hauswald, M. Laurenzano, Yunqi Zhang, Hailong Yang, Yiping Kang, Cheng Li, A. Rovinski, Arjun Khurana, R. Dreslinski, T. Mudge, V. Petrucci, Lingjia Tang, Jason Mars

{"title":"为天狼星设计未来的仓库级计算机，一个端到端的语音和视觉个人助理","authors":"Johann Hauswald, M. Laurenzano, Yunqi Zhang, Hailong Yang, Yiping Kang, Cheng Li, A. Rovinski, Arjun Khurana, R. Dreslinski, T. Mudge, V. Petrucci, Lingjia Tang, Jason Mars","doi":"10.1145/2870631","DOIUrl":null,"url":null,"abstract":"As user demand scales for intelligent personal assistants (IPAs) such as Apple’s Siri, Google’s Google Now, and Microsoft’s Cortana, we are approaching the computational limits of current datacenter (DC) architectures. It is an open question how future server architectures should evolve to enable this emerging class of applications, and the lack of an open-source IPA workload is an obstacle in addressing this question. In this article, we present the design of Sirius, an open end-to-end IPA Web-service application that accepts queries in the form of voice and images, and responds with natural language. We then use this workload to investigate the implications of four points in the design space of future accelerator-based server architectures spanning traditional CPUs, GPUs, manycore throughput co-processors, and FPGAs. To investigate future server designs for Sirius, we decompose Sirius into a suite of eight benchmarks (Sirius Suite) comprising the computationally intensive bottlenecks of Sirius. We port Sirius Suite to a spectrum of accelerator platforms and use the performance and power trade-offs across these platforms to perform a total cost of ownership (TCO) analysis of various server design points. In our study, we find that accelerators are critical for the future scalability of IPA services. Our results show that GPU- and FPGA-accelerated servers improve the query latency on average by 8.5× and 15×, respectively. For a given throughput, GPU- and FPGA-accelerated servers can reduce the TCO of DCs by 2.3× and 1.3×, respectively.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"27 1","pages":"2:1-2:32"},"PeriodicalIF":2.0000,"publicationDate":"2016-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":"{\"title\":\"Designing Future Warehouse-Scale Computers for Sirius, an End-to-End Voice and Vision Personal Assistant\",\"authors\":\"Johann Hauswald, M. Laurenzano, Yunqi Zhang, Hailong Yang, Yiping Kang, Cheng Li, A. Rovinski, Arjun Khurana, R. Dreslinski, T. Mudge, V. Petrucci, Lingjia Tang, Jason Mars\",\"doi\":\"10.1145/2870631\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As user demand scales for intelligent personal assistants (IPAs) such as Apple’s Siri, Google’s Google Now, and Microsoft’s Cortana, we are approaching the computational limits of current datacenter (DC) architectures. It is an open question how future server architectures should evolve to enable this emerging class of applications, and the lack of an open-source IPA workload is an obstacle in addressing this question. In this article, we present the design of Sirius, an open end-to-end IPA Web-service application that accepts queries in the form of voice and images, and responds with natural language. We then use this workload to investigate the implications of four points in the design space of future accelerator-based server architectures spanning traditional CPUs, GPUs, manycore throughput co-processors, and FPGAs. To investigate future server designs for Sirius, we decompose Sirius into a suite of eight benchmarks (Sirius Suite) comprising the computationally intensive bottlenecks of Sirius. We port Sirius Suite to a spectrum of accelerator platforms and use the performance and power trade-offs across these platforms to perform a total cost of ownership (TCO) analysis of various server design points. In our study, we find that accelerators are critical for the future scalability of IPA services. Our results show that GPU- and FPGA-accelerated servers improve the query latency on average by 8.5× and 15×, respectively. For a given throughput, GPU- and FPGA-accelerated servers can reduce the TCO of DCs by 2.3× and 1.3×, respectively.\",\"PeriodicalId\":50918,\"journal\":{\"name\":\"ACM Transactions on Computer Systems\",\"volume\":\"27 1\",\"pages\":\"2:1-2:32\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2016-04-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"24\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Computer Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/2870631\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Computer Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/2870631","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}

引用次数: 24

摘要

随着用户对苹果的Siri、谷歌的Google Now和微软的Cortana等智能个人助理(IPAs)的需求不断扩大，我们正在接近当前数据中心(DC)架构的计算极限。未来的服务器架构应该如何发展以支持这类新兴的应用程序，这是一个悬而未决的问题，而缺乏开源的IPA工作负载是解决这个问题的一个障碍。在本文中，我们介绍了Sirius的设计，这是一个开放的端到端IPA web服务应用程序，它接受语音和图像形式的查询，并使用自然语言进行响应。然后，我们使用此工作负载来研究未来基于加速器的服务器架构(跨越传统cpu、gpu、多核吞吐量协处理器和fpga)在设计空间中的四个要点的含义。为了研究天狼星未来的服务器设计，我们将天狼星分解为一组八个基准测试(天狼星套件)，其中包括天狼星的计算密集型瓶颈。我们将Sirius Suite移植到一系列加速器平台，并使用这些平台之间的性能和功耗权衡来执行各种服务器设计点的总拥有成本(TCO)分析。在我们的研究中，我们发现加速器对IPA服务的未来可扩展性至关重要。我们的研究结果表明，GPU和fpga加速服务器的查询延迟平均分别提高了8.5倍和15倍。对于给定的吞吐量，GPU和fpga加速的服务器可以将数据中心的TCO分别降低2.3倍和1.3倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Designing Future Warehouse-Scale Computers for Sirius, an End-to-End Voice and Vision Personal Assistant

As user demand scales for intelligent personal assistants (IPAs) such as Apple’s Siri, Google’s Google Now, and Microsoft’s Cortana, we are approaching the computational limits of current datacenter (DC) architectures. It is an open question how future server architectures should evolve to enable this emerging class of applications, and the lack of an open-source IPA workload is an obstacle in addressing this question. In this article, we present the design of Sirius, an open end-to-end IPA Web-service application that accepts queries in the form of voice and images, and responds with natural language. We then use this workload to investigate the implications of four points in the design space of future accelerator-based server architectures spanning traditional CPUs, GPUs, manycore throughput co-processors, and FPGAs. To investigate future server designs for Sirius, we decompose Sirius into a suite of eight benchmarks (Sirius Suite) comprising the computationally intensive bottlenecks of Sirius. We port Sirius Suite to a spectrum of accelerator platforms and use the performance and power trade-offs across these platforms to perform a total cost of ownership (TCO) analysis of various server design points. In our study, we find that accelerators are critical for the future scalability of IPA services. Our results show that GPU- and FPGA-accelerated servers improve the query latency on average by 8.5× and 15×, respectively. For a given throughput, GPU- and FPGA-accelerated servers can reduce the TCO of DCs by 2.3× and 1.3×, respectively.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Computer Systems 工程技术-计算机：理论方法

CiteScore

4.00

自引率

0.00%

发文量

审稿时长

1 months

期刊介绍： ACM Transactions on Computer Systems (TOCS) presents research and development results on the design, implementation, analysis, evaluation, and use of computer systems and systems software. The term "computer systems" is interpreted broadly and includes operating systems, systems architecture and hardware, distributed systems, optimizing compilers, and the interaction between systems and computer networks. Articles appearing in TOCS will tend either to present new techniques and concepts, or to report on experiences and experiments with actual systems. Insights useful to system designers, builders, and users will be emphasized. TOCS publishes research and technical papers, both short and long. It includes technical correspondence to permit commentary on technical topics and on previously published papers.