Johann Hauswald, M. Laurenzano, Yunqi Zhang, Hailong Yang, Yiping Kang, Cheng Li, A. Rovinski, Arjun Khurana, R. Dreslinski, T. Mudge, V. Petrucci, Lingjia Tang, Jason Mars
{"title":"为天狼星设计未来的仓库级计算机,一个端到端的语音和视觉个人助理","authors":"Johann Hauswald, M. Laurenzano, Yunqi Zhang, Hailong Yang, Yiping Kang, Cheng Li, A. Rovinski, Arjun Khurana, R. Dreslinski, T. Mudge, V. Petrucci, Lingjia Tang, Jason Mars","doi":"10.1145/2870631","DOIUrl":null,"url":null,"abstract":"As user demand scales for intelligent personal assistants (IPAs) such as Apple’s Siri, Google’s Google Now, and Microsoft’s Cortana, we are approaching the computational limits of current datacenter (DC) architectures. It is an open question how future server architectures should evolve to enable this emerging class of applications, and the lack of an open-source IPA workload is an obstacle in addressing this question. In this article, we present the design of Sirius, an open end-to-end IPA Web-service application that accepts queries in the form of voice and images, and responds with natural language. We then use this workload to investigate the implications of four points in the design space of future accelerator-based server architectures spanning traditional CPUs, GPUs, manycore throughput co-processors, and FPGAs. To investigate future server designs for Sirius, we decompose Sirius into a suite of eight benchmarks (Sirius Suite) comprising the computationally intensive bottlenecks of Sirius. We port Sirius Suite to a spectrum of accelerator platforms and use the performance and power trade-offs across these platforms to perform a total cost of ownership (TCO) analysis of various server design points. In our study, we find that accelerators are critical for the future scalability of IPA services. Our results show that GPU- and FPGA-accelerated servers improve the query latency on average by 8.5× and 15×, respectively. For a given throughput, GPU- and FPGA-accelerated servers can reduce the TCO of DCs by 2.3× and 1.3×, respectively.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"27 1","pages":"2:1-2:32"},"PeriodicalIF":2.0000,"publicationDate":"2016-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":"{\"title\":\"Designing Future Warehouse-Scale Computers for Sirius, an End-to-End Voice and Vision Personal Assistant\",\"authors\":\"Johann Hauswald, M. Laurenzano, Yunqi Zhang, Hailong Yang, Yiping Kang, Cheng Li, A. Rovinski, Arjun Khurana, R. Dreslinski, T. Mudge, V. Petrucci, Lingjia Tang, Jason Mars\",\"doi\":\"10.1145/2870631\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As user demand scales for intelligent personal assistants (IPAs) such as Apple’s Siri, Google’s Google Now, and Microsoft’s Cortana, we are approaching the computational limits of current datacenter (DC) architectures. It is an open question how future server architectures should evolve to enable this emerging class of applications, and the lack of an open-source IPA workload is an obstacle in addressing this question. In this article, we present the design of Sirius, an open end-to-end IPA Web-service application that accepts queries in the form of voice and images, and responds with natural language. We then use this workload to investigate the implications of four points in the design space of future accelerator-based server architectures spanning traditional CPUs, GPUs, manycore throughput co-processors, and FPGAs. To investigate future server designs for Sirius, we decompose Sirius into a suite of eight benchmarks (Sirius Suite) comprising the computationally intensive bottlenecks of Sirius. We port Sirius Suite to a spectrum of accelerator platforms and use the performance and power trade-offs across these platforms to perform a total cost of ownership (TCO) analysis of various server design points. In our study, we find that accelerators are critical for the future scalability of IPA services. Our results show that GPU- and FPGA-accelerated servers improve the query latency on average by 8.5× and 15×, respectively. For a given throughput, GPU- and FPGA-accelerated servers can reduce the TCO of DCs by 2.3× and 1.3×, respectively.\",\"PeriodicalId\":50918,\"journal\":{\"name\":\"ACM Transactions on Computer Systems\",\"volume\":\"27 1\",\"pages\":\"2:1-2:32\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2016-04-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"24\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Computer Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/2870631\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Computer Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/2870631","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Designing Future Warehouse-Scale Computers for Sirius, an End-to-End Voice and Vision Personal Assistant
As user demand scales for intelligent personal assistants (IPAs) such as Apple’s Siri, Google’s Google Now, and Microsoft’s Cortana, we are approaching the computational limits of current datacenter (DC) architectures. It is an open question how future server architectures should evolve to enable this emerging class of applications, and the lack of an open-source IPA workload is an obstacle in addressing this question. In this article, we present the design of Sirius, an open end-to-end IPA Web-service application that accepts queries in the form of voice and images, and responds with natural language. We then use this workload to investigate the implications of four points in the design space of future accelerator-based server architectures spanning traditional CPUs, GPUs, manycore throughput co-processors, and FPGAs. To investigate future server designs for Sirius, we decompose Sirius into a suite of eight benchmarks (Sirius Suite) comprising the computationally intensive bottlenecks of Sirius. We port Sirius Suite to a spectrum of accelerator platforms and use the performance and power trade-offs across these platforms to perform a total cost of ownership (TCO) analysis of various server design points. In our study, we find that accelerators are critical for the future scalability of IPA services. Our results show that GPU- and FPGA-accelerated servers improve the query latency on average by 8.5× and 15×, respectively. For a given throughput, GPU- and FPGA-accelerated servers can reduce the TCO of DCs by 2.3× and 1.3×, respectively.
期刊介绍:
ACM Transactions on Computer Systems (TOCS) presents research and development results on the design, implementation, analysis, evaluation, and use of computer systems and systems software. The term "computer systems" is interpreted broadly and includes operating systems, systems architecture and hardware, distributed systems, optimizing compilers, and the interaction between systems and computer networks. Articles appearing in TOCS will tend either to present new techniques and concepts, or to report on experiences and experiments with actual systems. Insights useful to system designers, builders, and users will be emphasized.
TOCS publishes research and technical papers, both short and long. It includes technical correspondence to permit commentary on technical topics and on previously published papers.