Enhancing Server Efficiency in the Face of Killer Microseconds
Amirhossein Mirhosseini, Akshitha Sriraman, T. Wenisch
2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), February 2019
DOI: 10.1109/HPCA.2019.00037 (https://doi.org/10.1109/HPCA.2019.00037)
Citation count: 34
Abstract
We are entering an era of “killer microseconds” in data center applications. Killer microseconds refer to μs-scale “holes” in CPU schedules caused by stalls on accesses to fast I/O devices or by brief idle times between requests in high-throughput microservices. Whereas modern computing platforms can efficiently hide ns-scale and ms-scale stalls through micro-architectural techniques and OS context switching, they lack efficient support to hide the latency of μs-scale stalls. Simultaneous Multithreading (SMT) is an efficient way to improve core utilization and increase server performance density. Unfortunately, scaling SMT to provision enough threads to hide frequent μs-scale stalls is prohibitive, and SMT co-location can often drastically increase the tail latency of cloud microservices. In this paper, we propose Duplexity, a heterogeneous server architecture that employs aggressive multithreading to hide the latency of killer microseconds without sacrificing the Quality-of-Service (QoS) of latency-sensitive microservices. Duplexity provisions dyads (pairs) of two kinds of cores: master-cores, each of which primarily executes a single latency-critical master-thread, and lender-cores, which multiplex latency-insensitive throughput threads. When the master-thread stalls, the master-core borrows filler-threads from the lender-core, filling μs-scale utilization holes of the microservice. We propose critical mechanisms, including separate memory paths for the master-thread and filler-threads, to enable master-cores to borrow filler-threads while protecting master-threads’ state from disruption. Duplexity facilitates fast master-thread restart when stalls resolve and minimizes the microservice’s QoS violation. Our evaluation demonstrates that Duplexity achieves, on average, 1.9× higher core utilization and 2.7× lower iso-throughput 99th-percentile tail latency than an SMT-based server design.
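
To make the borrowing idea concrete, the following is a minimal back-of-the-envelope sketch, not the paper's simulator or evaluation methodology. It models a master-core's timeline as alternating run and stall phases of the master-thread and compares utilization with and without filler-threads occupying the stalls. The names `Phase`, `utilization`, and `switch_cost_us`, as well as the durations in the example trace, are hypothetical and chosen only for illustration; the abstract does not give these numbers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phase:
    """One phase of the master-thread's timeline (hypothetical model)."""
    kind: str           # "run" (master-thread executing) or "stall"
    duration_us: float  # length of the phase in microseconds


def utilization(phases: List[Phase], borrow: bool = False,
                switch_cost_us: float = 0.0) -> float:
    """Fraction of total time the master-core spends doing useful work.

    Without borrowing, only master-thread "run" phases count as busy.
    With borrowing, each stall is assumed to be filled by filler-threads,
    minus an assumed fixed cost for switching into and out of them.
    """
    total = sum(p.duration_us for p in phases)
    if total == 0:
        return 0.0
    busy = sum(p.duration_us for p in phases if p.kind == "run")
    if borrow:
        for p in phases:
            if p.kind == "stall":
                # A stall shorter than the switch cost yields no useful work.
                busy += max(0.0, p.duration_us - switch_cost_us)
    return busy / total


# Illustrative trace: 2 us of work, a 6 us stall, then 2 us of work.
trace = [Phase("run", 2.0), Phase("stall", 6.0), Phase("run", 2.0)]
print(utilization(trace))
print(utilization(trace, borrow=True, switch_cost_us=1.0))
```

In this made-up trace, utilization rises from 0.4 to 0.9 once the stall is filled. The model deliberately ignores the interference effects that the abstract says Duplexity addresses with separate memory paths and fast master-thread restart; it only shows why filling μs-scale holes matters for utilization.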