{"title":"Statistical Tail-Latency Bounded QoS Provisioning for Parallel and Distributed Data Centers","authors":"Xi Zhang, Qixuan Zhu","doi":"10.1109/ICDCS51616.2021.00078","DOIUrl":null,"url":null,"abstract":"The large-scale interactive services distribute clients' requests across a large number of physical machine in data center architectures to enhance the quality-of-service (QoS) performance. In parallel and distributed data center architecture, even a temporary spike in latency of any service component can significantly impact the end-to-end delay. Besides the average latency, tail-latency (i.e., worst case latency) of a service has also attracted a lot of research attentions. The tail-latency is a critical performance metric in data centers, where long tail latencies refer to the higher percentiles (such as 98th, 99th) of latency in comparison to the average latency time. While the statistical delay-bounded QoS provisioning theory has been shown to be a powerful technique and useful performance metric for supporting time-sensitive multimedia transmissions over mobile computing networks, how to efficiently extend and implement this technique/performance-metric for statistically bounding the tail-latency for data center networks has neither been well understood nor thoroughly studied. In this paper, we model and characterize the tail-latency distribution in a three-layer parallel and distributed data center architecture, where clients request different types of services and ten download their requested data packets from data center through a first-come-first-serve M/M/1 queueing system. We first define the statistical tail-latency bounded QoS, and investigate the tail-latency problem through generalized extreme value (GEV) theory and generalized Pareto distribution (GPD) theory. 
Then, we propose a scheme to identify the dominant sources of latency variance in a semantic context, so that we are able to optimize the instructions of those sources to reduce the latency tail. Finally, using numerical analyses we validate and evaluate our developed modeling techniques and schemes for characterizing the tail-latency QoS provisioning theories in supporting data center networks.","PeriodicalId":222376,"journal":{"name":"2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDCS51616.2021.00078","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Large-scale interactive services distribute clients' requests across a large number of physical machines in data center architectures to enhance quality-of-service (QoS) performance. In a parallel and distributed data center architecture, even a temporary latency spike in any single service component can significantly impact the end-to-end delay. Besides the average latency, the tail latency (i.e., worst-case latency) of a service has also attracted considerable research attention. Tail latency is a critical performance metric in data centers, where long tail latencies refer to the higher percentiles (such as the 98th and 99th) of the latency distribution, as opposed to the average latency. While statistical delay-bounded QoS provisioning theory has been shown to be a powerful technique and a useful performance metric for supporting time-sensitive multimedia transmissions over mobile computing networks, how to efficiently extend and implement this technique for statistically bounding the tail latency of data center networks has neither been well understood nor thoroughly studied. In this paper, we model and characterize the tail-latency distribution in a three-layer parallel and distributed data center architecture, where clients request different types of services and then download their requested data packets from the data center through a first-come-first-served M/M/1 queueing system. We first define the statistical tail-latency bounded QoS, and then investigate the tail-latency problem through generalized extreme value (GEV) theory and generalized Pareto distribution (GPD) theory. Next, we propose a scheme that identifies the dominant sources of latency variance in a semantic context, so that we can optimize the instructions of those sources to reduce the latency tail.
Finally, using numerical analyses, we validate and evaluate our modeling techniques and schemes for characterizing tail-latency QoS provisioning in data center networks.
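The abstract's core ingredients — tail percentiles of an FCFS M/M/1 queue and a GPD-based tail fit — can be illustrated with a small sketch. This is our own illustration, not the paper's code, and the rates `lam` and `mu` are arbitrary choices: for a stable FCFS M/M/1 queue, the steady-state sojourn time is exponential with rate (mu − lam), so the mean latency and the 99th-percentile tail latency have closed forms, and a peaks-over-threshold fit (a GPD with shape 0, which by memorylessness is again exponential) recovers the same tail quantile.

```python
import math
import random

lam, mu = 8.0, 10.0           # arrival and service rates (rho = 0.8), illustrative values
rate = mu - lam               # sojourn-time rate of a stable FCFS M/M/1 queue

# Closed forms: mean latency vs. the 99th-percentile tail latency.
mean_latency = 1.0 / rate
p99_analytic = -math.log(1 - 0.99) / rate   # exponential quantile

# Monte Carlo check against simulated sojourn times.
random.seed(1)
samples = sorted(random.expovariate(rate) for _ in range(200_000))
p99_empirical = samples[int(0.99 * len(samples))]

# Peaks-over-threshold (GPD, shape 0): exceedances over threshold u are
# again Exp(rate) by memorylessness, so the scale estimate is their mean.
u = 2 * mean_latency
exceed = [s - u for s in samples if s > u]
scale_hat = sum(exceed) / len(exceed)       # GPD scale estimate
p_u = len(exceed) / len(samples)            # empirical exceedance probability
# Quantile of the fitted tail: P(W > x) = p_u * exp(-(x - u) / scale).
p99_pot = u + scale_hat * math.log(p_u / 0.01)

print(f"mean = {mean_latency:.3f}s, p99 analytic = {p99_analytic:.3f}s, "
      f"p99 empirical = {p99_empirical:.3f}s, p99 POT fit = {p99_pot:.3f}s")
```

The gap between the mean (0.5 s here) and the 99th percentile (about 2.3 s) is exactly the tail-latency phenomenon the paper targets; the POT estimate tracks the analytic quantile closely because the exponential tail is the shape-0 boundary case of the GPD family.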