CostPI

Proceedings of the 48th International Conference on Parallel Processing. Pub Date: 2019-08-05. DOI: 10.1145/3337821.3337879

Jiahao Liu, Fang Wang, D. Feng

Citations: 9

Abstract

NVMe SSDs have been widely adopted to provide storage services in cloud platforms where diverse workloads (latency-sensitive, throughput-oriented and capacity-oriented) are colocated. To achieve performance isolation, existing solutions partition the shared SSD into multiple isolated regions and assign each workload a separate region. However, these isolation solutions can result in inefficient resource utilization and imbalanced wear. More importantly, they cannot reduce the interference caused by contention for the embedded cache. In this paper, we present CostPI, which improves isolation and resource utilization by providing latency-sensitive workloads with dedicated resources (data cache, mapping table cache and NAND flash) and providing throughput-oriented and capacity-oriented workloads with shared resources. Specifically, at the NVMe queue level, we present an SLO-aware arbitration mechanism that fetches requests from NVMe queues at different granularities according to workload SLOs. At the embedded cache level, we use an asymmetric allocation scheme to partition the cache (both the data cache and the mapping table cache). For different data cache partitions, we adopt different cache policies to meet diverse workload requirements while reducing imbalanced wear. At the NAND flash level, we partition the hardware resources at channel granularity to enable the strongest isolation. Our experiments show that, for latency-sensitive workloads, CostPI can reduce the average response time by up to 44.2%, the 99th-percentile response time by up to 89.5%, and the 99.9th-percentile response time by up to 88.5%. Meanwhile, CostPI can increase resource utilization and reduce wear imbalance for the shared NVMe SSD.
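
The dedicated-versus-shared split at the NAND flash level can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Workload` class, the `partition_channels` function, the `dedicated_per_workload` parameter and the channel counts are all hypothetical, since the abstract gives no concrete allocation sizes. The sketch only shows the idea that each latency-sensitive workload receives its own channels, while throughput-oriented and capacity-oriented workloads share the remaining ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    slo: str  # assumed labels: "latency", "throughput", or "capacity"


def partition_channels(workloads, num_channels, dedicated_per_workload=1):
    """Map each workload name to the list of channel IDs it may use.

    Hypothetical policy: every latency-sensitive workload gets
    `dedicated_per_workload` private channels; all other workloads
    share whatever channels are left.
    """
    latency = [w for w in workloads if w.slo == "latency"]
    others = [w for w in workloads if w.slo != "latency"]
    needed = len(latency) * dedicated_per_workload
    if needed > num_channels:
        raise ValueError("not enough channels for dedicated allocation")
    if others and needed == num_channels:
        raise ValueError("no channels left for the shared workloads")

    allocation = {}
    # Dedicated region: consecutive, non-overlapping channel ranges.
    for i, w in enumerate(latency):
        start = i * dedicated_per_workload
        allocation[w.name] = list(range(start, start + dedicated_per_workload))
    # Shared region: every remaining channel, visible to all other workloads.
    shared = list(range(needed, num_channels))
    for w in others:
        allocation[w.name] = list(shared)
    return allocation


workloads = [
    Workload("db", "latency"),
    Workload("batch", "throughput"),
    Workload("archive", "capacity"),
]
alloc = partition_channels(workloads, num_channels=8, dedicated_per_workload=2)
for name, channels in alloc.items():
    print(name, channels)
```

In this toy configuration the latency-sensitive workload never competes for a channel with the other two, while the shared workloads can spread over all remaining channels instead of being confined to one fixed region each.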
