{"title":"Fighting with Unknowns: Estimating the Performance of Scalable Distributed Storage Systems with Minimal Measurement Data","authors":"Moo-Ryong Ra, H. Lee","doi":"10.1109/MSST.2019.00-21","DOIUrl":null,"url":null,"abstract":"Constructing an accurate performance model for distributed storage systems has been identified as a very difficult problem. Researchers in this area either come up with an involved mathematical model specifically tailored to a target storage system or treat each storage system as a black box and apply machine learning techniques to predict the performance. Both approaches involve a significant amount of efforts and data collection processes, which often take a prohibited amount of time to apply to real world scenarios. In this paper, we propose a simple, yet accurate, performance estimation technique for scalable distributed storage systems. We claim that the total processing capability per IO size is conserved across a different mix of read/write ratios and IO sizes. Based on the hypothesis, we construct a performance model which can be used to estimate the performance of an arbitrarily mixed IO workload. The proposed technique requires only a couple of measurement points per IO size in order to provide accurate performance estimation. Our preliminary results are very promising. Based on two widely-used distributed storage systems (i.e., Ceph and Swift) under a different cluster configuration, we show that the total processing capability per IO size indeed remains constant. As a result, our technique was able to provide accurate prediction results.","PeriodicalId":391517,"journal":{"name":"2019 35th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 35th Symposium on Mass Storage Systems and Technologies (MSST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MSST.2019.00-21","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Constructing an accurate performance model for distributed storage systems has been identified as a very difficult problem. Researchers in this area either come up with an involved mathematical model specifically tailored to a target storage system or treat each storage system as a black box and apply machine learning techniques to predict the performance. Both approaches involve a significant amount of efforts and data collection processes, which often take a prohibited amount of time to apply to real world scenarios. In this paper, we propose a simple, yet accurate, performance estimation technique for scalable distributed storage systems. We claim that the total processing capability per IO size is conserved across a different mix of read/write ratios and IO sizes. Based on the hypothesis, we construct a performance model which can be used to estimate the performance of an arbitrarily mixed IO workload. The proposed technique requires only a couple of measurement points per IO size in order to provide accurate performance estimation. Our preliminary results are very promising. Based on two widely-used distributed storage systems (i.e., Ceph and Swift) under a different cluster configuration, we show that the total processing capability per IO size indeed remains constant. As a result, our technique was able to provide accurate prediction results.