Characterizing the Cost-Accuracy Performance of Cloud Applications
Sunimal Rathnayake, Lavanya Ramapantulu, Y. M. Teo
Workshop Proceedings of the 49th International Conference on Parallel Processing
Published: 2020-08-17
DOI: 10.1145/3409390.3409409 (https://doi.org/10.1145/3409390.3409409)
Citation count: 1
Abstract
The emergence of applications that can produce results at different levels of accuracy allows cloud consumers to exploit elastic cloud resources and pay-per-use pricing. However, the trade-off between the cost, accuracy, and execution time of cloud applications has not been well studied due to multiple challenges. A key challenge for a cloud consumer is tuning the application and determining, within the configuration space, the cloud resource configuration that achieves the desired application accuracy. This paper proposes an approach to improve the cost-accuracy performance of cloud applications for a given cost and accuracy. To illustrate our approach, we use inference with two popular convolutional neural networks (CNNs) as examples, with pruning as the tuning tool for changing accuracy, and derive several insights. First, we show that multiple degrees of pruning act as “sweet-spots”, where inference time and cost can be reduced without losing accuracy. Combining such sweet-spots can halve inference cost and time with a one-tenth reduction in accuracy for the CaffeNet CNN. Second, we show that in the large resource configuration space, these “sweet-spots” form the cost-accuracy and time-accuracy Pareto frontiers, whereby a Pareto-optimal configuration can reduce cost and execution time by 55% and 50%, respectively, while achieving the highest possible inference accuracy. Lastly, to quantify the accuracy performance of cloud applications, we introduce the Time Accuracy Ratio (TAR) and Cost Accuracy Ratio (CAR) metrics. We show that using TAR and CAR reduces the time complexity of determining cloud resource configurations from exponential to polynomial.
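The notion of cost-accuracy and time-accuracy Pareto frontiers can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' method: the `Config` type, the configuration names, and all numeric values are hypothetical, and the `ratio` helper assumes that TAR and CAR take the form "time (or cost) per unit of accuracy", which the abstract does not define.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str        # hypothetical label for a (pruning degree, resource) choice
    cost: float      # monetary cost of an inference run (arbitrary units)
    time: float      # execution time (arbitrary units)
    accuracy: float  # inference accuracy in (0, 1]


def pareto_frontier(configs, metric):
    """Return the configs that are not dominated on (metric, accuracy).

    A config is dominated if some other config is no worse on both the
    chosen metric (lower is better) and accuracy (higher is better),
    and strictly better on at least one of the two.
    """
    frontier = []
    for c in configs:
        dominated = any(
            getattr(o, metric) <= getattr(c, metric)
            and o.accuracy >= c.accuracy
            and (getattr(o, metric) < getattr(c, metric)
                 or o.accuracy > c.accuracy)
            for o in configs
        )
        if not dominated:
            frontier.append(c)
    return sorted(frontier, key=lambda c: getattr(c, metric))


def ratio(config, metric):
    # Assumed form only: metric per unit of accuracy (lower is better).
    # The abstract names TAR and CAR but does not give their definitions.
    return getattr(config, metric) / config.accuracy


# Made-up values for illustration; these are not measurements from the paper.
configs = [
    Config("A", cost=10.0, time=8.0, accuracy=0.80),
    Config("B", cost=6.0, time=5.0, accuracy=0.80),
    Config("C", cost=4.0, time=4.0, accuracy=0.70),
    Config("D", cost=7.0, time=3.0, accuracy=0.75),
    Config("E", cost=12.0, time=9.0, accuracy=0.78),
]

cost_frontier = pareto_frontier(configs, "cost")
time_frontier = pareto_frontier(configs, "time")
print("cost-accuracy frontier:", [c.name for c in cost_frontier])
print("time-accuracy frontier:", [c.name for c in time_frontier])
```

The pairwise dominance check is quadratic in the number of configurations, which is adequate for a small illustrative set. A configuration such as "A" drops out because "B" reaches the same accuracy at lower cost and lower time, which mirrors the idea of a sweet-spot that cuts cost and time without losing accuracy.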