Pub Date: 2024-04-26, DOI: 10.1109/TCC.2024.3393895
Dumitrel Loghin
Almost all major cloud providers offer virtual machines running on servers with 64-bit ARM CPUs. For example, Amazon Web Services (AWS) designed custom ARM-based CPUs named Graviton2 and Graviton3. Other cloud providers, such as Microsoft Azure and Google Cloud Platform (GCP), employ servers with Ampere Altra CPUs. In this context, we conduct a comprehensive experimental study covering in-memory key-value stores, relational databases, enterprise blockchains, and Machine Learning inference. We cover all the available types of ARM cloud processors, including Graviton2 (AWS), Graviton3 (AWS), Ampere Altra (Azure and GCP), Yitian 710 (Alibaba Cloud), and Kunpeng 920 (Huawei Cloud). Our analysis shows that Yitian and Graviton3 are serious competitors for servers with Intel Xeon CPUs, achieving similar or better results with in-memory workloads. However, the performance of OLAP, ML inference, and blockchain on ARM-based servers is below that of Xeon. The reasons are mainly threefold: 1) unoptimized software, 2) lower clock frequency, and 3) lower per-core performance. Surprisingly, ARM servers spend twice as much time in Linux kernel system calls as Xeon servers. Nonetheless, ARM-based servers show great potential. Given their lower cloud computing price, ARM servers could be the ideal choice when performance is not critical.
Are ARM Cloud Servers Ready for Database Workloads? An Experimental Study. IEEE Transactions on Cloud Computing, vol. 12, no. 3, pp. 818-829.
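The reported 2x gap in kernel system-call time can be observed with standard OS accounting. The sketch below is an illustration, not the paper's methodology: it splits user versus system CPU time for a syscall-heavy workload using Python's `resource` module (Unix-only).

```python
import resource
import tempfile

def user_system_split(workload):
    """Run a workload and return (user, system) CPU seconds it consumed."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    workload()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)

def syscall_heavy():
    # Many small writes and reads force frequent kernel crossings,
    # inflating system time relative to user time.
    with tempfile.TemporaryFile() as f:
        for _ in range(20000):
            f.write(b"x" * 64)
        f.seek(0)
        while f.read(64):
            pass

user, system = user_system_split(syscall_heavy)
print(f"user={user:.3f}s system={system:.3f}s")
```

Running the same probe on an ARM and a Xeon instance would reveal how the user/system split differs between the two, which is the kind of asymmetry the study highlights.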
Workload characterization and subsequent prediction are significant steps in maintaining the elasticity and scalability of resources in Cloud Data Centers. Due to the high variance in cloud workloads, designing a prediction algorithm that models the variations in the workload is a non-trivial task. If the workload predictor is unable to handle the dynamism in the workloads, then its output may lead to over-provisioning or under-provisioning of cloud resources. To address this problem, we have created a Super Markov Prediction Model (SMPM) whose behaviour adapts to changes in the workload patterns. As time progresses, SMPM uses different sequence models, selected according to the workload pattern, to predict the future workload. To evaluate the proposed model, we have experimented with Alibaba trace 2018, Google Cluster Trace (GCT), Alibaba trace 2020 and the TPC-W workload trace. We have compared SMPM's prediction results with existing state-of-the-art prediction models and empirically verified that the proposed model achieves better accuracy as quantified using Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).
Design and Evaluation of a Hierarchical Characterization and Adaptive Prediction Model for Cloud Workloads. Karthick Seshadri; Korrapati Sindhu; S. Nagesh Bhattu; Chidambaran Kollengode. Pub Date: 2024-04-24, DOI: 10.1109/TCC.2024.3393114. IEEE Transactions on Cloud Computing, vol. 12, no. 2, pp. 712-724.
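The abstract does not specify SMPM's internals, but the general idea of Markov-style workload prediction and its RMSE/MAE evaluation can be sketched with a first-order model over discretized load levels (a baseline illustration only, not the paper's SMPM):

```python
import math
from collections import defaultdict

def train_markov(states):
    """Count first-order transitions between discrete workload levels."""
    counts = defaultdict(lambda: defaultdict(int))
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    return counts

def predict_next(counts, state, fallback):
    """Return the most frequent successor of `state`; fall back if unseen."""
    if counts[state]:
        return max(counts[state], key=counts[state].get)
    return fallback

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Toy periodic trace: workload levels cycle 0 -> 1 -> 2 -> 0 ...
trace = [0, 1, 2] * 40
model = train_markov(trace[:90])
preds = [predict_next(model, s, s) for s in trace[90:-1]]
print("RMSE:", rmse(trace[91:], preds), "MAE:", mae(trace[91:], preds))
# -> RMSE: 0.0 MAE: 0.0 (the periodic pattern is learned exactly)
```

SMPM's contribution, per the abstract, is switching between such sequence models as the workload pattern itself changes, which a single fixed transition matrix like the one above cannot do.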
Pub Date: 2024-04-19, DOI: 10.1109/TCC.2024.3391390
Kevin Nguetchouang;Stella Bitchebe;Theophile Dubuc;Mar Callau-Zori;Christophe Hubert;Pierre Olivier;Alain Tchana
Unlike CPU, memory, and network, disk virtualization is peculiar: virtualization through direct access is impossible. We study virtual disk utilization in a large-scale public cloud and observe the presence of long snapshot chains, sometimes composed of up to 1,000 files. We then demonstrate, through experimental measurements, that such long chains lead to virtualized storage performance and memory footprint scalability issues. To address these problems, we present SVD
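The scalability problem with long snapshot chains can be seen with a toy copy-on-write model (a hypothetical illustration, not the paper's design): a read that misses in every overlay must walk the entire chain down to the base image.

```python
class VirtualDisk:
    """One overlay in a copy-on-write snapshot chain (toy model)."""

    def __init__(self, backing=None):
        self.blocks = {}        # block number -> data written in this layer
        self.backing = backing  # next-older snapshot, or None for the base

    def write(self, block, data):
        self.blocks[block] = data

    def read(self, block):
        """Return (data, layers_visited), walking toward the base on misses."""
        layer, visited = self, 0
        while layer is not None:
            visited += 1
            if block in layer.blocks:
                return layer.blocks[block], visited
            layer = layer.backing
        return None, visited

# Build a chain of 1,000 snapshots on top of a base image.
disk = VirtualDisk()
disk.write(0, b"base data")
for _ in range(1000):
    disk = VirtualDisk(backing=disk)

data, visited = disk.read(0)
print(visited)  # 1001: every overlay is probed before the base-image hit
```

Each layer also carries per-file metadata that the hypervisor keeps resident, so both read latency and memory footprint grow with chain length, matching the scalability issues the abstract reports.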