Offloading the Training of an I/O Access Pattern Detector to the Cloud
C. Künas, M. Serpa, J. L. Bez, E. Padoin, P. Navaux
Pub Date: 2021-10-01. DOI: 10.1109/sbac-padw53941.2021.00013
I/O operations are a bottleneck for numerous applications, so optimizing their performance is of paramount importance. Many techniques explore and apply optimizations at different layers of the I/O stack. The difficulty is that the workload changes constantly, so correctly detecting access patterns at runtime becomes essential for systems that seek to self-adjust their parameters. Furthermore, I/O pattern detection techniques should impose minimal overhead and perform detection as quickly as possible. This paper applies a machine learning technique to detect I/O access patterns and proposes offloading the local training workload to the cloud using a TPU accelerator. This approach does not affect classifier accuracy (reaching up to 99%), yet it makes training asynchronous, allowing the local machine to devote its computing resources to scientific applications while the model is trained or updated in the cloud.
Quantifying and detecting HPC resource wastage in cloud environments
William F. C. Tavares, M. M. Assis, E. Borin
Pub Date: 2021-10-01. DOI: 10.1109/sbac-padw53941.2021.00017
Among the many details users must consider when using cloud computing, avoiding resource wastage deserves more attention from administrators and new users. When an application does not fully utilize the provisioned resource, the end-of-the-month bill is unnecessarily increased. Several studies have developed solutions that avoid wastage using predictive techniques; nonetheless, these approaches require applications to have predictable behavior and depend on pre-executions or historical data. To circumvent these limitations, we explore how a reactive solution can be used to detect and contain wastage. More specifically, we discuss several important issues that arise when quantifying HPC resource wastage on the cloud, and propose a reactive strategy to quantify, detect, and contain resource wastage in this context. The solution is designed to be applicable in environments with expert and non-expert users, with no prior knowledge about the applications.
CLAP-Bot: a framework for automatic optimization of high-performance elastic applications on the Clouds
O. Napoli, Gustavo Ciotto Pinton, E. Borin
Pub Date: 2021-10-01. DOI: 10.1109/sbac-padw53941.2021.00015
The computational cloud has become popular due to its business model, in which the user only pays to use the system, with no acquisition or maintenance costs. However, cloud providers such as AWS EC2 and Google Compute Engine offer many virtual machine types, making it difficult to choose the one most suitable for the user's application and objective. In this work, we present CLAP-Bot, a system that automatically monitors and adjusts the computing infrastructure based on a recipe. CLAP-Bot is built on top of CLAP, which allows creating and managing computational clusters on different cloud providers. A recipe is a component that reads application metrics and executes a set of actions on the infrastructure. The application monitor is decoupled from the recipe, allowing it to be used transparently with different applications. We show how CLAP-Bot works by implementing and evaluating three dynamic provisioning policies as recipes. Alongside CLAP-Bot, we also present CLAP-Bot-Sim, a discrete-event simulator that models the use of a given recipe without the need to instantiate any virtual machine. CLAP-Bot-Sim can also model dynamic events, such as virtual machine interruptions and instance price oscillation over time. We show that CLAP-Bot-Sim accurately simulates the effects of recipes on the computing infrastructure and can easily be interchanged with CLAP-Bot.
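The abstract describes a recipe as a component that reads a metric and acts on the infrastructure. CLAP-Bot's actual recipe interface is not given, so the sketch below is purely illustrative: a hypothetical threshold-based provisioning policy with made-up names and thresholds:

```python
class ScaleOnLoadRecipe:
    """Hypothetical recipe: read one application metric, return the
    desired cluster size.  Names and thresholds are illustrative only,
    not CLAP-Bot's real API."""

    def __init__(self, scale_up_at=80.0, scale_down_at=20.0):
        self.scale_up_at = scale_up_at
        self.scale_down_at = scale_down_at

    def step(self, metric, n_nodes):
        # Grow under heavy load, shrink when idle, never below one node.
        if metric > self.scale_up_at:
            return n_nodes + 1
        if metric < self.scale_down_at and n_nodes > 1:
            return n_nodes - 1
        return n_nodes

recipe = ScaleOnLoadRecipe()
nodes = 2
for load in [85, 90, 50, 10, 5]:   # metric samples from the monitor
    nodes = recipe.step(load, nodes)
print(nodes)  # → 2
```

Because the recipe only consumes metric values, the same `step` loop can be driven either by a live monitor (CLAP-Bot) or by a discrete-event trace (CLAP-Bot-Sim), which is what makes the two interchangeable.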
WCC Program Committee
Pub Date: 2021-10-01. DOI: 10.1109/sbac-padw53941.2021.00009
Selecting efficient VM types to train deep learning models on Amazon SageMaker
Rafael Keller Tesser, Alvaro Marques, E. Borin
Pub Date: 2021-10-01. DOI: 10.1109/SBAC-PADW53941.2021.00014
The cloud has become a popular environment for running Deep Learning (DL) applications. Public cloud providers charge for the amount of time resources are actually used, with the hourly price depending on the configuration of the chosen cloud instance. Instances are usually provided as VMs that give access to a certain hardware configuration and may also come with a pre-configured software environment. More advanced, and theoretically faster, VMs are usually more expensive but do not necessarily provide the best performance for every application. Therefore, to choose the best instance (or VM type), users must consider the relative performance (and consequent cost) of different VMs when running their specific target application. With this in mind, we propose a model to estimate the relative performance and cost of training deep learning applications on different VM instances. The model is built upon observations derived from the performance profiles of three DL applications executed on 12 public cloud instances. We argue that this model is a valuable tool for cloud users looking for the optimal VM type to train their deep learning applications on the cloud.
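The selection problem the abstract sets up reduces to minimizing hourly price times estimated training time. A minimal sketch of that trade-off, with made-up VM names, prices, and times (not figures from the paper):

```python
def cheapest_vm(candidates):
    """Pick the VM type minimizing estimated training cost, where
    cost = hourly price * estimated training time in hours."""
    best = min(candidates, key=lambda c: c["usd_per_hour"] * c["hours"])
    return best["name"], round(best["usd_per_hour"] * best["hours"], 2)

# Illustrative candidates: the fastest VM is not the cheapest overall.
vms = [
    {"name": "small-cpu", "usd_per_hour": 0.10, "hours": 40.0},
    {"name": "mid-gpu",   "usd_per_hour": 0.90, "hours": 3.0},
    {"name": "big-gpu",   "usd_per_hour": 3.00, "hours": 1.5},
]
print(cheapest_vm(vms))  # → ('mid-gpu', 2.7)
```

The point of the paper's model is precisely to estimate the `hours` term per application and VM type from performance profiles, rather than requiring a full training run on every instance.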
Message from the WCC 2021 Workshop Chairs
Pub Date: 2021-10-01. DOI: 10.1109/sbac-padw53941.2021.00008
An evaluation of Cassandra NoSQL database on a low-power cluster
Lucas B. da Silva, J. F. Lima
Pub Date: 2021-10-01. DOI: 10.1109/sbac-padw53941.2021.00012
The constant growth of social media, unconventional web technologies, mobile applications, and Internet of Things (IoT) devices creates challenges for cloud data systems, which must support huge datasets and very high request rates. NoSQL distributed databases such as Cassandra have been used for unstructured data storage and to increase horizontal scalability and availability. In this paper, we evaluate Cassandra on a low-power, low-cost cluster of commodity Single Board Computers (SBCs). The cluster has 15 Raspberry Pi v3 nodes and uses the Docker Swarm orchestration tool for Cassandra service deployment and ingress load balancing over the SBCs. Experimental results demonstrate that hardware limitations impacted workload throughput, but read and write latencies were comparable to results reported by other works on high-end or virtualized platforms. Despite these limitations, the results show that a low-cost SBC cluster can support cloud serving goals such as scale-out, elasticity, and high availability.
WAMCA 2021 Program Committee
Pub Date: 2021-10-01. DOI: 10.1109/sbac-padw53941.2021.00007
A Memory Affinity Analysis of Scientific Applications on NUMA Platforms
Rafael Gauna Trindade, J. F. Lima, A. Charão
Pub Date: 2021-10-01. DOI: 10.1109/sbac-padw53941.2021.00011
Understanding the underlying architecture is essential for scientific applications in general. Non-Uniform Memory Access (NUMA) systems are one such environment, enabling large amounts of shared main memory. Nevertheless, NUMA systems can impose significant access latencies on data communication between distant memory nodes, and parallel applications with a naïve design may suffer significant performance penalties due to the lack of locality mechanisms. In this paper we present performance metrics for scientific applications to identify locality problems on NUMA systems, and show data and thread mapping strategies to mitigate them. Our experiments were performed with four well-known scientific applications: CoMD, LBM, LULESH, and Ondes3D. Experimental results demonstrate that these applications had significant locality problems and that data and thread mapping strategies improved the performance of all four.
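The abstract does not say which mapping mechanism the authors used; one of the simplest thread-mapping primitives is CPU affinity, sketched below with Python's standard library. Note that `os.sched_setaffinity` is Linux-only, and a complete NUMA strategy would also bind memory (e.g. via `numactl --membind` or libnuma), which this sketch omits:

```python
import os

def pin_to_cpus(cpus):
    """Pin the calling process to the given CPU set (Linux-only).
    Keeping a thread on the cores of the NUMA node that holds its data
    is the basic idea behind thread-mapping for locality."""
    os.sched_setaffinity(0, cpus)        # 0 = the current process
    return os.sched_getaffinity(0)       # confirm the effective mask

# Restrict the process to CPU 0, e.g. a core of the first NUMA node.
print(pin_to_cpus({0}))  # → {0}
```

With affinity fixed, first-touch allocation then tends to place a thread's pages on its local node, which is why thread mapping and data mapping are usually applied together.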
A Cloud-Based Batch Processing System for Loosely-Coupled Applications
Raoni Matos Smaneoto, T. Pereira, F. Brasileiro
Pub Date: 2021-10-01. DOI: 10.1109/sbac-padw53941.2021.00018
With the increased amount of data available for processing, and the increased need to process it, loosely-coupled batch applications have become very popular. Many batch applications require a high level of processing capacity, which leads to the need for high-performance computing infrastructures. This approach has long been used, mainly for scientific purposes, on the conventional environments for HPC, namely local clusters and supercomputers. The high-speed networks present in these systems are paramount for the execution of tightly-coupled scientific applications, but are wasted when executing loosely-coupled ones. Cloud infrastructures, on the other hand, are more appropriate for supporting such loosely-coupled applications. Unfortunately, the user experience in cloud systems is completely different from that of conventional batch systems, mainly because the infrastructure needs to be deployed and subsequently released to achieve the desired gains. In this paper we propose the architecture of a batch processing system that takes advantage of common features of cloud infrastructures to minimize cost and waiting time, while providing a user experience similar to that of conventional HPC systems.