Pub Date: 2020-11-17 | DOI: 10.1109/MASCOTS50786.2020.9285951
Self-adaptive Threshold-based Policy for Microservices Elasticity
Fabiana Rossi, V. Cardellini, F. L. Presti
The microservice architecture structures an application as a collection of loosely coupled and distributed services. Since application workloads usually change over time, the number of replicas per microservice should be scaled accordingly at run-time. The most widely adopted scaling policy relies on statically defined thresholds expressed in terms of system-oriented metrics. This policy might not be well suited to scaling multi-component, latency-sensitive applications, which express their requirements in terms of response time. In this paper, we present a two-layered hierarchical solution for controlling the elasticity of microservice-based applications. The higher-level controller estimates each microservice's contribution to the application performance and informs the lower-level components, which scale the individual microservices accordingly using a dynamic threshold-based policy. We propose MB Threshold and QL Threshold, two policies that employ model-based and model-free reinforcement learning, respectively, to learn threshold update strategies. These policies can compute different thresholds for the different application components, according to the desired deployment objectives. A wide set of simulation results shows the benefits and flexibility of the proposed solution, emphasizing the advantages of dynamic thresholds over the most widely adopted policy based on static thresholds.
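The abstract does not give the QL Threshold formulation in detail; purely as an illustration of learning a threshold-update strategy with model-free (tabular Q-learning) reinforcement learning, the following sketch adapts a per-microservice CPU-utilization scaling threshold. The state discretization, reward weights, SLO value, and the simulated microservice are all assumptions made for this example, not the paper's model.

```python
import random
from collections import defaultdict

# Toy Q-learning sketch: learn how to move a CPU-utilization scaling threshold
# for one microservice. State, actions, reward weights and the simulated
# environment are illustrative assumptions only.
ACTIONS = (-5, 0, +5)           # decrease, keep, or increase the threshold (in %)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
SLO_MS = 200.0                  # assumed response-time target

Q = defaultdict(lambda: [0.0] * len(ACTIONS))

def state(threshold, utilization, resp_time):
    """Discretize the observable signals into a small state space."""
    return (int(threshold) // 10, int(utilization) // 10, resp_time > SLO_MS)

def reward(resp_time, replicas):
    """Penalize SLO violations strongly and resource usage mildly."""
    return -(10.0 if resp_time > SLO_MS else 0.0) - 0.1 * replicas

def simulate(threshold, load):
    """Crude stand-in for the scaled microservice: a lower threshold makes the
    autoscaler keep more replicas, which lowers the response time."""
    replicas = max(1, round(load / max(threshold, 1.0)))
    utilization = min(150.0, 100.0 * load / (replicas * threshold))
    resp_time = 50.0 + 3.0 * load / replicas
    return replicas, utilization, resp_time

threshold = 70.0
for step in range(20000):
    load = 20 + 60 * abs((step % 200) - 100) / 100      # periodic workload
    replicas, util, rt = simulate(threshold, load)
    s = state(threshold, util, rt)
    if random.random() < EPS:
        a = random.randrange(len(ACTIONS))
    else:
        a = max(range(len(ACTIONS)), key=lambda i: Q[s][i])
    threshold = min(90.0, max(10.0, threshold + ACTIONS[a]))
    replicas, util, rt = simulate(threshold, load)
    s_next = state(threshold, util, rt)
    r = reward(rt, replicas)
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s_next]) - Q[s][a])

print("threshold after training:", threshold)
```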
Pub Date: 2020-11-17 | DOI: 10.1109/MASCOTS50786.2020.9285941
Symbolic Execution for Network Functions with Time-Driven Logic
Harsha Sharma, Wenfei Wu, Bangwen Deng
Symbolic execution is a commonly used technique in network function (NF) verification; it helps network operators find implementation or configuration bugs before deployment. By studying most existing symbolic execution engines, we observe that they focus only on packet-arrival-driven event logic; we propose that NF modeling languages should include time-driven logic to describe actual NF implementations more accurately and to enable complete verification. Thus, we define primitives to express time-driven logic in an NF modeling language and develop a symbolic execution engine, NF-SE, that can verify such logic for NFs over multiple packets. Our prototype of NF-SE and its evaluation on multiple example NFs demonstrate its usefulness and correctness.
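NF-SE itself is not specified here in reproducible detail; the toy sketch below (the flow-table NF, its timeout, and the checked property are invented for illustration) only shows why time-driven logic matters for verification: exhaustively exploring interleavings of packet arrivals and timer expirations exposes behaviors that packet-only event logic misses.

```python
from itertools import permutations

# Toy illustration (not the NF-SE engine): a stateful NF whose flow table
# expires idle entries on a timer. Exploring only packet orderings misses
# behaviors that depend on when the timer fires.
TIMEOUT = 5  # seconds before an idle flow-table entry is evicted (assumed)

def run(events):
    """Execute one event ordering and count how often the NF treated a packet
    as belonging to a brand-new flow (i.e., re-applied its per-flow policy)."""
    table = {}          # flow id -> last-seen timestamp
    clock = 0
    new_flow_hits = 0
    for ev in events:
        if ev == "tick":
            clock += TIMEOUT    # time advances; idle entries expire
            table = {f: t for f, t in table.items() if clock - t < TIMEOUT}
        else:                   # packet arrival for flow `ev`
            if ev not in table:
                new_flow_hits += 1
            table[ev] = clock
    return new_flow_hits

packets = ["flowA", "flowA"]    # two packets of the same flow
events = packets + ["tick"]
outcomes = set()
for order in set(permutations(events)):
    # keep orderings that preserve the packets' arrival order
    if [e for e in order if e != "tick"] == packets:
        outcomes.add(run(list(order)))

# Packet-only reasoning predicts exactly one "new flow" event; with the timer
# interleaved, both 1 and 2 are reachable.
print("reachable outcomes:", sorted(outcomes))
```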
Pub Date: 2020-11-17 | DOI: 10.1109/MASCOTS50786.2020.9285945
Investigating Genome Analysis Pipeline Performance on GATK with Cloud Object Storage
Tatsuhiro Chiba, Takeshi Yoshimura
Achieving fast, scalable, and cost-effective genome analytics is important for opening up new frontiers in biomedical and life science. The Genome Analysis Toolkit (GATK), an industry-standard genome analysis tool, improves its scalability and performance by leveraging Spark and HDFS. Spark with HDFS has been a leading analytics platform over the past few years; however, it cannot take full advantage of the elasticity offered by modern clouds. In this paper, we investigate the performance characteristics of GATK using Spark with HDFS and identify scalability issues. Based on a quantitative analysis, we introduce a new approach that uses Cloud Object Storage (COS) in GATK instead of HDFS, which helps decouple compute and storage. We demonstrate how this approach improves the performance of the entire pipeline and reduces cost. As a result, GATK with IBM COS runs up to 28% faster than GATK with HDFS. We also show that this approach achieves up to 67% total cost savings, including the time for data loading and whole-pipeline analysis.
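The abstract does not spell out how the pipeline is pointed at object storage; the minimal PySpark sketch below illustrates the compute/storage decoupling idea, assuming the S3A connector (one common way to reach IBM COS) is available on the classpath. The endpoint, bucket, credentials, and file names are placeholders, not the paper's configuration.

```python
from pyspark.sql import SparkSession

# Point a Spark job at Cloud Object Storage instead of HDFS so compute and
# storage scale independently. Endpoint, bucket and credentials are placeholders.
spark = (
    SparkSession.builder.appName("gatk-input-from-cos")
    .config("spark.hadoop.fs.s3a.endpoint",
            "https://s3.us-south.cloud-object-storage.appdomain.cloud")
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
    .getOrCreate()
)

# The same code path works against HDFS by swapping the URI scheme, e.g.
#   hdfs://namenode:8020/genomes/sample.vcf  vs  s3a://genome-bucket/sample.vcf
reads = spark.read.format("text").load("s3a://genome-bucket/sample.vcf")
print(reads.count())
spark.stop()
```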
Pub Date: 2020-11-17 | DOI: 10.1109/MASCOTS50786.2020.9285942
Merkle Hash Grids Instead of Merkle Trees
Jehan-Francois Pâris, T. Schwarz
Merkle grids are a new data organization that replicates the functionality of Merkle trees while reducing their transmission and storage costs by up to 50 percent. All Merkle grids organize the objects whose conformity they monitor in a square array. They add row and column hashes to it such that (a) all row hashes contain the hash of the concatenation of the hashes of all the objects in their respective row and (b) all column hashes contain the hash of the concatenation of the hashes of all the objects in their respective column. In addition, a single signed master hash contains the hash of the concatenation of all row and column hashes. Extended Merkle grids add two auxiliary Merkle trees to speed up searches among both row hashes and column hashes. While both basic and extended Merkle grids perform authentication of all blocks better than Merkle trees, only extended Merkle grids can locate individual non-conforming objects or authenticate a single non-conforming object as fast as Merkle trees.
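The grid construction is described precisely enough in the abstract to sketch: object hashes are laid out in a square array, each row and column hash covers the concatenation of its members' hashes, and a single master hash covers the concatenation of all row and column hashes. The sketch below follows that description; the hash function and object encoding are our own choices.

```python
import hashlib
from math import isqrt

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_grid(objects):
    """Build the row hashes, column hashes and signable master hash of a
    basic Merkle grid over a square array of objects."""
    n = isqrt(len(objects))
    assert n * n == len(objects), "grid expects a square number of objects"
    leaf = [h(obj) for obj in objects]
    grid = [leaf[i * n:(i + 1) * n] for i in range(n)]

    row_hashes = [h(b"".join(row)) for row in grid]
    col_hashes = [h(b"".join(grid[r][c] for r in range(n))) for c in range(n)]
    master = h(b"".join(row_hashes + col_hashes))   # this is what gets signed
    return row_hashes, col_hashes, master

objects = [f"object-{i}".encode() for i in range(16)]   # a 4 x 4 grid
rows, cols, master = merkle_grid(objects)

# Tampering with one object changes exactly one row hash and one column hash,
# which is how a verifier narrows down the non-conforming object.
tampered = list(objects)
tampered[5] = b"corrupted"
rows2, cols2, master2 = merkle_grid(tampered)
print("master changed:", master != master2)
print("row/column pinpointing the bad object:",
      [i for i, (a, b) in enumerate(zip(rows, rows2)) if a != b],
      [j for j, (a, b) in enumerate(zip(cols, cols2)) if a != b])
```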
Pub Date: 2020-11-17 | DOI: 10.1109/MASCOTS50786.2020.9285969
Infrastructure-Aware TensorFlow for Heterogeneous Datacenters
Moiz Arif, M. M. Rafique, Seung-Hwan Lim, Zaki Malik
Heterogeneous datacenters, with a variety of compute, memory, and network resources, are becoming increasingly popular to address the resource requirements of time-sensitive applications. One such application framework is the TensorFlow platform, which has become a platform of choice for running machine learning workloads. The state-of-the-art TensorFlow platform is oblivious to the availability and performance profiles of the underlying datacenter resources and does not incorporate the resource requirements of the given workloads for distributed training. This leads to executing training tasks on busy and resource-constrained worker nodes, which results in a significant increase in the overall training time. In this paper, we address this challenge and propose architectural improvements and new software modules in the default TensorFlow platform to make it aware of the availability and capabilities of the underlying datacenter resources. The proposed Infrastructure-Aware TensorFlow efficiently schedules the training tasks on the best possible resources for execution and reduces the overall training time. Our evaluation using worker nodes with varying availability and performance profiles shows that the proposed enhancements reduce training time by up to 54% compared to the default TensorFlow platform.
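The paper's scheduler and software modules are not reproduced here; as a rough sketch of the general idea, the snippet below ranks candidate worker nodes by an assumed availability/performance score and builds a standard TF_CONFIG cluster description from the best-ranked ones, so distributed training avoids busy or resource-constrained nodes. The scoring formula, worker attributes, and host names are illustrative assumptions.

```python
import json
import os

# Candidate workers with assumed monitoring data: free CPU cores, free memory
# (GB) and a relative performance factor. Values and host names are illustrative.
candidates = [
    {"host": "worker-a:2222", "free_cores": 2,  "free_mem": 8,  "perf": 0.6},
    {"host": "worker-b:2222", "free_cores": 16, "free_mem": 64, "perf": 1.0},
    {"host": "worker-c:2222", "free_cores": 8,  "free_mem": 32, "perf": 0.9},
    {"host": "worker-d:2222", "free_cores": 1,  "free_mem": 4,  "perf": 0.5},
]

def score(w):
    # Simple weighted score standing in for a real availability/performance
    # profile; the paper's actual weighting is not given in the abstract.
    return 0.5 * w["free_cores"] / 16 + 0.3 * w["free_mem"] / 64 + 0.2 * w["perf"]

needed = 2
chosen = sorted(candidates, key=score, reverse=True)[:needed]

# TF_CONFIG is the standard environment variable that tf.distribute strategies
# read to discover the cluster; here it lists only the selected workers.
tf_config = {
    "cluster": {"worker": [w["host"] for w in chosen]},
    "task": {"type": "worker", "index": 0},
}
os.environ["TF_CONFIG"] = json.dumps(tf_config)
print(os.environ["TF_CONFIG"])
```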
Pub Date: 2020-11-17 | DOI: 10.1109/MASCOTS50786.2020.9285944
Performance Prediction for Data-driven Workflows on Apache Spark
Andrea Gulino, Arif Canakoglu, S. Ceri, D. Ardagna
Spark is an in-memory framework for implementing distributed applications of various types. Predicting the execution time of Spark applications is an important but challenging problem that has been tackled in the past few years by several studies, most of them achieving good prediction accuracy on simple applications (e.g., known ML algorithms or SQL-based applications). In this work, we consider complex data-driven workflow applications, in which the execution and data flow can be modeled by Directed Acyclic Graphs (DAGs). Workflows can be composed of an arbitrary combination of known tasks, each applying a set of Spark operations to its input data. By adopting a hybrid approach that combines analytical and machine learning (ML) models trained on small DAGs, we can predict, with good accuracy, the execution time of unseen workflows of higher complexity and size. We validate our approach through extensive experimentation on real-world complex applications, comparing different ML models and choices of feature sets.
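The hybrid analytical/ML models are not given in the abstract; as an assumed illustration of the ML half only, the sketch below turns a workflow DAG into a small feature vector (task-type counts, DAG depth, input size) and fits a regressor on execution times measured for small DAGs, then predicts a larger unseen workflow. The feature set, synthetic training data, and model choice are ours, not the paper's.

```python
from sklearn.ensemble import GradientBoostingRegressor

# A workflow is a DAG of typed tasks plus its input size (GB).
TASK_TYPES = ("map", "filter", "join", "aggregate")

def depth(dag):
    """Longest path length in a DAG given as {node: [children]}."""
    memo = {}
    def d(n):
        if n not in memo:
            memo[n] = 1 + max((d(c) for c in dag.get(n, [])), default=0)
        return memo[n]
    return max(d(n) for n in dag)

def features(workflow):
    dag, types, input_gb = workflow
    counts = [sum(1 for t in types.values() if t == k) for k in TASK_TYPES]
    return counts + [depth(dag), input_gb]

# Small training workflows with measured runtimes (seconds); synthetic here.
w1 = ({"a": ["b"], "b": []}, {"a": "map", "b": "aggregate"}, 10)
w2 = ({"a": ["b"], "b": ["c"], "c": []},
      {"a": "map", "b": "filter", "c": "aggregate"}, 20)
w3 = ({"a": ["c"], "b": ["c"], "c": []},
      {"a": "map", "b": "map", "c": "join"}, 40)
X = [features(w) for w in (w1, w2, w3)]
y = [120.0, 260.0, 610.0]

model = GradientBoostingRegressor(n_estimators=50).fit(X, y)

# Predict a larger, unseen workflow built from the same known task types.
big = ({"a": ["c"], "b": ["c"], "c": ["d"], "d": []},
       {"a": "map", "b": "map", "c": "join", "d": "aggregate"}, 80)
print("predicted runtime (s):", model.predict([features(big)])[0])
```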
Pub Date: 2020-11-17 | DOI: 10.1109/MASCOTS50786.2020.9285963
Symbiotic HW Cache and SW DTLB Prefetching for DRAM/NVM Hybrid Memory
Onkar Patil, F. Mueller, Latchesar Ionkov, Jason Lee, M. Lang
The introduction of NVDIMM memory devices has encouraged the use of DRAM/NVM-based hybrid memory systems to increase the memory-per-core ratio in compute nodes and obtain possible energy and cost benefits. However, Non-Volatile Memory (NVM) is slower than DRAM in terms of read/write latency. This difference in performance adversely affects memory-bound applications. Traditionally, data prefetching at the hardware level has been used to increase the number of cache hits and mitigate performance degradation. However, software (SW) prefetching has not been used effectively to reduce the effects of high memory access latencies. Moreover, the current cache hierarchy and hardware (HW) prefetching are not optimized for a hybrid memory system. We hypothesize that HW and SW prefetching can complement each other in placing data in caches and the Data Translation Look-aside Buffer (DTLB) prior to their references, and that by doing so adaptively, the highly varying access latencies of a DRAM/NVM hybrid memory system can be taken into account. This work contributes an adaptive SW prefetch method based on the characterization of read/write/unroll prefetch distances for NVM and DRAM. Prefetch performance is characterized via custom benchmarks based on STREAM2 specifications in a multicore MPI runtime environment and compared to the performance of the standard SW prefetch pass in GCC. Furthermore, the effects of HW prefetching on kernels executing on a hybrid memory system are evaluated. Experimental results indicate that SW prefetching targeted at populating the DTLB yields up to 26% performance improvement when used symbiotically with HW prefetching, as opposed to HW prefetching alone. Based on our findings, changes to GCC's prefetch-loop-arrays compiler pass are proposed to take advantage of DTLB prefetching in a hybrid memory system for kernels that are frequently used in HPC applications.
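The prefetch distances characterized in the paper are not listed in the abstract; the back-of-envelope calculation below only illustrates, with assumed latency and loop-cost numbers, why NVM's higher read latency calls for a larger software prefetch distance than DRAM.

```python
from math import ceil

# Rough rule of thumb: the prefetch distance (how many loop iterations ahead
# to issue a prefetch) needs to cover the memory latency. All latency and
# loop-cost numbers below are assumptions, not measured values from the paper.
def prefetch_distance(mem_latency_ns, cycles_per_iter, cpu_ghz=2.0):
    iter_ns = cycles_per_iter / cpu_ghz      # time to execute one loop iteration
    return ceil(mem_latency_ns / iter_ns)

for name, latency in (("DRAM", 90), ("NVM (read)", 300)):
    for cycles in (8, 32):
        print(f"{name:11s} latency {latency:3d} ns, {cycles:2d} cycles/iter"
              f" -> prefetch ~{prefetch_distance(latency, cycles)} iterations ahead")
```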
Pub Date: 2020-11-17 | DOI: 10.1109/MASCOTS50786.2020.9285970
Improving NAND flash performance with read heat separation
R. Pletka, N. Papandreou, R. Stoica, H. Pozidis, Nikolas Ioannou, T. Fisher, Aaron Fry, Kip Ingram, Andrew Walls
The continuous growth in 3D-NAND flash storage density has primarily been enabled by 3D stacking and by increasing the number of bits stored per memory cell. Unfortunately, these desirable flash device design choices adversely affect reliability and latency characteristics. In particular, increasing the number of bits stored per cell requires applying additional voltage thresholds during each read operation, thereby increasing read latency. While most NAND flash challenges can be mitigated through appropriate background processing, the flash read latency characteristics cannot be hidden and remain the biggest challenge, especially for the newest flash generations that store four bits per cell. In this paper, we introduce read heat separation (RHS), a new heat-aware data-placement technique that exploits the skew present in real-world workloads to place frequently read user data on low-latency flash pages. Although conceptually simple, such a technique is difficult to integrate into a flash controller, as it introduces a significant amount of complexity, requires more metadata, and is further constrained by other flash-specific peculiarities. To overcome these challenges, we propose a novel flash controller architecture supporting read heat-aware data placement. We first discuss the trade-offs that such a design entails and analyze the key aspects that influence the efficiency of RHS. Through both extensive simulations and an implementation in a commercial enterprise-grade solid-state drive controller, we show that our architecture can indeed significantly reduce the average read latency. For certain workloads, it can reverse the system-level read latency trends of recent multi-bit flash generations and hence outperform SSDs using previous, faster flash generations.
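A much-simplified sketch of the placement idea only: read counts per logical block stand in for read heat, and the hottest fraction of blocks being (re)written is steered to low-latency pages. The hot/cold split, page model, and workload below are assumptions, not the controller architecture proposed in the paper.

```python
import random
from collections import Counter

# Toy read-heat separation: count reads per logical block address (LBA) over a
# window and, on the next (re)write, place the hottest fraction on low-latency
# ("fast") pages. Thresholds and workload are illustrative assumptions.
FAST_PAGE_FRACTION = 0.25      # assumed share of pages with low read latency

read_heat = Counter()

def record_read(lba):
    read_heat[lba] += 1

def placement_plan(lbas_to_write):
    """Decide, for blocks about to be (re)written, which go to fast pages."""
    budget = max(1, int(FAST_PAGE_FRACTION * len(lbas_to_write)))
    ranked = sorted(lbas_to_write, key=lambda lba: read_heat[lba], reverse=True)
    return {lba: ("fast" if i < budget else "slow") for i, lba in enumerate(ranked)}

# Skewed synthetic workload: a few LBAs receive most of the reads.
random.seed(0)
for _ in range(10000):
    record_read(random.choice([1, 1, 1, 1, 2, 2, 3, 4, 5, 6, 7, 8]))

plan = placement_plan(list(range(1, 9)))
print(plan)   # the read-hot LBAs (1 and 2) end up on fast pages
```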
Pub Date: 2020-11-17 | DOI: 10.1109/MASCOTS50786.2020.9285960
Baloo: Measuring and Modeling the Performance Configurations of Distributed DBMS
Johannes Grohmann, Daniel Seybold, Simon Eismann, Mark Leznik, Samuel Kounev, Jörg Domaschka
Correctly configuring a distributed database management system (DBMS) deployed in a cloud environment for maximum performance poses many challenges to operators. Even if the entire configuration spectrum could be measured directly, which is often infeasible due to the multitude of parameters, single measurements are subject to random variations and need to be repeated multiple times. In this work, we propose Baloo, a framework for systematically measuring and modeling different performance-relevant configurations of distributed DBMS in cloud environments. Baloo dynamically estimates the required number of measurement configurations, as well as the number of required measurement repetitions per configuration, based on a desired target accuracy. We evaluate Baloo on a data set consisting of 900 DBMS configuration measurements conducted in our private cloud setup. Our evaluation shows that the highly configurable framework achieves a prediction error of at most 12% while saving over 80% of the measurement effort. We also publish all code and the acquired data set to foster future research.
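The abstract states that Baloo derives the number of measurement repetitions per configuration from a desired target accuracy; one standard way to do that, shown below as an assumed illustration rather than Baloo's actual algorithm, is to keep repeating a configuration's benchmark until the confidence interval of the mean falls within the target relative error.

```python
import random
import statistics
from math import sqrt

# Repeat measuring one configuration until the 95% confidence interval of the
# mean throughput is within +/- target_rel_err of the mean. The benchmark is
# simulated with Gaussian noise; this illustrates the general stopping rule,
# not Baloo's implementation.
Z95 = 1.96

def measure(config):
    # Stand-in for one benchmark run of the given DBMS configuration.
    base = 1000 + 50 * config["nodes"]
    return random.gauss(base, 0.08 * base)

def repeat_until_accurate(config, target_rel_err=0.05, min_runs=3, max_runs=50):
    samples = []
    while len(samples) < max_runs:
        samples.append(measure(config))
        if len(samples) >= min_runs:
            mean = statistics.mean(samples)
            half_width = Z95 * statistics.stdev(samples) / sqrt(len(samples))
            if half_width / mean <= target_rel_err:
                break
    return samples

random.seed(1)
runs = repeat_until_accurate({"nodes": 3})
print(f"{len(runs)} repetitions, mean throughput {statistics.mean(runs):.0f} ops/s")
```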
Pub Date: 2020-11-17 | DOI: 10.1109/MASCOTS50786.2020.9285946
Security-Performance Trade-offs of Kubernetes Container Runtimes
William Viktorsson, C. Klein, Johan Tordsson
The rapid adoption of container technologies, along with growing security concerns, has resulted in the development of multiple alternative container runtimes that target security through additional layers of indirection. In an apples-to-apples comparison, we deploy three runtimes in the same Kubernetes cluster: the security-focused Kata and gVisor, as well as the default Kubernetes runtime, runC. Our evaluation, based on three real applications, demonstrates that runC outperforms the more secure alternatives by up to 5x, that gVisor deploys containers up to 2x faster than Kata, but that Kata executes containers up to 1.6x faster than gVisor. Our work illustrates that alternative, more secure runtimes can be used in a plug-and-play manner in Kubernetes, but at a significant performance penalty. Our study is useful both to practitioners, to understand the current state of the technology in order to make the right decisions in the selection, operation, and/or design of platforms, and to scholars, to illustrate how these technologies have evolved over time.
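The plug-and-play behavior observed in the study maps onto Kubernetes' RuntimeClass mechanism: the same Pod spec can target runC, Kata, or gVisor simply by setting runtimeClassName. The sketch below emits such manifests; the class names "kata" and "gvisor" are assumptions that must match RuntimeClass objects registered in the cluster, and omitting the field falls back to the cluster default (runC in a stock setup).

```python
import json

# Emit the same Pod manifest for each container runtime under test; only the
# runtimeClassName differs.
def pod_manifest(name, runtime_class=None):
    spec = {
        "containers": [{
            "name": "app",
            "image": "nginx:1.19",
            "resources": {"limits": {"cpu": "1", "memory": "256Mi"}},
        }]
    }
    if runtime_class:
        spec["runtimeClassName"] = runtime_class
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }

for rc in (None, "kata", "gvisor"):
    print(json.dumps(pod_manifest(f"bench-{rc or 'runc'}", rc), indent=2))
```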