Spark Streaming discretizes data streams into micro-batches, each of which is further divided into tasks that are processed in parallel to improve job throughput. Previous work [2, 3] has lowered end-to-end latency in Spark Streaming. However, two causes of high tail latencies remain unaddressed: 1) data is not load-balanced across tasks, and 2) straggler tasks on a production cluster can take 8 times longer than the median task [1], inflating end-to-end latency. We propose a feedback-control mechanism that allows frameworks to adaptively load-balance work across tasks according to their processing speeds. This equalizes task runtimes and thereby lowers end-to-end tail latency. It also reduces the load on machines with transient resource bottlenecks, which resolves those bottlenecks and prevents them from having a lasting impact on task runtimes.
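A minimal sketch of the feedback idea, assuming the controller measures each task's throughput (records per second) after every micro-batch, smooths it with an exponential moving average, and sizes the next batch's per-task shares in proportion to the smoothed rates. The class name `FeedbackPartitioner` and the smoothing factor `alpha` are hypothetical illustrations, not the paper's API or its actual control law.

```python
class FeedbackPartitioner:
    """Hypothetical controller: split each micro-batch by observed task speed."""

    def __init__(self, num_tasks: int, alpha: float = 0.5) -> None:
        self.num_tasks = num_tasks
        self.alpha = alpha  # weight of the newest measurement in the moving average
        self.rates = [1.0] * num_tasks  # equal weights until speeds are measured
        self.observed = [False] * num_tasks

    def split(self, batch_size: int) -> list[int]:
        """Assign records to tasks in proportion to their smoothed rates."""
        total = sum(self.rates)
        shares = [int(batch_size * r / total) for r in self.rates]
        # Rounding down leaves a few records; give them to the fastest tasks.
        leftover = batch_size - sum(shares)
        fastest = sorted(range(self.num_tasks), key=lambda i: self.rates[i], reverse=True)
        for i in fastest[:leftover]:
            shares[i] += 1
        return shares

    def observe(self, records: list[int], runtimes: list[float]) -> None:
        """Feed back each task's measured throughput from the last micro-batch."""
        for i, (n, t) in enumerate(zip(records, runtimes)):
            if n <= 0 or t <= 0:
                continue  # no usable measurement for this task
            measured = n / t  # records per second
            if self.observed[i]:
                self.rates[i] = self.alpha * measured + (1 - self.alpha) * self.rates[i]
            else:
                self.rates[i] = measured
                self.observed[i] = True
```

For example, after `observe([50, 50], [1.0, 4.0])`, `split(100)` returns `[80, 20]`: at 50 and 12.5 records/s, both tasks then finish in 1.6 s. Smoothing keeps a single noisy measurement from swinging the shares too far in one step.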
Title: Reducing tail latencies in micro-batch streaming workloads | Authors: Faria Kalim, A. Tantawi, S. Costache, A. Youssef | Venue: Proceedings of the 2017 Symposium on Cloud Computing (2017-09-24) | DOI: https://doi.org/10.1145/3127479.3134433
Kernel samepage merging (KSM) in the Linux kernel is a memory deduplication scheme that finds duplicate pages and shares them to alleviate memory bottlenecks in the cloud. However, because KSM has to scan all pages in memory to find duplicates, it consumes many CPU cycles and thus degrades virtual machine (VM) performance [1]. This degradation is an obstacle to serving real-time applications (e.g., Netflix) in the cloud [3]. A previous work, CMD [1], proposed a page grouping scheme to reduce page comparisons, but it requires special monitoring hardware. XLH [2] enhanced page sharing with information about guest VM I/O operations; however, its CPU overhead is still very high, similar to that of the default KSM. To make KSM more useful, we need an optimization scheme that consumes fewer CPU cycles. We therefore first profile the CPU cycle consumption of KSM; the results show that page comparison (28.77%) and page checksumming (26.14%) account for most of the cycles. Based on these results, we propose advanced KSM for cloud computing (AKC), which consumes fewer CPU cycles than the default KSM. To reduce the number of page comparisons, AKC uses a checksum-based red-black tree (RB-tree). In addition, AKC reduces page checksum overhead by using a hardware-accelerated CRC32 hash function.
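To illustrate why a checksum index cuts page comparisons, here is a sketch that buckets pages by CRC32 and runs a full byte-wise comparison only against pages with a matching checksum. It is an assumption-laden stand-in: a Python dict replaces AKC's checksum-based RB-tree, `zlib.crc32` is a software CRC rather than the hardware-accelerated one the abstract mentions, and `ChecksumIndex` is a hypothetical name.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional


class ChecksumIndex:
    """Hypothetical sketch: index pages by CRC32, compare bytes only on a checksum hit."""

    def __init__(self) -> None:
        self.buckets: Dict[int, List[int]] = defaultdict(list)  # checksum -> page ids
        self.pages: Dict[int, bytes] = {}  # page id -> page contents
        self.full_compares = 0  # byte-by-byte page comparisons performed so far

    def insert(self, page_id: int, data: bytes) -> Optional[int]:
        """Return the id of an identical indexed page, or index this page and return None."""
        checksum = zlib.crc32(data)
        for other in self.buckets[checksum]:
            self.full_compares += 1
            # Equal checksums do not prove equal pages, so confirm with a full compare.
            if self.pages[other] == data:
                return other
        self.buckets[checksum].append(page_id)
        self.pages[page_id] = data
        return None
```

A page whose checksum matches no indexed page is added without any full comparison; the byte-wise check remains necessary because distinct pages can share a CRC32 value.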
Title: AKC: advanced KSM for cloud computing | Authors: Sioh Lee, Bongkyu Kim, Youngpil Kim, C. Yoo | Venue: Proceedings of the 2017 Symposium on Cloud Computing (2017-09-24) | DOI: https://doi.org/10.1145/3127479.3131616
The advent of Web 2.0 companies such as Facebook, Google, and Amazon, with their insatiable appetite for vast amounts of structured, semi-structured, and unstructured data, triggered the development of Hadoop and related tools (e.g., YARN, MapReduce, and Pig) as well as NoSQL databases. These tools form an open-source software stack that supports processing large and diverse data sets on clustered systems to perform decision support tasks. Recently, SQL has been resurging in many of these solutions, e.g., Hive, Stinger, Impala, Shark, and Presto. At the same time, RDBMS vendors are adding Hadoop support to their SQL engines, e.g., IBM's Big SQL, Actian's Vortex, Oracle's Big Data SQL, and SAP's HANA. Because there was no industry standard benchmark that could measure the performance of SQL-based big data solutions, marketing claims were mostly based on "cherry-picked" subsets of the TPC-DS benchmark that suited individual companies' strengths while hiding their weaknesses. In this paper, we present and analyze our work on modifying TPC-DS to fill the void for an industry standard benchmark able to measure the performance of SQL-based big data solutions. The new benchmark was ratified by the TPC in early 2016. To show its significance, we analyze performance data obtained on four different systems running big data, traditional RDBMS, and columnar in-memory architectures.
Title: Analysis of TPC-DS: the first standard benchmark for SQL-based big data systems | Authors: Meikel Pöss, T. Rabl, H. Jacobsen | Venue: Proceedings of the 2017 Symposium on Cloud Computing (2017-09-24) | DOI: https://doi.org/10.1145/3127479.3128603
Beyond the flexibility, efficiency, scalability, and elasticity of cloud infrastructure, protecting customers' SSL private keys is the paramount issue in persuading website owners to migrate their content to the cloud. The emerging Keyless SSL solution lets customers keep custody of their SSL private keys on their own on-premise servers. However, it suffers from significant performance degradation and limited scalability, caused by the long-distance connection to the key server for each new end-user request. Improving performance through persistent sessions and key caching in the cloud would weaken key protection and, given the cloud's security bugs, discourage website owners. In this paper, the challenges of secure key protection and distribution are addressed following the philosophy of "storing trusted data on an untrusted platform and transmitting it through an untrusted channel". To this end, we propose STYX, a three-phase hierarchical key management scheme that provides secure key protection together with hardware-assisted service acceleration for cloud-based content delivery network (CCDN) applications. STYX is implemented using Intel Software Guard Extensions (SGX), Intel QuickAssist Technology (QAT), and the SIGMA (SIGn-and-MAc) protocol. STYX provides a tight key security guarantee through SGX-based key distribution with light overhead, and it further improves system performance significantly through QAT-based acceleration. Comprehensive evaluations show that STYX not only guarantees absolute security but also outperforms a CDN deployed directly on HTTPS servers without QAT by up to 5x in throughput, while significantly reducing latency.
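The SIGMA protocol named above pairs signatures with MACs over a Diffie-Hellman key exchange. The sketch below shows only the MAC half under toy assumptions: a Mersenne-prime group that is not a secure parameter choice, an ad hoc SHA-256 key derivation, and hypothetical function names. Real SIGMA also signs the exchanged public values, and without that signature step the exchange is not authenticated; this is not STYX's implementation.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Toy group: the Mersenne prime 2**127 - 1 with base 3. Illustration only;
# these are NOT secure Diffie-Hellman parameters.
P = 2**127 - 1
G = 3


def keypair() -> Tuple[int, int]:
    """Return (private exponent, public value) in the toy group."""
    x = secrets.randbelow(P - 2) + 1
    return x, pow(G, x, P)


def derive_key(shared: int, label: bytes) -> bytes:
    """Ad hoc KDF for this sketch: SHA-256 over a label and the shared secret."""
    return hashlib.sha256(label + shared.to_bytes(16, "big")).digest()


def mac_identity(mac_key: bytes, identity: bytes) -> bytes:
    """SIGMA's 'MAc' step: bind the sender's identity to the session key."""
    return hmac.new(mac_key, identity, hashlib.sha256).digest()


def verify_identity(mac_key: bytes, identity: bytes, tag: bytes) -> bool:
    """Check a peer's identity MAC in constant time."""
    return hmac.compare_digest(mac_identity(mac_key, identity), tag)
```

Both ends derive the same MAC key from their own private exponent and the peer's public value, so a MAC over the key server's identity verifies while a MAC over any other identity does not. A second label could derive a separate key for protecting transferred key material; that step is omitted here.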
Title: STYX: a trusted and accelerated hierarchical SSL key management and distribution system for cloud based CDN application | Authors: Changzheng Wei, Jian Li, Weigang Li, Ping Yu, Haibing Guan | Venue: Proceedings of the 2017 Symposium on Cloud Computing (2017-09-24) | DOI: https://doi.org/10.1145/3127479.3127482
H. Baek, Cheng Jin, Guofei Jiang, C. Lumezanu, J. Merwe, Ning Xia, Qiang Xu
Network usage accountability is critical in helping operators and customers of multi-tenant data centers deal with concerns such as capacity planning, resource allocation, hotspot detection, link failure detection, and troubleshooting. However, the cost of the measurement and instrumentation needed to achieve flow-level accountability is non-trivial. We propose Polygravity to determine tenant traffic usage via lightweight measurements in multi-tenant data centers. We adopt the tomogravity model widely used in ISP networks and adapt it to a multi-tenant data center environment. By integrating datacenter-specific domain knowledge, sampling-based partial estimation, and gravity-based estimation of internal sinks/sources, Polygravity addresses two key challenges in adapting tomogravity to a data center environment: sparse traffic matrices and internal traffic sinks/sources. We conducted an extensive evaluation of our approach using realistic data center workloads. Our results show that Polygravity can determine tenant IP flow usage with less than 1% average relative error for tenants with fine-grained domain knowledge. In addition, for tenants with coarse-grained domain knowledge and partial host-based sampling, Polygravity reduces the relative error of sampling-based estimation by one third.
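Tomogravity starts from a gravity-model prior, in which the traffic from source i to destination j is proportional to i's total outbound volume times j's total inbound volume, and then refines that prior against measured link loads. A minimal sketch of just the prior, assuming total outbound and inbound volumes are equal (no internal sinks or sources); the link-load refinement and Polygravity's domain-knowledge and sampling extensions are not shown, and the function name is hypothetical.

```python
from typing import List


def gravity_prior(out_bytes: List[float], in_bytes: List[float]) -> List[List[float]]:
    """Gravity-model estimate: traffic i -> j = out_i * in_j / total traffic.

    Assumes sum(out_bytes) == sum(in_bytes), i.e. no internal sinks or sources.
    """
    total = sum(out_bytes)
    if total == 0:
        return [[0.0 for _ in in_bytes] for _ in out_bytes]
    # Each source spreads its outbound volume over destinations by inbound share.
    return [[o * i / total for i in in_bytes] for o in out_bytes]
```

For outbound volumes `[60.0, 40.0]` and inbound volumes `[50.0, 50.0]`, the estimate is `[[30.0, 30.0], [20.0, 20.0]]`. Every pair with nonzero volume receives nonzero estimated traffic, which is one way to see why sparse tenant traffic matrices are a challenge for a plain gravity prior.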
Title: Polygravity: traffic usage accountability via coarse-grained measurements in multi-tenant data centers | Authors: H. Baek, Cheng Jin, Guofei Jiang, C. Lumezanu, J. Merwe, Ning Xia, Qiang Xu | Venue: Proceedings of the 2017 Symposium on Cloud Computing (2017-09-24) | DOI: https://doi.org/10.1145/3127479.3129258
Dino Lopez Pacheco, Quentin Jacquemart, A. Segalini, M. Rifai, M. Dione, G. Urvoy-Keller
Idle virtual machines (VMs) waste data center resources. We introduce SEaMLESS, which transforms a fully-fledged idle VM into a lightweight, resourceless Virtual Network Function (VNF). The idle VM can then be saved to disk, releasing its memory, while the VNF keeps the service available. Upon user activity, the appropriate VM is restored without any interruption for service users. Tens of VNFs fit in the memory required by a single VM, enabling substantial resource savings at data center scale.
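The life cycle described above can be sketched as a two-state machine. The idle timeout used to decide that a VM is idle, the class and state names, and the return strings are all assumptions for illustration; the abstract does not say how SEaMLESS detects idleness, and the actual save and restore work is reduced to state changes.

```python
from enum import Enum, auto


class State(Enum):
    RUNNING = auto()    # full VM resident in memory and serving users
    SUSPENDED = auto()  # VM saved to disk; a lightweight VNF stands in for it


class ServiceSlot:
    """Hypothetical model of one service's VM <-> VNF hand-off."""

    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout  # assumed idleness criterion, in seconds
        self.state = State.RUNNING
        self.last_activity = 0.0

    def tick(self, now: float) -> None:
        """Periodic check: suspend the VM once it has been idle long enough."""
        if self.state is State.RUNNING and now - self.last_activity >= self.idle_timeout:
            # In SEaMLESS the VM would be saved to disk here, releasing its memory.
            self.state = State.SUSPENDED

    def on_request(self, now: float) -> str:
        """User activity: serve directly, or restore the VM if it was suspended."""
        self.last_activity = now
        if self.state is State.SUSPENDED:
            # The VNF keeps the service reachable while the VM is restored.
            self.state = State.RUNNING
            return "restored"
        return "served"
```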
Title: SEaMLESS: a SErvice migration cLoud architecture for energy saving and memory releaSing capabilities | Authors: Dino Lopez Pacheco, Quentin Jacquemart, A. Segalini, M. Rifai, M. Dione, G. Urvoy-Keller | Venue: Proceedings of the 2017 Symposium on Cloud Computing (2017-09-24) | DOI: https://doi.org/10.1145/3127479.3128604
A. Burtsev, David Johnson, Josh Kunz, E. Eide, J. Merwe
We present CapNet, a capability-based network architecture designed to enable least authority and secure collaboration in the cloud. CapNet allows fine-grained management of rights, recursive delegation, hierarchical policies, and least privilege. To enable secure collaboration, CapNet extends a classical capability model with support for decentralized authority. We implement CapNet in the substrate of a software-defined network, integrate it with the OpenStack cloud, and develop protocols enabling secure multi-party collaboration.
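As an illustration of capability-style delegation with least privilege, the sketch below models a capability as a reference to a target plus a set of rights that can be delegated only in attenuated form (a subset of the holder's rights). Revocation that cascades to delegates is a common capability-system feature added here as an assumption; the class, target, and right names are hypothetical, and CapNet's decentralized-authority extension and SDN enforcement are not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(eq=False)
class Capability:
    """Hypothetical capability: a reference to a target plus the rights it grants."""

    target: str
    rights: FrozenSet[str]
    parent: Optional["Capability"] = None
    revoked: bool = False

    def valid(self) -> bool:
        """Usable only if neither this capability nor any ancestor is revoked."""
        cap: Optional[Capability] = self
        while cap is not None:
            if cap.revoked:
                return False
            cap = cap.parent
        return True

    def delegate(self, rights: Iterable[str]) -> "Capability":
        """Recursive delegation with attenuation: a child gets a subset of rights."""
        wanted = frozenset(rights)
        if not self.valid():
            raise PermissionError("cannot delegate a revoked capability")
        if not wanted <= self.rights:
            raise PermissionError("delegation cannot add rights")
        return Capability(self.target, wanted, parent=self)

    def revoke(self) -> None:
        """Assumed feature: revocation also invalidates every delegated descendant."""
        self.revoked = True

    def allows(self, right: str) -> bool:
        return self.valid() and right in self.rights
```

Because validity walks up the delegation chain, revoking an intermediate capability cuts off everything delegated through it while leaving the root's authority intact.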
Title: CapNet: security and least authority in a capability-enabled cloud | Authors: A. Burtsev, David Johnson, Josh Kunz, E. Eide, J. Merwe | Venue: Proceedings of the 2017 Symposium on Cloud Computing (2017-09-24) | DOI: https://doi.org/10.1145/3127479.3131209
Edge computing caters to a wide range of use cases, from latency-sensitive to bandwidth-constrained applications. However, the exact edge specifications that benefit each type of application most are still unclear. We investigate the concrete conditions under which the edge is feasible, i.e., when users observe performance gains from the edge while costs remain low for providers, for an application that requires both low latency and high bandwidth: video analytics.
Title: To edge or not to edge? | Authors: Faria Kalim, S. Noghabi, Shiv Verma | Venue: Proceedings of the 2017 Symposium on Cloud Computing (2017-09-24) | DOI: https://doi.org/10.1145/3127479.3132572
Jae-Hwa Im, Jongyul Kim, Jonguk Kim, Seongwook Jin, S. Maeng
Demand for bare-metal cloud services has increased rapidly because such services are cost-effective for several types of workloads, and some cloud clients prefer a single-tenant environment because it is less exposed to security vulnerabilities. However, because the bare-metal cloud does not use a virtualization layer, it cannot support live migration, which limits its manageability. Live migration support can significantly improve the manageability of bare-metal cloud services. This paper proposes an on-demand virtualization technique to that end: a thin virtualization layer is inserted into the bare-metal host when live migration is requested and removed from the host once the migration completes. We modified BitVisor [19] to implement on-demand virtualization and live migration on the x86 architecture. The time spent on on-demand virtualization is negligible: it takes about 20 ms to insert the virtualization layer and 30 ms to remove it. After the virtualization layer is removed, the host machine runs at bare-metal performance.
Title: On-demand virtualization for live migration in bare metal cloud | Authors: Jae-Hwa Im, Jongyul Kim, Jonguk Kim, Seongwook Jin, S. Maeng | Venue: Proceedings of the 2017 Symposium on Cloud Computing (2017-09-24) | DOI: https://doi.org/10.1145/3127479.3129254
G. Vernik, M. Factor, E. K. Kolodner, Effi Ofer, P. Michiardi, Francesco Pace
Data is the natural resource of the 21st century. It is being produced at dizzying rates, e.g., for genomics, for media and entertainment, and for the Internet of Things. Object storage systems such as Amazon S3, Azure Blob storage, and IBM Cloud Object Storage are highly scalable distributed storage systems that offer high-capacity, cost-effective storage. But it is not enough just to store data; we also need to derive value from it. Apache Spark is the leading big data analytics processing engine, combining MapReduce, SQL, streaming, and complex analytics. We present Stocator, a high-performance storage connector that enables Spark to work directly on data stored in object storage systems while providing the same correctness guarantees as Hadoop's original storage system, HDFS. Current object storage connectors from the Hadoop community, e.g., for the S3 and Swift APIs, handle eventual consistency poorly, which can lead to failures. These connectors assume file system semantics, which is natural given that their model of operation is based on interaction with HDFS. In particular, Spark and Hadoop achieve fault tolerance and enable speculative execution by creating temporary files, listing directories to identify these files, and then renaming them. This paradigm avoids interference between tasks that do the same work and thus write output with the same name. However, with eventually consistent object storage, a container listing may not yet include a recently created object, so that object may not be renamed, leading to incomplete or incorrect results. Solutions such as EMRFS [1] from Amazon, S3mper [4] from Netflix, and S3Guard [2] attempt to overcome eventual consistency by requiring additional strongly consistent data storage. These solutions require multiple storage systems, are costly, and can introduce consistency issues between the stores.
Current object storage connectors from the Hadoop community are also notorious for their poor performance on write workloads. This, too, stems from their use of the rename operation, which is not a native object storage operation: not only is it not atomic, but it must be implemented as a costly copy followed by a delete. Others have tried to improve the performance of object storage connectors by eliminating rename, e.g., the DirectParquetOutputCommitter [5] for S3a introduced by Databricks, but have failed to preserve fault tolerance and speculation. Stocator takes advantage of object storage semantics to achieve both high performance and fault tolerance. It eliminates the rename paradigm by writing each output object to its final name. The name includes both the part number and the attempt number, so that multiple attempts to write the same part use different objects. Stocator proposes to extend an already existing success indicator object written at the end of a Spark job to include a manifest with the names of all the objects that compose the final output; this ensures that
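The rename-free commit described above can be sketched with an in-memory stand-in for an object store: each attempt writes straight to a final name that embeds the part and attempt numbers, and the job's success object carries a manifest listing the winning attempt for each part, so readers never depend on an eventually consistent listing. The name format, the `_SUCCESS` object key, and all function names are assumptions for illustration, not Stocator's actual naming scheme or code.

```python
from typing import Dict, List


def part_name(dataset: str, part: int, attempt: int) -> str:
    """Hypothetical final object name embedding part and attempt numbers."""
    return f"{dataset}/part-{part:05d}-attempt-{attempt}"


class ObjectStore:
    """In-memory stand-in for an object store: PUT writes a whole object, no rename."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        self.objects[name] = data


def commit(store: ObjectStore, dataset: str, winners: Dict[int, int]) -> List[str]:
    """Write the success object with a manifest of each part's successful attempt."""
    manifest = [part_name(dataset, part, attempt) for part, attempt in sorted(winners.items())]
    store.put(f"{dataset}/_SUCCESS", "\n".join(manifest).encode())
    return manifest


def read_output(store: ObjectStore, dataset: str) -> List[bytes]:
    """Read exactly the manifest's objects; no container listing is needed."""
    manifest = store.objects[f"{dataset}/_SUCCESS"].decode().splitlines()
    return [store.objects[name] for name in manifest]
```

If part 0 has two attempts and attempt 1 wins, the object written by attempt 0 stays in the store but is never read, because readers consult the manifest rather than a possibly stale listing.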
Title: Stocator: an object store aware connector for apache spark | Authors: G. Vernik, M. Factor, E. K. Kolodner, Effi Ofer, P. Michiardi, Francesco Pace | Venue: Proceedings of the 2017 Symposium on Cloud Computing (2017-09-24) | DOI: https://doi.org/10.1145/3127479.3134761