Evaluation of delta compression techniques for efficient live migration of large virtual machines
Petter Svärd, B. Hudzia, Johan Tordsson, E. Elmroth
DOI: 10.1145/1952682.1952698
Despite the widespread support for live migration of Virtual Machines (VMs) in current hypervisors, these have significant shortcomings when it comes to migration of certain types of VMs. More specifically, with existing algorithms, there is a high risk of service interruption when migrating VMs with high workloads and/or over low-bandwidth networks. In these cases, VM memory pages are dirtied faster than they can be transferred over the network, which leads to extended migration downtime. In this contribution, we study the application of delta compression during the transfer of memory pages in order to increase migration throughput and thus reduce downtime. The delta compression live migration algorithm is implemented as a modification to the KVM hypervisor. Its performance is evaluated by migrating VMs running different types of workloads, and the evaluation demonstrates a significant decrease in migration downtime in all test cases. In a benchmark scenario, the downtime is reduced by a factor of 100. In another scenario, a streaming video server is live migrated with no perceivable downtime to the clients, whereas the picture freezes for eight seconds under standard approaches. In an enterprise application scenario, the delta compression algorithm successfully live migrates a very large system that fails after migration using the standard algorithm. Finally, we discuss some general effects of delta compression on live migration and analyze when it is beneficial to use this technique.
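To make the mechanism concrete, below is a minimal sketch of XOR-based delta encoding of a dirty page, with run-length compression of unchanged bytes; the encoding format and names are illustrative assumptions, not the authors' exact KVM modification:

```c
/* Sketch: delta-encode a dirty page against its cached previous version.
 * Output format (assumed): [zero-run length][literal count][literals]...
 * Not the paper's implementation, just the underlying idea. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096

static size_t delta_encode(const uint8_t *old, const uint8_t *new,
                           uint8_t *out, size_t out_cap)
{
    size_t o = 0, i = 0;
    while (i < PAGE_SIZE) {
        size_t zeros = 0, start, lits;
        /* count unchanged bytes (XOR is zero), capped per run header */
        while (i < PAGE_SIZE && (old[i] ^ new[i]) == 0 && zeros < 255) {
            zeros++; i++;
        }
        start = i;
        /* count changed bytes to emit as literals */
        while (i < PAGE_SIZE && (old[i] ^ new[i]) != 0 && (i - start) < 255)
            i++;
        lits = i - start;
        if (o + 2 + lits > out_cap)
            return PAGE_SIZE + 1;   /* delta larger than page: send raw instead */
        out[o++] = (uint8_t)zeros;
        out[o++] = (uint8_t)lits;
        memcpy(out + o, new + start, lits);  /* receiver patches its cached copy */
        o += lits;
    }
    return o;
}

int main(void)
{
    uint8_t old[PAGE_SIZE] = {0}, new[PAGE_SIZE] = {0}, buf[2 * PAGE_SIZE];
    new[100] = 42; new[101] = 7;    /* a sparsely dirtied page */
    printf("page: %d bytes, delta: %zu bytes\n",
           PAGE_SIZE, delta_encode(old, new, buf, sizeof buf));
    return 0;
}
```

Because only changed bytes plus small run headers cross the network, pages that are dirtied repeatedly but sparsely transfer in a fraction of a page, which is what raises migration throughput.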
{"title":"Evaluation of delta compression techniques for efficient live migration of large virtual machines","authors":"Petter Svärd, B. Hudzia, Johan Tordsson, E. Elmroth","doi":"10.1145/1952682.1952698","DOIUrl":"https://doi.org/10.1145/1952682.1952698","url":null,"abstract":"Despite the widespread support for live migration of Virtual Machines (VMs) in current hypervisors, these have significant shortcomings when it comes to migration of certain types of VMs. More specifically, with existing algorithms, there is a high risk of service interruption when migrating VMs with high workloads and/or over low-bandwidth networks. In these cases, VM memory pages are dirtied faster than they can be transferred over the network, which leads to extended migration downtime. In this contribution, we study the application of delta compression during the transfer of memory pages in order to increase migration throughput and thus reduce downtime. The delta compression live migration algorithm is implemented as a modification to the KVM hypervisor. Its performance is evaluated by migrating VMs running different type of workloads and the evaluation demonstrates a significant decrease in migration downtime in all test cases. In a benchmark scenario the downtime is reduced by a factor of 100. In another scenario a streaming video server is live migrated with no perceivable downtime to the clients while the picture is frozen for eight seconds using standard approaches. Finally, in an enterprise application scenario, the delta compression algorithm successfully live migrates a very large system that fails after migration using the standard algorithm. Finally, we discuss some general effects of delta compression on live migration and analyze when it is beneficial to use this technique.","PeriodicalId":202844,"journal":{"name":"International Conference on Virtual Execution Environments","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115911611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fine-grained user-space security through virtualization
Mathias Payer, T. Gross
DOI: 10.1145/1952682.1952703
This paper presents an approach to the safe execution of applications based on software-based fault isolation and policy-based system call authorization. A running application is encapsulated in an additional layer of protection using dynamic binary translation in user-space. This virtualization layer dynamically recompiles the machine code and adds multiple dynamic security guards that verify the running code to protect and contain the application.

The binary translation system redirects all system calls to a policy-based system call authorization framework. This interposition framework validates every system call based on the given arguments and the location of the system call. Depending on the user-loadable policy and an extensible handler mechanism, the framework decides whether a system call is allowed, rejected, or redirected to a specific user-space handler in the virtualization layer.

This paper offers an in-depth analysis of the different security guarantees and a performance analysis of libdetox, a prototype of the full protection platform. The combination of software-based fault isolation and policy-based system call authorization imposes only low overhead and is therefore an attractive option to encapsulate and sandbox applications to improve host security.
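As a rough sketch of what policy-based system call authorization can look like, the toy interposition table below decides allow/deny/redirect from the call number and an argument; the rule format and values are assumptions, not libdetox's actual policy language:

```c
/* Toy system call authorization: a user-loadable policy maps
 * (syscall number, argument prefix) to a verdict. Illustrative only. */
#include <stdio.h>
#include <string.h>

enum verdict { ALLOW, DENY, REDIRECT };

struct rule {
    int sysno;              /* system call number */
    const char *arg_prefix; /* e.g., path prefix for open() */
    enum verdict verdict;
};

/* assumed policy: open() allowed under /tmp/, denied elsewhere */
static const struct rule policy[] = {
    { 2 /* open */, "/tmp/", ALLOW },
    { 2 /* open */, "",      DENY  },
};

static enum verdict authorize(int sysno, const char *arg)
{
    for (size_t i = 0; i < sizeof policy / sizeof policy[0]; i++)
        if (policy[i].sysno == sysno &&
            strncmp(arg, policy[i].arg_prefix,
                    strlen(policy[i].arg_prefix)) == 0)
            return policy[i].verdict;
    return REDIRECT;        /* unmatched calls go to a user-space handler */
}

int main(void)
{
    printf("open(/tmp/x): %d\n", authorize(2, "/tmp/x"));          /* ALLOW */
    printf("open(/etc/passwd): %d\n", authorize(2, "/etc/passwd"));/* DENY  */
    return 0;
}
```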
{"title":"Fine-grained user-space security through virtualization","authors":"Mathias Payer, T. Gross","doi":"10.1145/1952682.1952703","DOIUrl":"https://doi.org/10.1145/1952682.1952703","url":null,"abstract":"This paper presents an approach to the safe execution of applications based on software-based fault isolation and policy-based system call authorization. A running application is encapsulated in an additional layer of protection using dynamic binary translation in user-space. This virtualization layer dynamically recompiles the machine code and adds multiple dynamic security guards that verify the running code to protect and contain the application.\u0000 The binary translation system redirects all system calls to a policy-based system call authorization framework. This interposition framework validates every system call based on the given arguments and the location of the system call. Depending on the user-loadable policy and an extensible handler mechanism the framework decides whether a system call is allowed, rejected, or redirect to a specific user-space handler in the virtualization layer.\u0000 This paper offers an in-depth analysis of the different security guarantees and a performance analysis of libdetox, a prototype of the full protection platform. The combination of software-based fault isolation and policy-based system call authorization imposes only low overhead and is therefore an attractive option to encapsulate and sandbox applications to improve host security.","PeriodicalId":202844,"journal":{"name":"International Conference on Virtual Execution Environments","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125289974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic cache contention detection in multi-threaded applications
Qin Zhao, David Koh, Syed Raza, Derek Bruening, W. Wong, Saman P. Amarasinghe
DOI: 10.1145/1952682.1952688
In today's multi-core systems, cache contention due to true and false sharing can cause unexpected and significant performance degradation. A detailed understanding of a given multi-threaded application's behavior is required to precisely identify such performance bottlenecks. Traditionally, however, such diagnostic information can only be obtained after lengthy simulation of the memory hierarchy.

In this paper, we present a novel approach that efficiently analyzes interactions between threads to determine thread correlation and detect true and false sharing. It is based on the following key insight: although the slowdown caused by cache contention depends on factors including the thread-to-core binding and parameters of the memory hierarchy, the amount of data sharing is primarily a function of the cache line size and application behavior. Using memory shadowing and dynamic instrumentation, we implemented a tool that obtains detailed sharing information between threads without simulating the full complexity of the memory hierarchy. The runtime overhead of our approach, a 5x slowdown on average relative to native execution, is significantly less than that of detailed cache simulation. The information collected allows programmers to identify the degree of cache contention in an application, the correlation among its threads, and the sources of significant false sharing. Using our approach, we were able to improve the performance of some applications by up to 12x. For other contention-intensive applications, we were able to shed light on the obstacles that prevent their performance from scaling to many cores.
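The key insight lends itself to a compact illustration: tracking the last accessor of each cache line is enough to distinguish true from false sharing, without simulating the memory hierarchy. The toy shadow-memory model below is a sketch under that simplification, not the paper's tool:

```c
/* Toy shadow memory at cache-line granularity: flag inter-thread writes
 * as true sharing (same bytes) or false sharing (different bytes on one
 * line). Line size and table shape are assumptions for illustration. */
#include <stdint.h>
#include <stdio.h>

#define LINE  64
#define LINES 1024

struct shadow { int last_tid; uintptr_t last_off; int valid; };
static struct shadow sh[LINES];
static long true_sharing, false_sharing;

static void track(int tid, uintptr_t addr, int is_write)
{
    struct shadow *s = &sh[(addr / LINE) % LINES];
    if (s->valid && s->last_tid != tid && is_write) {
        if (s->last_off == addr % LINE)
            true_sharing++;   /* both threads touch the same bytes */
        else
            false_sharing++;  /* distinct bytes share one cache line */
    }
    s->last_tid = tid;
    s->last_off = addr % LINE;
    s->valid = 1;
}

int main(void)
{
    /* threads 0 and 1 write adjacent fields that fall on one cache line */
    for (int i = 0; i < 1000; i++) {
        track(0, 0x1000, 1);
        track(1, 0x1008, 1);
    }
    printf("true: %ld, false: %ld\n", true_sharing, false_sharing);
    return 0;
}
```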
{"title":"Dynamic cache contention detection in multi-threaded applications","authors":"Qin Zhao, David Koh, Syed Raza, Derek Bruening, W. Wong, Saman P. Amarasinghe","doi":"10.1145/1952682.1952688","DOIUrl":"https://doi.org/10.1145/1952682.1952688","url":null,"abstract":"In today's multi-core systems, cache contention due to true and false sharing can cause unexpected and significant performance degradation. A detailed understanding of a given multi-threaded application's behavior is required to precisely identify such performance bottlenecks. Traditionally, however, such diagnostic information can only be obtained after lengthy simulation of the memory hierarchy.\u0000 In this paper, we present a novel approach that efficiently analyzes interactions between threads to determine thread correlation and detect true and false sharing. It is based on the following key insight: although the slowdown caused by cache contention depends on factors including the thread-to-core binding and parameters of the memory hierarchy, the amount of data sharing is primarily a function of the cache line size and application behavior. Using memory shadowing and dynamic instrumentation, we implemented a tool that obtains detailed sharing information between threads without simulating the full complexity of the memory hierarchy. The runtime overhead of our approach --- a 5x slowdown on average relative to native execution --- is significantly less than that of detailed cache simulation. The information collected allows programmers to identify the degree of cache contention in an application, the correlation among its threads, and the sources of significant false sharing. Using our approach, we were able to improve the performance of some applications up to a factor of 12x. For other contention-intensive applications, we were able to shed light on the obstacles that prevent their performance from scaling to many cores.","PeriodicalId":202844,"journal":{"name":"International Conference on Virtual Execution Environments","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116353687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rethink the virtual machine template
Kun Wang, J. Rao, Chengzhong Xu
DOI: 10.1145/1952682.1952690
Server virtualization technology facilitates the creation of an elastic computing infrastructure on demand. Cloud applications such as server-based computing and virtual desktops are sensitive to startup latency and require impromptu VM creation in real time. Conventional template-based VM creation is a time-consuming process and lacks flexibility for the deployment of stateful VMs. In this paper, we present an abstraction of the VM substrate to represent generic VM instances in miniature. Unlike templates, which are stored as image files on disk, VM substrates are docked in memory in a designated VM pool. They can be activated into stateful VMs without machine booting and application initialization. The abstraction leverages a range of techniques, including VM miniaturization, generalization, cloning and migration, storage copy-on-write, and on-the-fly resource configuration, for rapid deployment of VMs and VM clusters on demand. We implement a prototype on a Xen platform and show that a server with a typical configuration of terabytes of disk and gigabytes of memory can accommodate more substrates in memory than templates on disk, and that stateful VMs can be created from the same or different substrates and deployed onto the same or different physical hosts in a cluster without causing any configuration conflicts. Experimental results show that general-purpose VMs or a VM cluster for parallel computing can be deployed in a few seconds. We demonstrate the usage of VM substrates in a mobile gaming application.
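A toy model of the activation path, with assumed field names: a substrate is a pre-booted, generalized instance that is cloned and reconfigured on the fly, so no boot or application initialization occurs:

```c
/* Sketch of activating a stateful VM from an in-memory substrate.
 * The struct layout and configuration knobs are illustrative
 * assumptions, not the prototype's actual interfaces. */
#include <stdio.h>

struct vm {
    char hostname[32];
    char ip[16];
    int  vcpus;
    int  mem_mb;
    int  booted;   /* substrates are already booted and initialized */
};

/* designated in-memory pool of miniaturized, generalized instances */
static const struct vm substrate_pool[] = {
    { "generic", "0.0.0.0", 1, 256, 1 },
};

static struct vm activate(const struct vm *sub, const char *host,
                          const char *ip, int vcpus, int mem_mb)
{
    struct vm v = *sub;                           /* clone, no reboot */
    snprintf(v.hostname, sizeof v.hostname, "%s", host);
    snprintf(v.ip, sizeof v.ip, "%s", ip);        /* re-specialize identity */
    v.vcpus  = vcpus;                             /* on-the-fly resources */
    v.mem_mb = mem_mb;
    return v;
}

int main(void)
{
    struct vm v = activate(&substrate_pool[0], "web-1", "10.0.0.5", 4, 4096);
    printf("%s %s: %d vcpus, %d MB (booted=%d)\n",
           v.hostname, v.ip, v.vcpus, v.mem_mb, v.booted);
    return 0;
}
```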
{"title":"Rethink the virtual machine template","authors":"Kun Wang, J. Rao, Chengzhong Xu","doi":"10.1145/1952682.1952690","DOIUrl":"https://doi.org/10.1145/1952682.1952690","url":null,"abstract":"Server virtualization technology facilitates the creation of an elastic computing infrastructure on demand. There are cloud applications like server-based computing and virtual desktop that concern startup latency and require impromptu requests for VM creation in a real-time manner. Conventional template-based VM creation is a time consuming process and lacks flexibility for the deployment of stateful VMs. In this paper, we present an abstraction of VM substrate to represent generic VM instances in miniature. Unlike templates that are stored as an image file in disk, VM substrates are docked in memory in a designated VM pool. They can be activated into stateful VMs without machine booting and application initialization. The abstraction leverages an arrange of techniques, including VM miniaturization, generalization, clone and migration, storage copy-on-write, and on-the-fly resource configuration, for rapid deployment of VMs and VM clusters on demand. We implement a prototype on a Xen platform and show that a server with typical configuration of TB disk and GB memory can accommodate more substrates in memory than templates in disk and stateful VMs can be created from the same or different substrates and deployed on to the same or different physical hosts in a cluster without causing any configuration conflicts. Experimental results show that general purpose VMs or a VM cluster for parallel computing can be deployed in a few seconds. We demonstrate the usage of VM substrates in a mobile gaming application.","PeriodicalId":202844,"journal":{"name":"International Conference on Virtual Execution Environments","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131799675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Overdriver: handling memory overload in an oversubscribed cloud
Dan Williams, H. Jamjoom, Yew-Huey Liu, Hakim Weatherspoon
DOI: 10.1145/1952682.1952709
With the intense competition between cloud providers, oversubscription is increasingly important to maintain profitability. Oversubscribing physical resources is not without consequences: it increases the likelihood of overload. Memory overload is particularly damaging. Contrary to traditional views, we analyze current data center logs and realistic Web workloads to show that overload is largely transient: up to 88.1% of overloads last for less than 2 minutes. Viewing overload as a continuum that includes both transient and sustained overloads of various durations suggests treating mitigation approaches as a continuum as well, complete with tradeoffs with respect to application performance and data center overhead. In particular, heavyweight techniques, like VM migration, are better suited to sustained overloads, whereas lightweight approaches, like network memory, are better suited to transient overloads. We present Overdriver, a system that adaptively takes advantage of these tradeoffs, mitigating all overloads within 8% of well-provisioned performance. Furthermore, under reasonable oversubscription ratios, where transient overload constitutes the vast majority of overloads, Overdriver requires 15% of the excess space and generates a factor of four less network traffic than a migration-only approach.
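The duration-based tradeoff can be sketched in a few lines; the escalation threshold here is an assumed parameter, not Overdriver's tuned value:

```c
/* Sketch: treat short overloads with lightweight network memory and
 * escalate sustained ones to VM migration. Illustrative policy only. */
#include <stdio.h>

enum action { NETWORK_MEMORY, MIGRATE };

/* escalate once an overload has outlasted most transient episodes */
static enum action mitigate(int overload_secs, int sustained_threshold_secs)
{
    return overload_secs < sustained_threshold_secs ? NETWORK_MEMORY : MIGRATE;
}

int main(void)
{
    /* the paper reports up to 88.1% of overloads last under 2 minutes,
     * so 120 s is used here as an assumed escalation point */
    for (int t = 30; t <= 300; t += 90)
        printf("overload of %3d s -> %s\n", t,
               mitigate(t, 120) == NETWORK_MEMORY ? "network memory"
                                                  : "migrate");
    return 0;
}
```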
{"title":"Overdriver: handling memory overload in an oversubscribed cloud","authors":"Dan Williams, H. Jamjoom, Yew-Huey Liu, Hakim Weatherspoon","doi":"10.1145/1952682.1952709","DOIUrl":"https://doi.org/10.1145/1952682.1952709","url":null,"abstract":"With the intense competition between cloud providers, oversubscription is increasingly important to maintain profitability. Oversubscribing physical resources is not without consequences: it increases the likelihood of overload. Memory overload is particularly damaging. Contrary to traditional views, we analyze current data center logs and realistic Web workloads to show that overload is largely transient: up to 88.1% of overloads last for less than 2 minutes. Regarding overload as a continuum that includes both transient and sustained overloads of various durations points us to consider mitigation approaches also as a continuum, complete with tradeoffs with respect to application performance and data center overhead. In particular, heavyweight techniques, like VM migration, are better suited to sustained overloads, whereas lightweight approaches, like network memory, are better suited to transient overloads. We present Overdriver, a system that adaptively takes advantage of these tradeoffs, mitigating all overloads within 8% of well-provisioned performance. Furthermore, under reasonable oversubscription ratios, where transient overload constitutes the vast majority of overloads, Overdriver requires 15% of the excess space and generates a factor of four less network traffic than a migration-only approach.","PeriodicalId":202844,"journal":{"name":"International Conference on Virtual Execution Environments","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127744688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DBT path selection for holistic memory efficiency and performance
Apala Guha, K. Hazelwood, M. Soffa
DOI: 10.1145/1735997.1736018
Dynamic binary translators (DBTs) provide powerful platforms for building dynamic program monitoring and adaptation tools. DBTs, however, have high memory demands because they cache translated code and auxiliary code to a software code cache and must also maintain data structures to support the code cache. The high memory demands make it difficult for memory-constrained embedded systems to take advantage of DBT-based tools. Previous research on DBT memory management focused on the translated code and auxiliary code only. However, we found that data structures are comparable to the code cache in size. We show that the translated code size, auxiliary code size and the data structure size interact in a complex manner, depending on the path selection (trace selection and link formation) strategy. Therefore, holistic memory efficiency (comprising translated code, auxiliary code and data structures) cannot be improved by focusing on the code cache only. In this paper, we use path selection to improve holistic memory efficiency, which in turn impacts performance in memory-constrained environments. Although there has been previous research on path selection, such research only considered performance in memory-unconstrained environments.

The challenge for holistic memory efficiency is that the path selection strategy results in complex interactions between the memory demand components. Also, individual aspects of path selection and the holistic memory efficiency may impact performance in complex ways. We explore these interactions to motivate path selection targeting holistic memory demand. We enumerate all the aspects involved in a path selection design and evaluate a comprehensive set of approaches for each aspect. Finally, we propose a path selection strategy that reduces memory demands by 20% and at the same time improves performance by 5-20% compared to an industrial-strength DBT.
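One way to see the interaction between the memory components is a toy cost model: capping trace length trades auxiliary exit stubs against per-trace data structures. The per-unit sizes below are assumptions for illustration, not measurements from the paper:

```c
/* Toy model of holistic DBT memory: translated code, auxiliary code
 * (exit stubs), and per-trace descriptors all vary with how path
 * selection groups blocks into traces. Unit costs are assumed. */
#include <stdio.h>

struct footprint { int code, aux, data; };

/* assume 16 bytes of translated code per block, one 8-byte exit stub
 * per trace, and a 32-byte descriptor per trace in the data structures */
static struct footprint layout(int blocks, int max_blocks_per_trace)
{
    int traces = (blocks + max_blocks_per_trace - 1) / max_blocks_per_trace;
    struct footprint f = { blocks * 16, traces * 8, traces * 32 };
    return f;
}

int main(void)
{
    for (int cap = 1; cap <= 16; cap *= 4) {
        struct footprint f = layout(1024, cap);
        printf("trace cap=%2d: code=%d aux=%d data=%d total=%d\n",
               cap, f.code, f.aux, f.data, f.code + f.aux + f.data);
    }
    return 0;
}
```

Even this crude model shows why optimizing the code cache alone is insufficient: shortening traces shrinks nothing in the translated code but inflates the auxiliary code and data structures.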
{"title":"DBT path selection for holistic memory efficiency and performance","authors":"Apala Guha, K. Hazelwood, M. Soffa","doi":"10.1145/1735997.1736018","DOIUrl":"https://doi.org/10.1145/1735997.1736018","url":null,"abstract":"Dynamic binary translators(DBTs) provide powerful platforms for building dynamic program monitoring and adaptation tools. DBTs, however, have high memory demands because they cache translated code and auxiliary code to a software code cache and must also maintain data structures to support the code cache. The high memory demands make it difficult for memory-constrained embedded systems to take advantage of DBT-based tools. Previous research on DBT memory management focused on the translated code and auxiliary code only. However, we found that data structures are comparable to the code cache in size. We show that the translated code size, auxiliary code size and the data structure size interact in a complex manner, depending on the path selection (trace selection and link formation) strategy. Therefore, holistic memory efficiency (comprising translated code, auxiliary code and data structures) cannot be improved by focusing on the code cache only. In this paper, we use path selection for improving holistic memory efficiency which in turn impacts performance in memory-constrained environments. Although there has been previous research on path selection, such research only considered performance in memory-unconstrained environments.\u0000 The challenge for holistic memory efficiency is that the path selection strategy results in complex interactions between the memory demand components. Also, individual aspects of path selection and the holistic memory efficiency may impact performance in complex ways. We explore these interactions to motivate path selection targeting holistic memory demand. We enumerate all the aspects involved in a path selection design and evaluate a comprehensive set of approaches for each aspect. Finally, we propose a path selection strategy that reduces memory demands by 20% and at the same time improves performance by 5-20% compared to an industrial-strength DBT.","PeriodicalId":202844,"journal":{"name":"International Conference on Virtual Execution Environments","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129739450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AASH: an asymmetry-aware scheduler for hypervisors
Vahid Kazempour, Ali Kamali, Alexandra Fedorova
DOI: 10.1145/1735997.1736011
Asymmetric multicore processors (AMPs) consist of cores exposing the same instruction-set architecture (ISA) but varying in size, frequency, power consumption and performance. AMPs have been shown to be more power-efficient than conventional symmetric multicore processors, and it is therefore likely that future multicore systems will include cores of different types. AMPs derive their efficiency from core specialization: instruction streams can be assigned to run on the cores best suited to their demands for architectural resources. System efficiency is improved as a result. To perform effective matching of threads to cores, the thread scheduler must be asymmetry-aware; and while asymmetry-aware schedulers for operating systems are a well-studied topic, asymmetry-awareness in hypervisors has not been addressed. A hypervisor must be asymmetry-aware to enable proper functioning of asymmetry-aware guest operating systems; otherwise they will be ineffective in virtual environments. Furthermore, a hypervisor must ensure that asymmetric cores are shared among multiple guests in a fair fashion or in accordance with their priorities.

This work is the first to implement the simple changes to the hypervisor scheduler required to make it asymmetry-aware, and evaluates the benefits and overheads of these asymmetry-aware mechanisms. Our evaluation was performed using the open-source Xen hypervisor on a real multicore system where asymmetry was emulated via CPU frequency scaling. We compared the asymmetry-aware hypervisor to default Xen. Our results indicate that asymmetry support can be implemented with low overheads, and the resulting performance improvements can be significant, reaching up to 36% in our experiments. Most performance improvements derive from the fact that an asymmetry-aware hypervisor ensures that the fast cores do not go idle before slow cores, and from the fact that it maps virtual cores to physical cores for asymmetry-aware guests according to the guest's expectations. Other benefits from asymmetry awareness are fairer sharing of computing resources among VMs and more stable execution times.
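The central invariant is easy to sketch: fast cores must be filled before slow ones, so they never idle while slower cores run work. The core speeds and assignment loop below are illustrative assumptions, not the AASH implementation:

```c
/* Sketch of one asymmetry-aware rule: assign runnable virtual CPUs to
 * physical cores fastest-first, so slow cores idle before fast ones. */
#include <stdio.h>

#define CORES 4
#define VCPUS 3

int main(void)
{
    /* cores listed fastest-first, e.g. frequency-scaled as in the paper */
    const int core_mhz[CORES] = { 3000, 3000, 1500, 1500 };
    int assignment[CORES] = { -1, -1, -1, -1 };

    /* fill fast cores before slow ones; any leftover idle core is slow */
    for (int v = 0, c = 0; v < VCPUS && c < CORES; v++, c++)
        assignment[c] = v;

    for (int c = 0; c < CORES; c++) {
        if (assignment[c] < 0)
            printf("core %d (%d MHz): idle\n", c, core_mhz[c]);
        else
            printf("core %d (%d MHz): vcpu %d\n", c, core_mhz[c],
                   assignment[c]);
    }
    return 0;
}
```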
{"title":"AASH: an asymmetry-aware scheduler for hypervisors","authors":"Vahid Kazempour, Ali Kamali, Alexandra Fedorova","doi":"10.1145/1735997.1736011","DOIUrl":"https://doi.org/10.1145/1735997.1736011","url":null,"abstract":"Asymmetric multicore processors (AMP) consist of cores exposing the same instruction-set architecture (ISA) but varying in size, frequency, power consumption and performance. AMPs were shown to be more power efficient than conventional symmetric multicore processors, and it is therefore likely that future multicore systems will include cores of different types. AMPs derive their efficiency from core specialization: instruction streams can be assigned to run on the cores best suited to their demands for architectural resources. System efficiency is improved as a result. To perform effective matching of threads to cores, the thread scheduler must be asymmetry-aware; and while asymmetry-aware schedulers for operating systems are a well studied topic, asymmetry-awareness in hypervisors has not been addressed. A hypervisor must be asymmetry-aware to enable proper functioning of asymmetry-aware guest operating systems; otherwise they will be ineffective in virtual environments. Furthermore, a hypervisor must ensure that asymmetric cores are shared among multiple guests in a fair fashion or in accordance with their priorities.\u0000 This work for the first time implements simple changes to the hypervisor scheduler, required to make it asymmetry-aware, and evaluates the benefits and overheads of these asymmetry-aware mechanisms. Our evaluation was performed using an open source hypervisor Xen on a real multicore system where asymmetry was emulated via CPU frequency scaling. We compared the asymmetry-aware hypervisor to default Xen. Our results indicate that asymmetry support can be implemented with low overheads, and resulting performance improvements can be significant, reaching up to 36% in our experiments. Most performance improvements are derived from the fact that an asymmetry-aware hypervisor ensures that the fast cores do not go idle before slow cores and from the fact that it maps virtual cores to physical cores for asymmetry-aware guests according to the guest's expectations. Other benefits from asymmetry awareness are fairer sharing of computing resources among VMs and more stable execution times.","PeriodicalId":202844,"journal":{"name":"International Conference on Virtual Execution Environments","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129317088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Novel online profiling for virtual machines
Manjiri A. Namjoshi, P. Kulkarni
DOI: 10.1145/1735997.1736016
Application profiling is a popular technique to improve program performance based on its behavior. Offline profiling, although beneficial for several applications, fails in cases where prior program runs may not be feasible, or if changes in input cause the profile to not match the behavior of the actual program run. Managed languages, like Java and C#, provide a unique opportunity to overcome the drawbacks of offline profiling by generating the profile information online during the current program run. Indeed, online profiling is extensively used in current VMs, especially during selective compilation to improve program startup performance, as well as during other feedback-directed optimizations.

In this paper we illustrate the drawbacks of the current reactive mechanism of online profiling during selective compilation. Current VM profiling mechanisms are slow, thereby delaying associated transformations, and they estimate future behavior based on the program's immediate past, leading to potential misspeculation that limits the benefits of compilation. We show that these drawbacks produce an average performance loss of over 14.5% on our set of benchmark programs, compared to an ideal offline approach that accurately compiles the hot methods early. We then propose and evaluate the potential of a novel strategy to achieve similar performance benefits with an online profiling approach. Our new online profiling strategy uses early determination of loop iteration bounds to predict future method hotness. We explore and present promising results on the potential, feasibility, and other issues involved in the successful implementation of this approach.
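A minimal sketch of the proposed predictor, under an assumed cost threshold: once a loop's iteration bound is determined early, the bound times an estimated per-iteration cost predicts whether the enclosing method will become hot, so it can be compiled up front rather than after counters accumulate:

```c
/* Sketch: predict method hotness from an early-determined loop bound
 * instead of reacting to invocation counters. Threshold and costs are
 * assumptions, not values from the paper. */
#include <stdio.h>

#define HOTNESS_THRESHOLD 10000L   /* assumed work needed to justify JIT */

static int predict_hot(long loop_bound, long per_iteration_cost)
{
    return loop_bound * per_iteration_cost >= HOTNESS_THRESHOLD;
}

int main(void)
{
    long bounds[] = { 10, 500, 100000 };
    for (int i = 0; i < 3; i++)
        printf("loop bound %6ld -> %s\n", bounds[i],
               predict_hot(bounds[i], 20) ? "compile now"
                                          : "stay interpreted");
    return 0;
}
```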
{"title":"Novel online profiling for virtual machines","authors":"Manjiri A. Namjoshi, P. Kulkarni","doi":"10.1145/1735997.1736016","DOIUrl":"https://doi.org/10.1145/1735997.1736016","url":null,"abstract":"Application profiling is a popular technique to improve program performance based on its behavior. Offline profiling, although beneficial for several applications, fails in cases where prior program runs may not be feasible, or if changes in input cause the profile to not match the behavior of the actual program run. Managed languages, like Java and C#, provide a unique opportunity to overcome the drawbacks of offline profiling by generating the profile information online during the current program run. Indeed, online profiling is extensively used in current VMs, especially during selective compilation to improve program startup performance, as well as during other feedback-directed optimizations.\u0000 In this paper we illustrate the drawbacks of the current reactive mechanism of online profiling during selective compilation. Current VM profiling mechanisms are slow -- thereby delaying associated transformations, and estimate future behavior based on the program's immediate past -- leading to potential misspeculation that limit the benefits of compilation. We show that these drawbacks produce an average performance loss of over 14.5% on our set of benchmark programs, over an ideal offline approach that accurately compiles the hot methods early. We then propose and evaluate the potential of a novel strategy to achieve similar performance benefits with an online profiling approach. Our new online profiling strategy uses early determination of loop iteration bounds to predict future method hotness. We explore and present promising results on the potential, feasibility, and other issues involved for the successful implementation of this approach.","PeriodicalId":202844,"journal":{"name":"International Conference on Virtual Execution Environments","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128835637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Supporting soft real-time tasks in the xen hypervisor
Min Lee, A. Krishnakumar, P. Krishnan, Navjot Singh, S. Yajnik
DOI: 10.1145/1735997.1736012
Virtualization technology enables server consolidation and has given an impetus to low-cost green data centers. However, current hypervisors do not provide adequate support for real-time applications, and this has limited the adoption of virtualization in some domains. Soft real-time applications, such as media-based ones, are impeded by components of virtualization including low-performance virtualization I/O, increased scheduling latency, and shared-cache contention. The virtual machine scheduler is central to all these issues. The goal in this paper is to adapt the virtual machine scheduler to be more soft-real-time friendly.

We improve two aspects of the VMM scheduler: managing scheduling latency as a first-class resource and managing shared caches. We use enterprise IP telephony as an illustrative soft real-time workload and design a scheduler S that incorporates the knowledge of soft real-time applications in all aspects of the scheduler to support responsiveness. To this end, we first define a laxity value that can be interpreted as the target scheduling latency that the workload desires. The load balancer is also designed to minimize the latency for real-time tasks. For cache management, we take cache affinity into account for real-time tasks and load-balance accordingly to prevent cache thrashing. We measured cache misses and demonstrated that cache management is essential for soft real-time tasks. Although our scheduler S employs a different design philosophy, interestingly enough it can be implemented with simple modifications to the Xen hypervisor's credit scheduler. Our experiments demonstrate that the Xen scheduler with our modifications can support soft real-time guests well, without penalizing non-real-time domains.
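A simplified model of the laxity idea, not the actual credit-scheduler patch: among runnable VCPUs, real-time ones with the least remaining laxity are picked first, and non-real-time domains run when no real-time work is pending. The struct fields and values are illustrative assumptions:

```c
/* Sketch: pick the next VCPU by laxity, the target scheduling latency
 * the workload can still tolerate. Simplified single-queue model. */
#include <stdio.h>

struct vcpu { const char *name; int laxity_ms; int realtime; };

static int pick(const struct vcpu *v, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        /* real-time vcpus with the least remaining laxity go first */
        if (v[i].realtime &&
            (best < 0 || v[i].laxity_ms < v[best].laxity_ms))
            best = i;
    }
    if (best < 0)
        best = 0;   /* no real-time work pending: run a regular vcpu */
    return best;
}

int main(void)
{
    struct vcpu run_queue[] = {
        { "web",   0,  0 },   /* non-real-time domain */
        { "voip1", 20, 1 },   /* telephony guest, 20 ms of laxity left */
        { "voip2", 5,  1 },   /* most urgent real-time vcpu */
    };
    printf("next: %s\n", run_queue[pick(run_queue, 3)].name);
    return 0;
}
```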
{"title":"Supporting soft real-time tasks in the xen hypervisor","authors":"Min Lee, A. Krishnakumar, P. Krishnan, Navjot Singh, S. Yajnik","doi":"10.1145/1735997.1736012","DOIUrl":"https://doi.org/10.1145/1735997.1736012","url":null,"abstract":"Virtualization technology enables server consolidation and has given an impetus to low-cost green data centers. However, current hypervisors do not provide adequate support for real-time applications, and this has limited the adoption of virtualization in some domains. Soft real-time applications, such as media-based ones, are impeded by components of virtualization including low-performance virtualization I/O, increased scheduling latency, and shared-cache contention. The virtual machine scheduler is central to all these issues. The goal in this paper is to adapt the virtual machine scheduler to be more soft-real-time friendly.\u0000 We improve two aspects of the VMM scheduler -- managing scheduling latency as a first-class resource and managing shared caches. We use enterprise IP telephony as an illustrative soft real-time workload and design a scheduler S that incorporates the knowledge of soft real-time applications in all aspects of the scheduler to support responsiveness. For this we first define a laxity value that can be interpreted as the target scheduling latency that the workload desires. The load balancer is also designed to minimize the latency for real-time tasks. For cache management, we take cache-affinity into account for real time tasks and load-balance accordingly to prevent cache thrashing. We measured cache misses and demonstrated that cache management is essential for soft real time tasks. Although our scheduler S employs a different design philosophy, interestingly enough it can be implemented with simple modifications to the Xen hypervisor's credit scheduler. Our experiments demonstrate that the Xen scheduler with our modifications can support soft real-time guests well, without penalizing non-real-time domains.","PeriodicalId":202844,"journal":{"name":"International Conference on Virtual Execution Environments","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130475249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing crash dump in virtualized environments
Yijian Huang, Haibo Chen, B. Zang
DOI: 10.1145/1735997.1736003
Crash dump, or core dump, is the typical way to save a memory image on system crash for future offline debugging and analysis. However, for typical server machines with their likely abundant memory, core dump time can significantly increase the mean time to repair (MTTR) by delaying the reboot-based recovery, while not dumping the failure context for analysis would risk recurring crashes on the same problems.

In this paper, we propose several optimization techniques for core dump in virtualized environments, in order to shorten the MTTR of consolidated virtual machines during crashes. First, we parallelize the process of crash dump and the process of rebooting the crashed VM, by dynamically reclaiming and allocating memory between the crashed VM and the newly spawned VM. Second, we use the virtual machine management layer to introspect the critical data structures of the crashed VM to filter out the dump of unused memory. Finally, we implement disk I/O rate control between core dump and the newly spawned VM according to a user-tuned rate control policy, to balance the time of crash dump and the quality of service in the recovery VM.

We have implemented a working prototype, Vicover, that optimizes core dump on system crash of a virtual machine in Xen, to minimize the MTTR of core dump and recovery as a whole. In our experiment on a virtualized TPC-W server, Vicover shortens the downtime caused by crash dump by around 5X.
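The introspection-based filtering step can be sketched as a walk over a free-page bitmap recovered from the crashed VM, writing only in-use pages to the dump; the bitmap format is an assumption, not Vicover's actual data structure:

```c
/* Toy illustration of filtering unused memory out of a crash dump:
 * skip pages the crashed guest marked free, as recovered by
 * introspection of its memory-management structures. */
#include <stdint.h>
#include <stdio.h>

#define PAGES 8

int main(void)
{
    /* bit i set => guest page i is in use (0xB1 = 10110001 binary) */
    uint8_t used_bitmap = 0xB1;
    int dumped = 0, skipped = 0;

    for (int i = 0; i < PAGES; i++) {
        if (used_bitmap & (1u << i))
            dumped++;    /* write page i to the dump file */
        else
            skipped++;   /* free page: omit it from the dump */
    }
    printf("dumped %d pages, skipped %d free pages\n", dumped, skipped);
    return 0;
}
```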
{"title":"Optimizing crash dump in virtualized environments","authors":"Yijian Huang, Haibo Chen, B. Zang","doi":"10.1145/1735997.1736003","DOIUrl":"https://doi.org/10.1145/1735997.1736003","url":null,"abstract":"Crash dump, or core dump is the typical way to save memory image on system crash for future offline debugging and analysis. However, for typical server machines with likely abundant memory, the time of core dump can significantly increase the mean time to repair (MTTR) by delaying the reboot-based recovery, while not dumping the failure context for analysis would risk recurring crashes on the same problems.\u0000 In this paper, we propose several optimization techniques for core dump in virtualized environments, in order to shorten the MTTR of consolidated virtual machines during crashes. First, we parallelize the process of crash dump and the process of rebooting the crashed VM, by dynamically reclaiming and allocating memory between the crashed VM and the newly spawned VM. Second, we use the virtual machine management layer to introspect the critical data structures of the crashed VM to filter out the dump of unused memory. Finally, we implement disk I/O rate control between core dump and the newly spawned VM according to user-tuned rate control policy to balance the time of crash dump and quality of services in the recovery VM.\u0000 We have implemented a working prototype, Vicover, that optimizes core dump on system crash of a virtual machine in Xen, to minimize the MTTR of core dump and recovery as a whole. In our experiment on a virtualized TPC-W server, Vicover shortens the downtime caused by crash dump by around 5X.","PeriodicalId":202844,"journal":{"name":"International Conference on Virtual Execution Environments","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125793364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}