International Conference on Virtual Execution Environments: latest papers, page 5

Swift: a register-based JIT compiler for embedded JVMs

International Conference on Virtual Execution Environments

Pub Date : 2012-03-03 DOI: 10.1145/2151024.2151035

Yuan Zhang, Min Yang, Bo Zhou, Zhemin Yang, Weihua Zhang, B. Zang

Code quality and compilation speed are two competing challenges for JIT compilers, and selective compilation is commonly used to trade off between them. However, as more and more Java applications run on mobile devices, selective compilation runs into problems. Because these applications always have a flat execution profile and a short lifetime, a lightweight JIT technique that does not sacrifice code quality is needed. Yet the overhead of compiling stack-based Java bytecode to heterogeneous register-based machine code is significant on embedded devices. This paper presents a fast and effective JIT technique for mobile devices, building on a register-based Java bytecode format that is closer to the underlying machine architecture. Through a comprehensive study of the characteristics of Java applications, we observe that the virtual registers used by more than 90% of Java methods can be directly fulfilled by 11 physical registers. Based on this observation, this paper proposes Swift, a novel JIT compiler for register-based bytecode that generates native code for RISC machines. After mapping virtual registers to physical registers, Swift generates code efficiently by looking up a translation table. Code quality is guaranteed by the static compiler that generates the register-based bytecode. In addition, we design two lightweight optimizations and an efficient code unloader to make Swift more suitable for embedded environments. Given the prevalence of Android, a prototype of Swift is implemented on DEX bytecode, the official distribution format of Android applications. Swift is evaluated with three benchmarks (SPECjvm98, EmbeddedCaffeineMark3 and JemBench2) on two different ARM SoCs: S3C6410 (ARMv6) and OMAP3530 (ARMv7). The results show that Swift achieves a speedup of 3.13 over the best-performing interpreter on the selected benchmarks. Compared with the state-of-the-art JIT compiler in Android, JITC-Droid, Swift achieves a speedup of 1.42.
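
The two steps the abstract names, mapping virtual registers onto physical ones and then emitting code from a translation table, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the opcode names, the instruction templates, the register names `r0` to `r10`, and the helpers `map_registers` and `translate` are hypothetical and are not taken from Swift itself.

```python
# Hypothetical sketch of table-driven code generation. Opcode names,
# templates, and register names are illustrative assumptions, not
# Swift's real tables.

# 11 physical registers, matching the count observed in the abstract.
PHYSICAL_REGS = [f"r{i}" for i in range(11)]

# Translation table: one native-code template per bytecode opcode.
TRANSLATION_TABLE = {
    "move":    "mov {0}, {1}",
    "add-int": "add {0}, {1}, {2}",
    "sub-int": "sub {0}, {1}, {2}",
    "mul-int": "mul {0}, {1}, {2}",
}


def map_registers(method):
    """Return a virtual-to-physical register map, or None if the method
    uses more virtual registers than there are physical ones."""
    virtual = sorted({reg for _op, *regs in method for reg in regs})
    if len(virtual) > len(PHYSICAL_REGS):
        return None
    return dict(zip(virtual, PHYSICAL_REGS))


def translate(method):
    """Translate a list of (opcode, reg, ...) tuples by table lookup."""
    mapping = map_registers(method)
    if mapping is None:
        # A real compiler would need a fallback here; this sketch has none.
        raise ValueError("method uses more virtual registers than available")
    return [
        TRANSLATION_TABLE[op].format(*(mapping[r] for r in regs))
        for op, *regs in method
    ]
```

With these assumed tables, `translate([("add-int", "v0", "v1", "v2"), ("move", "v3", "v0")])` yields `["add r0, r1, r2", "mov r3, r0"]`. The point of the design is that no register allocation pass runs at JIT time when the direct mapping succeeds, so each instruction costs one dictionary lookup and one string substitution.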



Citations: 9

libdft: practical dynamic data flow tracking for commodity systems

International Conference on Virtual Execution Environments

Pub Date : 2012-03-03 DOI: 10.1145/2151024.2151042

V. Kemerlis, G. Portokalidis, Kangkook Jee, A. Keromytis

Dynamic data flow tracking (DFT) deals with tagging and tracking data of interest as they propagate during program execution. DFT has been repeatedly implemented by a variety of tools for numerous purposes, including protection from zero-day and cross-site scripting attacks, detection and prevention of information leaks, and for the analysis of legitimate and malicious software. We present libdft, a dynamic DFT framework that unlike previous work is at once fast, reusable, and works with commodity software and hardware. libdft provides an API for building DFT-enabled tools that work on unmodified binaries, running on common operating systems and hardware, thus facilitating research and rapid prototyping. We explore different approaches for implementing the low-level aspects of instruction-level data tracking, introduce a more efficient and 64-bit capable shadow memory, and identify (and avoid) the common pitfalls responsible for the excessive performance overhead of previous studies. We evaluate libdft using real applications with large codebases like the Apache and MySQL servers, and the Firefox web browser. We also use a series of benchmarks and utilities to compare libdft with similar systems. Our results indicate that it performs at least as fast, if not faster, than previous solutions, and to the best of our knowledge, we are the first to evaluate the performance overhead of a fast dynamic DFT implementation in such depth. Finally, libdft is freely available as open source software.
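
To make the idea of tagging data and propagating tags through a shadow memory concrete, here is a minimal sketch. It is not libdft's API: the class `ShadowMemory`, the functions `propagate_copy` and `propagate_binary`, the byte-level granularity, and the bitmask tag encoding are all assumptions chosen for illustration.

```python
# Hypothetical sketch of byte-granularity tag tracking. Names and the
# tag encoding are illustrative assumptions, not libdft's interface.

class ShadowMemory:
    """Sparse shadow memory: maps a byte address to a tag bitmask."""

    def __init__(self):
        self._tags = {}  # address -> tag; a missing entry means untagged

    def set_tag(self, addr, size, tag):
        """Tag `size` bytes starting at `addr`; a zero tag clears them."""
        for a in range(addr, addr + size):
            if tag:
                self._tags[a] = tag
            else:
                self._tags.pop(a, None)

    def get_tag(self, addr, size=1):
        """Return the union of the tags on `size` bytes starting at `addr`."""
        result = 0
        for a in range(addr, addr + size):
            result |= self._tags.get(a, 0)
        return result


def propagate_copy(shadow, dst, src, size):
    """Data movement: the destination takes the source's tags byte by byte."""
    tags = [shadow.get_tag(src + i) for i in range(size)]
    for i, tag in enumerate(tags):
        shadow.set_tag(dst + i, 1, tag)


def propagate_binary(shadow, dst, src1, src2, size):
    """Binary operation: the destination takes the union of both operands' tags."""
    tags = [shadow.get_tag(src1 + i) | shadow.get_tag(src2 + i) for i in range(size)]
    for i, tag in enumerate(tags):
        shadow.set_tag(dst + i, 1, tag)
```

A tool built on this sketch would tag bytes at a source (for example, data read from the network), call the propagation functions as instructions execute, and check `get_tag` at a sink. Copying untagged data over a tagged location clears its tag, which is why `set_tag` removes entries when the tag is zero.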



Citations: 57

V2E: combining hardware virtualization and software emulation for transparent and extensible malware analysis

International Conference on Virtual Execution Environments

Pub Date : 2012-03-03 DOI: 10.1145/2151024.2151053

Lok K. Yan, Manjukumar Jayachandra, Mu Zhang, Heng Yin

A transparent and extensible malware analysis platform is essential for defeating malware. This platform should be transparent so malware cannot easily detect and bypass it. It should also be extensible to provide strong support for heavyweight instrumentation and analysis efficiency. However, no existing platform can meet both requirements. Leveraging hardware virtualization technology, analysis platforms like Ether can achieve good transparency, but their instrumentation support and analysis efficiency are poor. In contrast, software emulation provides strong support for code instrumentation and good analysis efficiency by using dynamic binary translation. However, analysis platforms based on software emulation can be easily detected by malware and thus offer poor transparency. To achieve both transparency and extensibility, we propose a new analysis platform that combines hardware virtualization and software emulation. The essence is precise heterogeneous replay: the malware execution is recorded via hardware virtualization and then replayed in software. Our design ensures the execution replay is precise. Moreover, with page-level recording granularity, the platform can easily adjust to analyze various forms of malware (a process, a kernel module, or a shared library). We implemented a prototype called V2E and demonstrated its capability and efficiency by conducting an extensive evaluation with both synthetic samples and 14 real-world emulation-resistant malware samples.



Citations: 80

Challenges in building a real, large private cloud

International Conference on Virtual Execution Environments

Pub Date : 2012-03-03 DOI: 10.1145/2151024.2151026

Evangelos Kotsovinos

Virtualization and internal cloud are often touted as the solution to many challenging problems, from resource underutilization to data-center optimization and carbon emission reduction. However, the hidden costs of cloud-scale virtualization, largely stemming from the complex and difficult system administration challenges it poses, are often overlooked. Reaping the fruits of internal Infrastructure as a Service cloud requires the enterprise to navigate scalability limitations, revamp traditional operational practices, manage performance, and achieve unprecedented cross-silo collaboration.


Citations: 1

Protecting applications against TOCTTOU races by user-space caching of file metadata

International Conference on Virtual Execution Environments

Pub Date : 2012-03-03 DOI: 10.1145/2151024.2151052

Mathias Payer, T. Gross

Time Of Check To Time Of Use (TOCTTOU) race conditions for file accesses in user-space applications are a common problem in Unix-like systems. The mapping between filename and inode and device is volatile and can provide the necessary preconditions for an exploit. Applications use filenames as the primary attribute to identify files, but the mapping between filenames and inode and device can be changed by an attacker. DynaRace is an approach that protects unmodified applications from file-based TOCTTOU race conditions. DynaRace uses a transparent mapping cache that keeps additional state and metadata for each accessed file in the application. The combination of file state and the current system call type is used to decide if (i) the metadata is updated or (ii) the correctness of the metadata is enforced between consecutive system calls. DynaRace uses user-mode path resolution internally to resolve individual file atoms. Each file atom is verified or updated according to the associated state in the mapping cache. More specifically, DynaRace protects against race conditions for all file-based system calls by replacing the unsafe system calls with a set of safe system calls that utilize the mapping cache. The system call is executed only if the state transition is allowed and the information in the mapping cache matches. DynaRace deterministically solves the problem of file-based race conditions for unmodified applications and removes an attacker's ability to exploit the TOCTTOU race condition. DynaRace detects injected alternate inode and device pairs and terminates the application.
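
The core check, remembering which inode and device a filename resolved to and refusing to proceed if a later call resolves to a different pair, can be sketched as follows. This is a heavily simplified illustration and not DynaRace's implementation: it works per whole path rather than per file atom, it has no state machine over system call types, and the names `MetadataCache` and `RaceDetected` are hypothetical. It uses only the standard `os.stat`, `os.open`, `os.fstat`, and `os.close` calls.

```python
import os


class RaceDetected(Exception):
    """Raised when a path resolves to a different (device, inode) pair."""


class MetadataCache:
    """Hypothetical per-path cache of (st_dev, st_ino), for illustration only."""

    def __init__(self):
        self._seen = {}  # path -> (st_dev, st_ino) recorded at the last call

    def check(self, path):
        """Time of check: stat the path and record what it resolved to."""
        st = os.stat(path)
        self._seen[path] = (st.st_dev, st.st_ino)
        return st

    def open(self, path, flags=os.O_RDONLY):
        """Time of use: open the path, then verify the opened file's identity."""
        fd = os.open(path, flags)
        st = os.fstat(fd)  # metadata of the file actually opened
        expected = self._seen.get(path)
        if expected is not None and (st.st_dev, st.st_ino) != expected:
            os.close(fd)
            raise RaceDetected(f"{path} changed between check and use")
        self._seen[path] = (st.st_dev, st.st_ino)
        return fd
```

Verifying with `os.fstat` on the already opened descriptor, rather than calling `os.stat` on the path again, matters: a second path lookup would itself be subject to the same race. Where the paper's approach terminates the application on a mismatch, this sketch raises an exception so the caller can decide.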



Citations: 28

Adding dynamically-typed language support to a statically-typed language compiler: performance evaluation, analysis, and tradeoffs

International Conference on Virtual Execution Environments

Pub Date : 2012-03-03 DOI: 10.1145/2151024.2151047

K. Ishizaki, T. Ogasawara, J. Castaños, P. Nagpurkar, D. Edelsohn, T. Nakatani

Applications written in dynamically typed scripting languages are increasingly popular for Web software development. Even on the server side, programmers are using dynamically typed scripting languages such as Ruby and Python to build complex applications quickly. As the number and complexity of dynamically typed scripting language applications grow, optimizing their performance is becoming important. Some of the best performing compilers and optimizers for dynamically typed scripting languages are developed entirely from scratch and target a specific language. This approach is not scalable, given the variety of dynamically typed scripting languages and the effort involved in developing and maintaining separate infrastructures for each. In this paper, we evaluate the feasibility of adapting and extending an existing production-quality method-based Just-In-Time (JIT) compiler for a language with dynamic types. Our goal is to identify the challenges and shortcomings with the current infrastructure, and to propose and evaluate runtime techniques and optimizations that can be incorporated into a common optimization infrastructure for static and dynamic languages. We discuss three extensions to the compiler to support dynamically typed languages: (1) simplification of control flow graphs, (2) mapping of memory locations to stack-allocated variables, and (3) reduction of runtime overhead using language semantics. We also propose four new optimizations for Python in (2) and (3). These extensions are effective in reducing compiler working memory and improving runtime performance. We present a detailed performance evaluation of our approach for Python, finding an overall improvement of 1.69x on average (up to 2.74x) over our JIT compiler without any optimization for dynamically typed languages and Python.



Citations: 18

Virtualization challenges: a view from server consolidation perspective

International Conference on Virtual Execution Environments

Pub Date : 2012-03-03 DOI: 10.1145/2151024.2151030

Hui Lv, Yaozu Dong, Jiangang Duan, Kevin Tian

Server consolidation, which runs multiple virtual machines on a single virtualized platform, provides an efficient solution for exploiting the parallelism and improving the utilization of modern multi-core processor systems. However, the performance and scalability of server consolidation solutions on modern massive, advanced servers are not well addressed. In this paper, we conduct a comprehensive study of Xen performance and scalability running SPECvirt_sc2010, and identify that the large memory and cache footprint, caused by unnecessarily frequent context switches, introduces additional challenges to system performance and scalability. We propose two optimizations (dynamically-allocable tasklets and a context-switch rate controller) to improve performance. The results show improved memory and cache efficiency with a reduction in overall CPI, resulting in a 15% improvement in server consolidation capability in SPECvirt_sc2010. In addition, our optimizations accelerate service response by up to 50%, which greatly improves the QoS of the Xen virtualization solution.


Citations: 24

SimTester: a controllable and observable testing framework for embedded systems

International Conference on Virtual Execution Environments

Pub Date : 2012-03-03 DOI: 10.1145/2151024.2151034

Tingting Yu, W. Srisa-an, G. Rothermel

In software for embedded systems, the frequent use of interrupts for timing, sensing, and I/O processing can cause concurrency faults to occur due to interactions between applications, device drivers, and interrupt handlers. This type of fault is considered by many practitioners to be among the most difficult to detect, isolate, and correct, in part because it can be sensitive to execution interleavings and often occurs without leaving any observable incorrect output. As such, commonly used testing techniques that inspect program outputs to detect failures are often ineffective at detecting them. To test for these concurrency faults, test engineers need to be able to control interleavings so that they are deterministic. Furthermore, they also need to be able to observe faults as they occur instead of relying on observable incorrect outputs. In this paper, we introduce SimTester, a framework that allows engineers to effectively test for subtle and non-deterministic concurrency faults by providing them with greater controllability and observability. We implemented our framework on a commercial virtual platform that is widely used to support hardware/software co-designs to promote ease of adoption. We then evaluated its effectiveness by using it to test for data races and deadlocks. The result shows that our framework can be effective and efficient at detecting these faults.



Citations: 31

SecondSite: disaster tolerance as a service

International Conference on Virtual Execution Environments

Pub Date : 2012-03-03 DOI: 10.1145/2151024.2151039

Shriram Rajagopalan, Brendan Cully, R. O'Connor, A. Warfield

This paper describes the design and implementation of SecondSite, a cloud-based service for disaster tolerance. SecondSite extends the Remus virtualization-based high availability system by allowing groups of virtual machines to be replicated across data centers over wide-area Internet links. The goal of the system is to commodify the property of availability, exposing it as a simple tick box when configuring a new virtual machine. To achieve this in the wide area, we have had to tackle the related issues of replication traffic bandwidth, reliable failure detection across geographic regions and traffic redirection over a wide-area network without compromising on transparency and consistency.


Citations: 68

DVM: towards a datacenter-scale virtual machine

International Conference on Virtual Execution Environments

Pub Date : 2012-03-03 DOI: 10.1145/2151024.2151032

Zhiqiang Ma, Zhonghua Sheng, Lin Gu, Liufei Wen, Gong Zhang

As cloud-based computation becomes increasingly important, providing a general computational interface to support datacenter-scale programming has become an imperative research agenda. Many cloud systems use existing virtual machine monitor (VMM) technologies, such as Xen, VMware, and Windows Hypervisor, to multiplex a physical host into multiple virtual hosts and isolate computation on the shared cluster platform. However, traditional multiplexing VMMs do not scale beyond a single physical host, and on their own they cannot provide the programming interface and cluster-wide computation that a datacenter system requires. We design a new instruction set architecture, DISA, to unify myriad compute nodes into one big virtual machine called DVM, and present programmers with the view of a single computer where thousands of tasks run concurrently in a large, unified, and snapshotted memory space. The DVM provides a simple yet scalable programming model and mitigates the scalability bottleneck of traditional distributed shared memory systems. Together with an efficient execution engine, this allows the capacity of a DVM to scale up to support large clusters. We have implemented and tested DVM on three platforms, and our evaluation shows that DVM has excellent performance in terms of execution time and speedup. On one physical host, the system overhead of DVM is comparable to that of traditional VMMs. On 16 physical hosts, the DVM runs 10 times faster than MapReduce/Hadoop and X10. On 256 EC2 instances, DVM shows linear speedup on a parallelizable workload.



Citations: 7