Achieving predictable performance in shared cloud storage services is hard. Tenants want reservations in terms of system-wide application-level throughput, but the provider must ultimately deal with low-level IO resources at each storage node where contention arises. Such a guarantee has thus proven elusive, due to the complexities inherent to modern storage stacks: non-uniform IO amplification, unpredictable IO interference, and non-linear IO performance. This paper presents Libra, a local IO scheduling framework designed for a shared SSD-backed key-value storage system. Libra guarantees per-tenant application-request throughput while achieving high utilization. To accomplish this, Libra leverages two techniques. First, Libra tracks the IO resource consumption of a tenant's application-level requests across complex storage stack interactions, down to low-level IO operations. This allows Libra to allocate per-tenant IO resources for achieving app-request reservations based on their dynamic IO usage profile. Second, Libra uses a disk-IO cost model based on virtual IO operations (VOP) that captures the non-linear relationship between SSD IO bandwidth and IO operation (IOP) throughput. Using VOPs, Libra can both account for the true cost of an IOP and determine the amount of provisionable IO resources available under IO interference. An evaluation shows that Libra, when applied to a LevelDB-based prototype with SSD-backed storage, satisfies tenant app-request reservations and achieves accurate low-level VOP allocations over a range of workloads, while still supporting high utilization.
{"title":"From application requests to virtual IOPs: provisioned key-value storage with Libra","authors":"David Shue, M. Freedman","doi":"10.1145/2592798.2592823","DOIUrl":"https://doi.org/10.1145/2592798.2592823","url":null,"abstract":"Achieving predictable performance in shared cloud storage services is hard. Tenants want reservations in terms of system-wide application-level throughput, but the provider must ultimately deal with low-level IO resources at each storage node where contention arises. Such a guarantee has thus proven elusive, due to the complexities inherent to modern storage stacks: non-uniform IO amplification, unpredictable IO interference, and non-linear IO performance.\u0000 This paper presents Libra, a local IO scheduling framework designed for a shared SSD-backed key-value storage system. Libra guarantees per-tenant application-request throughput while achieving high utilization. To accomplish this, Libra leverages two techniques. First, Libra tracks the IO resource consumption of a tenant's application-level requests across complex storage stack interactions, down to low-level IO operations. This allows Libra to allocate per-tenant IO resources for achieving app-request reservations based on their dynamic IO usage profile. Second, Libra uses a disk-IO cost model based on virtual IO operations (VOP) that captures the non-linear relationship between SSD IO bandwidth and IO operation (IOP) throughput. Using VOPs, Libra can both account for the true cost of an IOP and determine the amount of provisionable IO resources available under IO interference.\u0000 An evaluation shows that Libra, when applied to a LevelDB-based prototype with SSD-backed storage, satisfies tenant app-request reservations and achieves accurate low-level VOP allocations over a range of workloads, while still supporting high utilization.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"67 1","pages":"17:1-17:14"},"PeriodicalIF":0.0,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83144610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It is increasingly important for parallel applications to run together on the same machine. However, current performance is often poor: programs do not adapt well to dynamically varying numbers of cores, and the CPU time received by concurrent jobs can differ drastically. This paper introduces Callisto, a resource management layer for parallel runtime systems. We describe Callisto and the implementation of two Callisto-enabled runtime systems---one for OpenMP, and another for a task-parallel programming model. We show how Callisto eliminates almost all of the scheduler-related interference between concurrent jobs, while still allowing jobs to claim otherwise-idle cores. We use examples from two recent graph analytics projects and from SPEC OMP.
{"title":"Callisto: co-scheduling parallel runtime systems","authors":"T. Harris, Martin Maas, Virendra J. Marathe","doi":"10.1145/2592798.2592807","DOIUrl":"https://doi.org/10.1145/2592798.2592807","url":null,"abstract":"It is increasingly important for parallel applications to run together on the same machine. However, current performance is often poor: programs do not adapt well to dynamically varying numbers of cores, and the CPU time received by concurrent jobs can differ drastically. This paper introduces Callisto, a resource management layer for parallel runtime systems. We describe Callisto and the implementation of two Callisto-enabled runtime systems---one for OpenMP, and another for a task-parallel programming model. We show how Callisto eliminates almost all of the scheduler-related interference between concurrent jobs, while still allowing jobs to claim otherwise-idle cores. We use examples from two recent graph analytics projects and from SPEC OMP.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"3 1","pages":"24:1-24:14"},"PeriodicalIF":0.0,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90624667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Planet-scale video Content Delivery Networks (CDNs) deliver a significant fraction of the entire Internet traffic. Effective caching at the edge is vital for the feasibility of these CDNs, which can otherwise incur significant monetary costs and resource overloads in the Internet. We analyze the challenges and requirements for video caching on these CDNs which cannot be addressed by standard solutions. We develop multiple algorithms for caching in these CDNs: (i) An LRU-based baseline solution to address the requirements, (ii) an intelligent ingress-efficient algorithm, (iii) an offline cache aware of future requests (greedy) to estimate the maximum caching efficiency we can expect from any online algorithm, and (iv) an optimal offline cache (for limited scales). We use anonymized actual data from a large-scale, global CDN to evaluate the algorithms and draw conclusions on their suitability for different settings.
{"title":"Caching in video CDNs: building strong lines of defense","authors":"Kianoosh Mokhtarian, H. Jacobsen","doi":"10.1145/2592798.2592817","DOIUrl":"https://doi.org/10.1145/2592798.2592817","url":null,"abstract":"Planet-scale video Content Delivery Networks (CDNs) deliver a significant fraction of the entire Internet traffic. Effective caching at the edge is vital for the feasibility of these CDNs, which can otherwise incur significant monetary costs and resource overloads in the Internet.\u0000 We analyze the challenges and requirements for video caching on these CDNs which cannot be addressed by standard solutions. We develop multiple algorithms for caching in these CDNs: (i) An LRU-based baseline solution to address the requirements, (ii) an intelligent ingress-efficient algorithm, (iii) an offline cache aware of future requests (greedy) to estimate the maximum caching efficiency we can expect from any online algorithm, and (iv) an optimal offline cache (for limited scales). We use anonymized actual data from a large-scale, global CDN to evaluate the algorithms and draw conclusions on their suitability for different settings.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"17 1","pages":"13:1-13:13"},"PeriodicalIF":0.0,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83497093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patrick Koeberl, Steffen Schulz, A. Sadeghi, V. Varadharajan
Embedded systems are increasingly pervasive, interdependent and in many cases critical to our every day life and safety. Tiny devices that cannot afford sophisticated hardware security mechanisms are embedded in complex control infrastructures, medical support systems and entertainment products [51]. As such devices are increasingly subject to attacks, new hardware protection mechanisms are needed to provide the required resilience and dependency at low cost. In this work, we present the TrustLite security architecture for flexible, hardware-enforced isolation of software modules. We describe mechanisms for secure exception handling and communication between protected modules, enabling seamless interoperability with untrusted operating systems and tasks. TrustLite scales from providing a simple protected firmware runtime to advanced functionality such as attestation and trusted execution of userspace tasks. Our FPGA prototype shows that these capabilities are achievable even on low-cost embedded systems.
{"title":"TrustLite: a security architecture for tiny embedded devices","authors":"Patrick Koeberl, Steffen Schulz, A. Sadeghi, V. Varadharajan","doi":"10.1145/2592798.2592824","DOIUrl":"https://doi.org/10.1145/2592798.2592824","url":null,"abstract":"Embedded systems are increasingly pervasive, interdependent and in many cases critical to our every day life and safety. Tiny devices that cannot afford sophisticated hardware security mechanisms are embedded in complex control infrastructures, medical support systems and entertainment products [51]. As such devices are increasingly subject to attacks, new hardware protection mechanisms are needed to provide the required resilience and dependency at low cost.\u0000 In this work, we present the TrustLite security architecture for flexible, hardware-enforced isolation of software modules. We describe mechanisms for secure exception handling and communication between protected modules, enabling seamless interoperability with untrusted operating systems and tasks. TrustLite scales from providing a simple protected firmware runtime to advanced functionality such as attestation and trusted execution of userspace tasks. Our FPGA prototype shows that these capabilities are achievable even on low-cost embedded systems.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"29 1","pages":"10:1-10:14"},"PeriodicalIF":0.0,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76013270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The recent availability of Intel Haswell processors marks the transition of hardware transactional memory from research toys to mainstream reality. DBX is an in-memory database that uses Intel's restricted transactional memory (RTM) to achieve high performance and good scalability across multi-core machines. The main limitation (and also key to practicality) of RTM is its constrained working set size: an RTM region that reads or writes too much data will always be aborted. The design of DBX addresses this challenge in several ways. First, DBX builds a database transaction layer on top of an underlying shared-memory store. The two layers use separate RTM regions to synchronize shared memory access. Second, DBX uses optimistic concurrency control to separate transaction execution from its commit. Only the commit stage uses RTM for synchronization. As a result, the working set of the RTMs used scales with the meta-data of reads and writes in a database transaction as opposed to the amount of data read/written. Our evaluation using TPC-C workload mix shows that DBX achieves 506,817 transactions per second on a 4-core machine.
{"title":"Using restricted transactional memory to build a scalable in-memory database","authors":"Zhaoguo Wang, Hao Qian, Jinyang Li, Haibo Chen","doi":"10.1145/2592798.2592815","DOIUrl":"https://doi.org/10.1145/2592798.2592815","url":null,"abstract":"The recent availability of Intel Haswell processors marks the transition of hardware transactional memory from research toys to mainstream reality. DBX is an in-memory database that uses Intel's restricted transactional memory (RTM) to achieve high performance and good scalability across multi-core machines. The main limitation (and also key to practicality) of RTM is its constrained working set size: an RTM region that reads or writes too much data will always be aborted. The design of DBX addresses this challenge in several ways. First, DBX builds a database transaction layer on top of an underlying shared-memory store. The two layers use separate RTM regions to synchronize shared memory access. Second, DBX uses optimistic concurrency control to separate transaction execution from its commit. Only the commit stage uses RTM for synchronization. As a result, the working set of the RTMs used scales with the meta-data of reads and writes in a database transaction as opposed to the amount of data read/written. Our evaluation using TPC-C workload mix shows that DBX achieves 506,817 transactions per second on a 4-core machine.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"67 1","pages":"26:1-26:15"},"PeriodicalIF":0.0,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72943912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bhushan Jain, Chia-che Tsai, J. John, Donald E. Porter
Trusted, setuid-to-root binaries have been a substantial, long-lived source of privilege escalation vulnerabilities on Unix systems. Prior work on limiting privilege escalation has only considered privilege from the perspective of the administrator, neglecting the perspective of regular users---the primary reason for having setuid-to-root binaries. The paper presents a study of the current state of setuid-to-root binaries on Linux, focusing on the 28 most commonly deployed setuid binaries in the Debian and Ubuntu distributions. This study reveals several points where Linux kernel policies and abstractions are a poor fit for the policies desired by the administrator, and root privilege is used to create point solutions. The majority of these point solutions address 8 system calls that require administrator privilege, but also export functionality required by unprivileged users. This paper demonstrates how least privilege can be achieved on modern systems for non-administrator users. We identify the policies currently encoded in setuid-to-root binaries, and present a framework for expressing and enforcing these policy categories in the kernel. Our prototype, called Protego, deprivileges over 10,000 lines of code by changing only 715 lines of Linux kernel code. Protego also adds additional utilities to keep the kernel policy synchronized with legacy, policy-relevant configuration files, such as /etc/sudoers. Although some previously-privileged binaries may require changes, Protego provides users with the same functionality as Linux and introduces acceptable performance overheads. For instance, a Linux kernel compile incurs less than 2% overhead on Protego.
{"title":"Practical techniques to obviate setuid-to-root binaries","authors":"Bhushan Jain, Chia-che Tsai, J. John, Donald E. Porter","doi":"10.1145/2592798.2592811","DOIUrl":"https://doi.org/10.1145/2592798.2592811","url":null,"abstract":"Trusted, setuid-to-root binaries have been a substantial, long-lived source of privilege escalation vulnerabilities on Unix systems. Prior work on limiting privilege escalation has only considered privilege from the perspective of the administrator, neglecting the perspective of regular users---the primary reason for having setuid-to-root binaries.\u0000 The paper presents a study of the current state of setuid-to-root binaries on Linux, focusing on the 28 most commonly deployed setuid binaries in the Debian and Ubuntu distributions. This study reveals several points where Linux kernel policies and abstractions are a poor fit for the policies desired by the administrator, and root privilege is used to create point solutions. The majority of these point solutions address 8 system calls that require administrator privilege, but also export functionality required by unprivileged users.\u0000 This paper demonstrates how least privilege can be achieved on modern systems for non-administrator users. We identify the policies currently encoded in setuid-to-root binaries, and present a framework for expressing and enforcing these policy categories in the kernel. Our prototype, called Protego, deprivileges over 10,000 lines of code by changing only 715 lines of Linux kernel code. Protego also adds additional utilities to keep the kernel policy synchronized with legacy, policy-relevant configuration files, such as /etc/sudoers. Although some previously-privileged binaries may require changes, Protego provides users with the same functionality as Linux and introduces acceptable performance overheads. For instance, a Linux kernel compile incurs less than 2% overhead on Protego.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"24 1","pages":"8:1-8:14"},"PeriodicalIF":0.0,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74086498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mobile apps bring unprecedented levels of convenience, yet they are often buggy, and their bugs offset the convenience the apps bring. A key reason for buggy apps is that they must handle a vast variety of system and user actions such as being randomly killed by the OS to save resources, but app developers, facing tough competitions, lack time to thoroughly test these actions. AppDoctor is a system for efficiently and effectively testing apps against many system and user actions, and helping developers diagnose the resultant bug reports. It quickly screens for potential bugs using approximate execution, which runs much faster than real execution and exposes bugs but may cause false positives. From the reports, AppDoctor automatically verifies most bugs and prunes most false positives, greatly saving manual inspection effort. It uses action slicing to further speed up bug diagnosis. We implement AppDoctor in Android. It operates as a cloud of physical devices or emulators to scale up testing. Evaluation on 53 out of 100 most popular apps in Google Play and 11 of the most popular open-source apps shows that, AppDoctor effectively detects 72 bugs---including two bugs in the Android framework that affect all apps---with quick checking sessions, speeds up testing by 13.3 times, and vastly reduces diagnosis effort.
{"title":"Efficiently, effectively detecting mobile app bugs with AppDoctor","authors":"Gang Hu, Xinhao Yuan, Yang Tang, Junfeng Yang","doi":"10.1145/2592798.2592813","DOIUrl":"https://doi.org/10.1145/2592798.2592813","url":null,"abstract":"Mobile apps bring unprecedented levels of convenience, yet they are often buggy, and their bugs offset the convenience the apps bring. A key reason for buggy apps is that they must handle a vast variety of system and user actions such as being randomly killed by the OS to save resources, but app developers, facing tough competitions, lack time to thoroughly test these actions. AppDoctor is a system for efficiently and effectively testing apps against many system and user actions, and helping developers diagnose the resultant bug reports. It quickly screens for potential bugs using approximate execution, which runs much faster than real execution and exposes bugs but may cause false positives. From the reports, AppDoctor automatically verifies most bugs and prunes most false positives, greatly saving manual inspection effort. It uses action slicing to further speed up bug diagnosis. We implement AppDoctor in Android. It operates as a cloud of physical devices or emulators to scale up testing. Evaluation on 53 out of 100 most popular apps in Google Play and 11 of the most popular open-source apps shows that, AppDoctor effectively detects 72 bugs---including two bugs in the Android framework that affect all apps---with quick checking sessions, speeds up testing by 13.3 times, and vastly reduces diagnosis effort.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"59 1","pages":"18:1-18:15"},"PeriodicalIF":0.0,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88899107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chia-che Tsai, Kumar Saurabh Arora, N. Bandi, Bhushan Jain, William Jannen, J. John, Harry A. Kalodner, Vrushali Kulkarni, Daniela Oliveira, Donald E. Porter
Library OSes are a promising approach for applications to efficiently obtain the benefits of virtual machines, including security isolation, host platform compatibility, and migration. Library OSes refactor a traditional OS kernel into an application library, avoiding overheads incurred by duplicate functionality. When compared to running a single application on an OS kernel in a VM, recent library OSes reduce the memory footprint by an order-of-magnitude. Previous library OS (libOS) research has focused on single-process applications, yet many Unix applications, such as network servers and shell scripts, span multiple processes. Key design challenges for a multi-process libOS include management of shared state and minimal expansion of the security isolation boundary. This paper presents Graphene, a library OS that seamlessly and efficiently executes both single and multi-process applications, generally with low memory and performance overheads. Graphene broadens the libOS paradigm to support secure, multi-process APIs, such as copy-on-write fork, signals, and System V IPC. Multiple libOS instances coordinate over pipe-like byte streams to implement a consistent, distributed POSIX abstraction. These coordination streams provide a simple vantage point to enforce security isolation.
库操作系统是一种很有前途的方法,可以让应用程序有效地获得虚拟机的好处,包括安全隔离、主机平台兼容性和迁移。库操作系统将传统的操作系统内核重构为应用程序库,避免了重复功能带来的开销。与在虚拟机的操作系统内核上运行单个应用程序相比,最新的库操作系统将内存占用减少了一个数量级。以前的库操作系统(libOS)研究主要集中在单进程应用程序上,但是许多Unix应用程序,如网络服务器和shell脚本,都是跨多个进程的。多进程libo的主要设计挑战包括共享状态的管理和安全隔离边界的最小扩展。本文介绍了石墨烯,这是一个库操作系统,可以无缝高效地执行单进程和多进程应用程序,通常具有较低的内存和性能开销。石墨烯拓宽了libOS范例,以支持安全的多进程api,如写时复制(copy-on-write) fork、信号和System V IPC。多个libOS实例在类似管道的字节流上进行协调,以实现一致的分布式POSIX抽象。这些协调流提供了一个简单的有利位置来实施安全隔离。
{"title":"Cooperation and security isolation of library OSes for multi-process applications","authors":"Chia-che Tsai, Kumar Saurabh Arora, N. Bandi, Bhushan Jain, William Jannen, J. John, Harry A. Kalodner, Vrushali Kulkarni, Daniela Oliveira, Donald E. Porter","doi":"10.1145/2592798.2592812","DOIUrl":"https://doi.org/10.1145/2592798.2592812","url":null,"abstract":"Library OSes are a promising approach for applications to efficiently obtain the benefits of virtual machines, including security isolation, host platform compatibility, and migration. Library OSes refactor a traditional OS kernel into an application library, avoiding overheads incurred by duplicate functionality. When compared to running a single application on an OS kernel in a VM, recent library OSes reduce the memory footprint by an order-of-magnitude.\u0000 Previous library OS (libOS) research has focused on single-process applications, yet many Unix applications, such as network servers and shell scripts, span multiple processes. Key design challenges for a multi-process libOS include management of shared state and minimal expansion of the security isolation boundary.\u0000 This paper presents Graphene, a library OS that seamlessly and efficiently executes both single and multi-process applications, generally with low memory and performance overheads. Graphene broadens the libOS paradigm to support secure, multi-process APIs, such as copy-on-write fork, signals, and System V IPC. Multiple libOS instances coordinate over pipe-like byte streams to implement a consistent, distributed POSIX abstraction. These coordination streams provide a simple vantage point to enforce security isolation.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"4 1","pages":"9:1-9:14"},"PeriodicalIF":0.0,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88726662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Anand, M. Smithson, Khaled Elwazeer, A. Kotha, Jim Gruen, Nathan Giles, R. Barua
This paper presents component techniques essential for converting executables to a high-level intermediate representation (IR) of an existing compiler. The compiler IR is then employed for three distinct applications: binary rewriting using the compiler's binary back-end, vulnerability detection using source-level symbolic execution, and source-code recovery using the compiler's C backend. Our techniques enable complex high-level transformations not possible in existing binary systems, address a major challenge of input-derived memory addresses in symbolic execution and are the first to enable recovery of a fully functional source-code. We present techniques to segment the flat address space in an executable containing undifferentiated blocks of memory. We demonstrate the inadequacy of existing variable identification methods for their promotion to symbols and present our methods for symbol promotion. We also present methods to convert the physically addressed stack in an executable (with a stack pointer) to an abstract stack (without a stack pointer). Our methods do not use symbolic, relocation, or debug information since these are usually absent in deployed executables. We have integrated our techniques with a prototype x86 binary framework called SecondWrite that uses LLVM as IR. The robustness of the framework is demonstrated by handling executables totaling more than a million lines of source-code, produced by two different compilers (gcc and Microsoft Visual Studio compiler), three languages (C, C++, and Fortran), two operating systems (Windows and Linux) and a real world program (Apache server).
{"title":"A compiler-level intermediate representation based binary analysis and rewriting system","authors":"K. Anand, M. Smithson, Khaled Elwazeer, A. Kotha, Jim Gruen, Nathan Giles, R. Barua","doi":"10.1145/2465351.2465380","DOIUrl":"https://doi.org/10.1145/2465351.2465380","url":null,"abstract":"This paper presents component techniques essential for converting executables to a high-level intermediate representation (IR) of an existing compiler. The compiler IR is then employed for three distinct applications: binary rewriting using the compiler's binary back-end, vulnerability detection using source-level symbolic execution, and source-code recovery using the compiler's C backend. Our techniques enable complex high-level transformations not possible in existing binary systems, address a major challenge of input-derived memory addresses in symbolic execution and are the first to enable recovery of a fully functional source-code.\u0000 We present techniques to segment the flat address space in an executable containing undifferentiated blocks of memory. We demonstrate the inadequacy of existing variable identification methods for their promotion to symbols and present our methods for symbol promotion. We also present methods to convert the physically addressed stack in an executable (with a stack pointer) to an abstract stack (without a stack pointer). Our methods do not use symbolic, relocation, or debug information since these are usually absent in deployed executables.\u0000 We have integrated our techniques with a prototype x86 binary framework called SecondWrite that uses LLVM as IR. The robustness of the framework is demonstrated by handling executables totaling more than a million lines of source-code, produced by two different compilers (gcc and Microsoft Visual Studio compiler), three languages (C, C++, and Fortran), two operating systems (Windows and Linux) and a real world program (Apache server).","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"13 1","pages":"295-308"},"PeriodicalIF":0.0,"publicationDate":"2013-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91516867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a Geo-distributed key-value datastore, named ChainReaction, that offers causal+ consistency, with high performance, fault-tolerance, and scalability. ChainReaction enforces causal+ consistency which is stronger than eventual consistency by leveraging on a new variant of chain replication. We have experimentally evaluated the benefits of our approach by running the Yahoo! Cloud Serving Benchmark. Experimental results show that ChainReaction has better performance in read intensive workloads while offering competitive performance for other workloads. Also we show that our solution requires less metadata when compared with previous work.
{"title":"ChainReaction: a causal+ consistent datastore based on chain replication","authors":"Sérgio Almeida, J. Leitao, L. Rodrigues","doi":"10.1145/2465351.2465361","DOIUrl":"https://doi.org/10.1145/2465351.2465361","url":null,"abstract":"This paper proposes a Geo-distributed key-value datastore, named ChainReaction, that offers causal+ consistency, with high performance, fault-tolerance, and scalability. ChainReaction enforces causal+ consistency which is stronger than eventual consistency by leveraging on a new variant of chain replication. We have experimentally evaluated the benefits of our approach by running the Yahoo! Cloud Serving Benchmark. Experimental results show that ChainReaction has better performance in read intensive workloads while offering competitive performance for other workloads. Also we show that our solution requires less metadata when compared with previous work.","PeriodicalId":20737,"journal":{"name":"Proceedings of the Eleventh European Conference on Computer Systems","volume":"23 1","pages":"85-98"},"PeriodicalIF":0.0,"publicationDate":"2013-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83293605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}