Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles: Latest Articles, Page 5

Proceedings of the 23rd ACM Symposium on Operating Systems Principles 2011, SOSP 2011, Cascais, Portugal, October 23-26, 2011

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2011-01-01 DOI: 10.1145/2043556

Citations: 0

ODR: output-deterministic replay for multicore debugging

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629594

Gautam Altekar, I. Stoica

Reproducing bugs is hard. Deterministic replay systems address this problem by providing a high-fidelity replica of an original program run that can be repeatedly executed to zero in on bugs. Unfortunately, existing replay systems for multiprocessor programs fall short. These systems either incur high overheads, rely on non-standard multiprocessor hardware, or fail to reliably reproduce executions. Their primary stumbling block is data races, a source of nondeterminism that must be captured if executions are to be faithfully reproduced. In this paper, we present ODR, a software-only replay system that reproduces bugs and provides low-overhead multiprocessor recording. The key observation behind ODR is that, for debugging purposes, a replay system does not need to generate a high-fidelity replica of the original execution. Instead, it suffices to produce any execution that exhibits the same outputs as the original. Guided by this observation, ODR relaxes its fidelity guarantees to avoid the problem of reproducing data races altogether. The result is a system that replays real multiprocessor applications, such as Apache, MySQL, and the Java Virtual Machine, and provides low record-mode overhead.
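
The output-determinism idea can be illustrated with a toy sketch. This is not ODR's algorithm: the two-thread program, the names `run` and `find_output_equivalent_schedule`, and the brute-force search over interleavings are all assumptions made for illustration. Only the program's output is "recorded"; replay then looks for any schedule that yields the same output, without knowing how the original data race was resolved.

```python
from itertools import permutations

# Toy program (hypothetical): two threads each do a racy read-modify-write
# on a shared counter. Step "r" reads the counter into a per-thread register,
# step "w" writes register + 1 back.
THREAD_STEPS = {"A": ["r", "w"], "B": ["r", "w"]}


def run(schedule):
    """Execute the toy program under one thread schedule; return its output."""
    counter = 0
    regs = {}
    pc = {t: 0 for t in THREAD_STEPS}
    for t in schedule:
        step = THREAD_STEPS[t][pc[t]]
        pc[t] += 1
        if step == "r":
            regs[t] = counter
        else:
            counter = regs[t] + 1
    return counter


def all_schedules():
    """Enumerate every interleaving of the threads' steps."""
    base = [t for t, steps in THREAD_STEPS.items() for _ in steps]
    return sorted(set(permutations(base)))


def find_output_equivalent_schedule(recorded_output):
    """Return any schedule whose output matches the recorded one, else None."""
    for schedule in all_schedules():
        if run(schedule) == recorded_output:
            return schedule
    return None


# "Record" only the output of a buggy run in which one update is lost.
original = ("B", "A", "A", "B")
recorded = run(original)
replayed = find_output_equivalent_schedule(recorded)
print(recorded, replayed, replayed == original)
```

The replayed schedule need not equal the original one; it only has to exhibit the same output, which is the relaxation the abstract describes.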

Citations: 306

Fast byte-granularity software fault isolation

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629581

M. Castro, Manuel Costa, Jean-Philippe Martin, Marcus Peinado, P. Akritidis, Austin Donnelly, P. Barham, Richard Black

Bugs in kernel extensions remain one of the main causes of poor operating system reliability despite proposed techniques that isolate extensions in separate protection domains to contain faults. We believe that previous fault isolation techniques are not widely used because they cannot isolate existing kernel extensions with low overhead on standard hardware. This is a hard problem because these extensions communicate with the kernel using a complex interface and they communicate frequently. We present BGI (Byte-Granularity Isolation), a new software fault isolation technique that addresses this problem. BGI uses efficient byte-granularity memory protection to isolate kernel extensions in separate protection domains that share the same address space. BGI ensures type safety for kernel objects and it can detect common types of errors inside domains. Our results show that BGI is practical: it can isolate Windows drivers without requiring changes to the source code and it introduces a CPU overhead between 0 and 16%. BGI can also find bugs during driver testing. We found 28 new bugs in widely used Windows drivers.
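
A rough sketch of what byte-granularity protection between domains in one address space means. This is a toy model, not BGI's implementation: the `ByteAcl` and `Memory` classes, the `"write"` right, and the dictionary-based table are hypothetical names and choices made for illustration.

```python
class ByteAcl:
    """Toy access-control table with one entry per (domain, byte address)."""

    def __init__(self):
        self._rights = {}  # (domain, address) -> set of rights

    def grant(self, domain, start, length, right):
        for addr in range(start, start + length):
            self._rights.setdefault((domain, addr), set()).add(right)

    def revoke(self, domain, start, length, right):
        for addr in range(start, start + length):
            self._rights.get((domain, addr), set()).discard(right)

    def check(self, domain, start, length, right):
        """True only if the domain holds the right on every byte in the range."""
        return all(
            right in self._rights.get((domain, addr), ())
            for addr in range(start, start + length)
        )


class Memory:
    """A single shared address space whose writes are checked per byte."""

    def __init__(self, size, acl):
        self.data = bytearray(size)
        self.acl = acl

    def write(self, domain, addr, payload):
        if not self.acl.check(domain, addr, len(payload), "write"):
            raise PermissionError(f"{domain} may not write {len(payload)} bytes at {addr}")
        self.data[addr:addr + len(payload)] = payload


acl = ByteAcl()
mem = Memory(64, acl)
acl.grant("driver", 10, 4, "write")   # the driver owns exactly bytes 10..13
mem.write("driver", 10, b"\x01\x02\x03\x04")
```

Because rights are tracked per byte, a write that overruns the granted range by a single byte is rejected rather than silently corrupting a neighbouring object.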

Citations: 203

Detecting large-scale system problems by mining console logs

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629587

W. Xu, Ling Huang, A. Fox, D. Patterson, Michael I. Jordan

Surprisingly, console logs rarely help operators detect problems in large-scale datacenter services, for they often consist of the voluminous intermixing of messages from many software components written by independent developers. We propose a general methodology to mine this rich source of information to automatically detect system runtime problems. We first parse console logs by combining source code analysis with information retrieval to create composite features. We then analyze these features using machine learning to detect operational problems. We show that our method enables analyses that are impossible with previous methods because of its superior ability to create sophisticated features. We also show how to distill the results of our analysis to an operator-friendly one-page decision tree showing the critical messages associated with the detected problems. We validate our approach using the Darkstar online game server and the Hadoop File System, where we detect numerous real problems with high accuracy and few false positives. In the Hadoop case, we are able to analyze 24 million lines of console logs in 3 minutes. Our methodology works on textual console logs of any size and requires no changes to the service software, no human input, and no knowledge of the software's internals.
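
The parse-then-analyze pipeline can be sketched roughly as follows. Everything concrete here is an assumption for illustration: the log line formats, the `TEMPLATES` table (standing in for message templates recovered from source code), and the "differs from the most common vector" rule, which is only a placeholder for the machine-learning step the abstract mentions.

```python
import re
from collections import Counter, defaultdict

# Hypothetical message templates, as might be recovered from print statements
# in the source code. Each one captures an identifier to group messages by.
TEMPLATES = {
    "receive": re.compile(r"Receiving block (?P<id>blk_\d+)"),
    "stored": re.compile(r"Stored block (?P<id>blk_\d+)"),
    "delete": re.compile(r"Deleting block (?P<id>blk_\d+)"),
}
MESSAGE_TYPES = sorted(TEMPLATES)


def count_vectors(lines):
    """Build one message-count vector per identifier from raw log lines."""
    counts = defaultdict(Counter)
    for line in lines:
        for name, pattern in TEMPLATES.items():
            match = pattern.search(line)
            if match:
                counts[match.group("id")][name] += 1
                break
    return {
        ident: tuple(c[t] for t in MESSAGE_TYPES) for ident, c in counts.items()
    }


def flag_unusual(vectors):
    """Placeholder detector: flag identifiers whose vector is not the most common one."""
    most_common_vector, _ = Counter(vectors.values()).most_common(1)[0]
    return sorted(i for i, v in vectors.items() if v != most_common_vector)


log = [
    "INFO Receiving block blk_1",
    "INFO Receiving block blk_2",
    "INFO Stored block blk_1",
    "INFO Receiving block blk_3",
    "INFO Stored block blk_2",
    "INFO Deleting block blk_1",
    "INFO Deleting block blk_2",
]
vectors = count_vectors(log)
print(flag_unusual(vectors))
```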

Citations: 992

Better I/O through byte-addressable, persistent memory

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629589

Jeremy Condit, Edmund B. Nightingale, Christopher Frost, Engin Ipek, Benjamin C. Lee, D. Burger, Derrick Coetzee

Modern computer systems have been built around the assumption that persistent storage is accessed via a slow, block-based interface. However, new byte-addressable, persistent memory technologies such as phase change memory (PCM) offer fast, fine-grained access to persistent storage. In this paper, we present a file system and a hardware architecture that are designed around the properties of persistent, byte-addressable memory. Our file system, BPFS, uses a new technique called short-circuit shadow paging to provide atomic, fine-grained updates to persistent storage. As a result, BPFS provides strong reliability guarantees and offers better performance than traditional file systems, even when both are run on top of byte-addressable, persistent memory. Our hardware architecture enforces atomicity and ordering guarantees required by BPFS while still providing the performance benefits of the L1 and L2 caches. Since these memory technologies are not yet widely available, we evaluate BPFS on DRAM against NTFS on both a RAM disk and a traditional disk. Then, we use microarchitectural simulations to estimate the performance of BPFS on PCM. Despite providing strong safety and consistency guarantees, BPFS on DRAM is typically twice as fast as NTFS on a RAM disk and 4-10 times faster than NTFS on disk. We also show that BPFS on PCM should be significantly faster than a traditional disk-based file system.

Citations: 887

Modular data storage with Anvil

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629590

Mike Mammarella, Shant Hovsepian, E. Kohler

Databases have achieved orders-of-magnitude performance improvements by changing the layout of stored data -- for instance, by arranging data in columns or compressing it before storage. These improvements have been implemented in monolithic new engines, however, making it difficult to experiment with feature combinations or extensions. We present Anvil, a modular and extensible toolkit for building database back ends. Anvil's storage modules, called dTables, have much finer granularity than prior work. For example, some dTables specialize in writing data, while others provide optimized read-only formats. This specialization makes both kinds of dTable simple to write and understand. Unifying dTables implement more comprehensive functionality by layering over other dTables -- for instance, building a read/write store from read-only tables and a writable journal, or building a general-purpose store from optimized special-purpose stores. The dTable design leads to a flexible system powerful enough to implement many database storage layouts. Our prototype implementation of Anvil performs up to 5.5 times faster than an existing B-tree-based database back end on conventional workloads, and can easily be customized for further gains on specific data and workloads.
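
The layering described above, a read/write store built from a read-only table plus a writable journal, might look roughly like this. It is a sketch of the general pattern, not Anvil's dTable interface: the class names `ReadOnlyTable`, `JournalTable`, `OverlayTable` and the `digest` method are hypothetical.

```python
_TOMBSTONE = object()  # marks a key as deleted in the journal


class ReadOnlyTable:
    """Immutable key/value table, standing in for an optimized read-only format."""

    def __init__(self, items):
        self._items = dict(items)

    def get(self, key):
        return self._items.get(key)

    def items(self):
        return self._items.items()


class JournalTable:
    """Append-only log of writes, specialized for cheap updates."""

    def __init__(self):
        self._log = []

    def put(self, key, value):
        self._log.append((key, value))

    def remove(self, key):
        self._log.append((key, _TOMBSTONE))

    def lookup(self, key):
        """Return (found, value) for the newest entry for key, if any."""
        for k, v in reversed(self._log):
            if k == key:
                return True, v
        return False, None

    def entries(self):
        return list(self._log)


class OverlayTable:
    """Unifying table: reads consult the journal first, then the read-only base."""

    def __init__(self, base, journal):
        self.base = base
        self.journal = journal

    def get(self, key):
        found, value = self.journal.lookup(key)
        if found:
            return None if value is _TOMBSTONE else value
        return self.base.get(key)

    def put(self, key, value):
        self.journal.put(key, value)

    def remove(self, key):
        self.journal.remove(key)

    def digest(self):
        """Fold the journal into a fresh read-only table and start a new journal."""
        merged = dict(self.base.items())
        for key, value in self.journal.entries():
            if value is _TOMBSTONE:
                merged.pop(key, None)
            else:
                merged[key] = value
        self.base = ReadOnlyTable(merged)
        self.journal = JournalTable()
```

Each piece stays simple because it does one job; the overlay supplies the combined read/write behaviour without either underlying table knowing about the other.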

Citations: 35

Improving application security with data flow assertions

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629604

A. Yip, Xi Wang, N. Zeldovich, M. Kaashoek

Resin is a new language runtime that helps prevent security vulnerabilities, by allowing programmers to specify application-level data flow assertions. Resin provides policy objects, which programmers use to specify assertion code and metadata; data tracking, which allows programmers to associate assertions with application data, and to keep track of assertions as the data flow through the application; and filter objects, which programmers use to define data flow boundaries at which assertions are checked. Resin's runtime checks data flow assertions by propagating policy objects along with data, as that data moves through the application, and then invoking filter objects when data crosses a data flow boundary, such as when writing data to the network or a file. Using Resin, Web application programmers can prevent a range of problems, from SQL injection and cross-site scripting, to inadvertent password disclosure and missing access control checks. Adding a Resin assertion to an application requires few changes to the existing application code, and an assertion can reuse existing code and data structures. For instance, 23 lines of code detect and prevent three previously-unknown missing access control vulnerabilities in phpBB, a popular Web forum application. Other assertions comprising tens of lines of code prevent a range of vulnerabilities in Python and PHP applications. A prototype of Resin incurs a 33% CPU overhead running the HotCRP conference management application.
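
The three mechanisms named above (policy objects, data tracking, filter objects) can be mimicked in a few lines of ordinary Python. This is only an illustrative sketch and not Resin's API: `PasswordPolicy`, `Tracked`, `Channel`, and the `export_check` method name are assumptions chosen for the example, and real data tracking would have to cover far more operations than string concatenation.

```python
class PasswordPolicy:
    """Policy object: a password may only leave by email to its owner."""

    def __init__(self, owner_email):
        self.owner_email = owner_email

    def export_check(self, context):
        if context.get("type") == "email" and context.get("to") == self.owner_email:
            return
        raise PermissionError("password may not flow to this destination")


class Tracked(str):
    """Data tracking: a string that carries its policy objects through concatenation."""

    def __new__(cls, value, policies=()):
        obj = super().__new__(cls, value)
        obj.policies = tuple(policies)
        return obj

    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        merged = self.policies + getattr(other, "policies", ())
        return Tracked(str.__add__(self, other), merged)

    def __radd__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return Tracked(str(other) + str(self), self.policies)


class Channel:
    """Filter object: a data flow boundary that checks policies before output."""

    def __init__(self, context):
        self.context = context
        self.sent = []

    def write(self, data):
        for policy in getattr(data, "policies", ()):
            policy.export_check(self.context)
        self.sent.append(str(data))


policy = PasswordPolicy("alice@example.com")
password = Tracked("hunter2", [policy])
message = "Your password is " + password  # the policy follows the data
```

Writing `message` to a `Channel` whose context is an email to the owner succeeds, while writing it to any other channel raises `PermissionError`, even though the application code that built the message never mentioned the policy.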

Citations: 213

Automatically patching errors in deployed software

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629585

J. Perkins, Sunghun Kim, S. Larsen, Saman P. Amarasinghe, J. Bachrach, Michael Carbin, Carlos Pacheco, F. Sherwood, Stelios Sidiroglou, Greg Sullivan, W. Wong, Yoav Zibin, Michael D. Ernst, M. Rinard

We present ClearView, a system for automatically patching errors in deployed software. ClearView works on stripped Windows x86 binaries without any need for source code, debugging information, or other external information, and without human intervention. ClearView (1) observes normal executions to learn invariants that characterize the application's normal behavior, (2) uses error detectors to distinguish normal executions from erroneous executions, (3) identifies violations of learned invariants that occur during erroneous executions, (4) generates candidate repair patches that enforce selected invariants by changing the state or flow of control to make the invariant true, and (5) observes the continued execution of patched applications to select the most successful patch. ClearView is designed to correct errors in software with high availability requirements. Aspects of ClearView that make it particularly appropriate for this context include its ability to generate patches without human intervention, apply and remove patches to and from running applications without requiring restarts or otherwise perturbing the execution, and identify and discard ineffective or damaging patches by evaluating the continued behavior of patched applications. ClearView was evaluated in a Red Team exercise designed to test its ability to successfully survive attacks that exploit security vulnerabilities. A hostile external Red Team developed ten code injection exploits and used these exploits to repeatedly attack an application protected by ClearView. ClearView detected and blocked all of the attacks. For seven of the ten exploits, ClearView automatically generated patches that corrected the error, enabling the application to survive the attacks and continue on to successfully process subsequent inputs. Finally, the Red Team attempted to make ClearView apply an undesirable patch, but ClearView's patch evaluation mechanism enabled ClearView to identify and discard both ineffective patches and damaging patches.
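
Steps (1) through (4) can be illustrated with a deliberately tiny example. This is a sketch of the learn-then-enforce idea at source level, not ClearView itself (which works on stripped x86 binaries): the range invariant, the clamping repair, the `lookup` function, and all names below are assumptions for illustration.

```python
BUFFER = list(range(8))  # toy application state (hypothetical)


def learn_range_invariant(observations):
    """Step 1: learn a simple invariant (lo <= value <= hi) from normal runs."""
    return min(observations), max(observations)


def violates(invariant, value):
    """Step 3: check whether a value seen in a failing run breaks the invariant."""
    lo, hi = invariant
    return not (lo <= value <= hi)


def make_enforcing_patch(invariant):
    """Step 4: build a candidate patch that changes state to make the invariant true."""
    lo, hi = invariant

    def patch(value):
        return min(max(value, lo), hi)

    return patch


def lookup(index, patch=None):
    """Toy application code; the IndexError plays the role of the error detector (step 2)."""
    if patch is not None:
        index = patch(index)
    return BUFFER[index]


invariant = learn_range_invariant([0, 3, 7, 2])   # indices seen in normal executions
patch = make_enforcing_patch(invariant)
print(violates(invariant, 99), lookup(99, patch))
```

Step (5), choosing among candidate patches by watching how the patched application behaves afterwards, is omitted from this sketch.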

Citations: 427

seL4: formal verification of an OS kernel

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629596

G. Klein, Kevin Elphinstone, G. Heiser, June Andronick, David A. Cock, Philip Derrin, D. Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, Simon Winwood

Complete formal verification is the only known way to guarantee that a system is free of programming errors. We present our experience in performing the formal, machine-checked verification of the seL4 microkernel from an abstract specification down to its C implementation. We assume correctness of compiler, assembly code, and hardware, and we used a unique design approach that fuses formal and operating systems techniques. To our knowledge, this is the first formal proof of functional correctness of a complete, general-purpose operating-system kernel. Functional correctness means here that the implementation always strictly follows our high-level abstract specification of kernel behaviour. This encompasses traditional design and implementation safety properties such as the kernel will never crash, and it will never perform an unsafe operation. It also proves much more: we can predict precisely how the kernel will behave in every possible situation. seL4, a third-generation microkernel of L4 provenance, comprises 8,700 lines of C code and 600 lines of assembler. Its performance is comparable to other high-performance L4 kernels.

Citations: 1721

Debugging in the (very) large: ten years of implementation and experience

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629586

Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg, Gabriel Aul, Vince Orgovan, Greg Nichols, David Grant, Gretchen Loihle, G. Hunt

Windows Error Reporting (WER) is a distributed system that automates the processing of error reports coming from an installed base of a billion machines. WER has collected billions of error reports in ten years of operation. It collects error data automatically and classifies errors into buckets, which are used to prioritize developer effort and report fixes to users. WER uses a progressive approach to data collection, which minimizes overhead for most reports yet allows developers to collect detailed information when needed. WER takes advantage of its scale to use error statistics as a tool in debugging; this allows developers to isolate bugs that could not be found at smaller scale. WER has been designed for large scale: one pair of database servers can record all the errors that occur on all Windows computers worldwide.
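
The bucketing and progressive-collection ideas in this abstract can be illustrated with a minimal sketch. It is not WER's actual algorithm: the report fields, the `bucket_key` heuristic (same application, version, module, and offset), and the `collection_level` policy are all assumptions chosen for illustration, and the abstract does not specify how WER forms buckets or decides what data to request.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorReport:
    # Hypothetical minimal report fields; the abstract does not list WER's real schema.
    app: str
    version: str
    module: str
    offset: int


def bucket_key(report):
    # Assumed heuristic: reports that fail at the same code location share a bucket.
    return (report.app, report.version, report.module, report.offset)


def rank_buckets(reports):
    # Count reports per bucket and order buckets from most to least frequent,
    # so the most common failures can be prioritized first.
    counts = Counter(bucket_key(r) for r in reports)
    return counts.most_common()


def collection_level(key, flagged_buckets):
    # Progressive collection (assumed policy): request a detailed report only
    # for buckets a developer has flagged; all other buckets send minimal data.
    return "detailed" if key in flagged_buckets else "minimal"


if __name__ == "__main__":
    reports = [
        ErrorReport("app.exe", "1.0", "a.dll", 16),
        ErrorReport("app.exe", "1.0", "a.dll", 16),
        ErrorReport("app.exe", "1.0", "b.dll", 32),
    ]
    ranked = rank_buckets(reports)
    flagged = {ranked[0][0]}  # a developer asks for more detail on the top bucket
    for key, count in ranked:
        print(key, count, collection_level(key, flagged))
```

Grouping by a cheap key keeps per-report work small, which is one plausible way to read the abstract's claim that overhead stays low for most reports while detailed data remains available on request.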

Citations: 201