
Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI): Latest Publications

Making paths explicit in the Scout operating system
D. Mosberger, L. Peterson
This paper makes a case for paths as an explicit abstraction in operating system design. Paths provide a unifying infrastructure for several OS mechanisms that have been introduced in the last several years, including fbufs, integrated layer processing, packet classifiers, code specialization, and migrating threads. This paper articulates the potential advantages of a path-based OS structure, describes the specific path architecture implemented in the Scout OS, and demonstrates the advantages in a particular application domain---receiving, decoding, and displaying MPEG-compressed video.
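To make the idea concrete, here is a minimal sketch of a path as an explicit, precomposed chain of processing stages, in the spirit of the paper; the class and stage names are hypothetical and do not reflect Scout's actual API:

```python
# Illustrative sketch (not Scout's real interface): a "path" is an explicit,
# precomposed chain of processing stages, so per-packet work follows one
# prebuilt route instead of layer-by-layer dispatch.

class Stage:
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn                    # fn(data) -> data

class Path:
    """An explicit end-to-end path: stages are bound once at path
    creation, then data flows through with no per-layer lookups."""
    def __init__(self, stages):
        self.stages = stages

    def deliver(self, data):
        for stage in self.stages:       # the precomputed route
            data = stage.fn(data)
        return data

# Hypothetical MPEG-display path: network -> decode -> display.
mpeg_path = Path([
    Stage("udp_recv",    lambda d: d["payload"]),
    Stage("mpeg_decode", lambda b: f"frame({b})"),
    Stage("display",     lambda f: f"shown:{f}"),
])

print(mpeg_path.deliver({"payload": "I-frame bytes"}))
```

Because the whole path is known when it is built, path-wide optimizations such as code specialization or buffer sharing can be applied to it as a unit.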
{"title":"Making paths explicit in the Scout operating system","authors":"D. Mosberger, L. Peterson","doi":"10.1145/238721.238771","DOIUrl":"https://doi.org/10.1145/238721.238771","url":null,"abstract":"This paper makes a case for paths as an explicit abstraction in operating system design. Paths provide a unifying infrastructure for several OS mechanisms that have been introduced in the last several years, including fbufs, integrated layer processing, packet classifiers, code specialization, and migrating threads. This paper articulates the potential advantages of a path-based OS structure, describes the specific path architecture implemented in the Scout OS, and demonstrates the advantages in a particular application domain---receiving, decoding, and displaying MPEG-compressed video.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"43 4 1","pages":"153-167"},"PeriodicalIF":0.0,"publicationDate":"1996-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91030377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 337
Studies of Windows NT performance using dynamic execution traces
Sharon E. Perl, R. L. Sites
We studied two aspects of the performance of Windows NT: processor bandwidth requirements for memory accesses in a uniprocessor system running benchmark and commercial applications, and locking behavior of a commercial database on a small-scale multiprocessor. Our studies are based on full dynamic execution traces of the systems, which include all instructions executed by the operating system and applications over periods of a few seconds (enough time to allow for significant computation). The traces were obtained on Alpha PCs, using a new software tool called PatchWrx that takes advantage of the Alpha architecture’s PAL-code layer to implement efficient, comprehensive system tracing. Because the Alpha version of Windows NT uses substantially the same code base as other versions, and therefore executes nearly the same sequence of calls, basic blocks, and data structure accesses, we believe our conclusions are relevant for non-Alpha systems as well. This paper describes our performance studies and interesting aspects of PatchWrx. We conclude from our studies that processor bandwidth can be a first-order bottleneck to achieving good performance. This is particularly apparent when studying commercial benchmarks. Operating system code and data structures contribute disproportionately to the memory access load. We also found that operating system software lock contention was a factor preventing the database benchmark from scaling up on the small multiprocessor, and that the cache coherence protocol employed by the machine introduced more cache interference than necessary.
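The paper's analyses are driven entirely by trace records. As a rough illustration of that style of analysis (the record format below is invented, not PatchWrx's), one can split memory traffic between OS and application code and turn it into a bandwidth figure:

```python
# Illustrative trace analysis with a hypothetical record format: estimate
# how much of the memory-access load comes from OS versus application code.

from collections import namedtuple

Record = namedtuple("Record", "kind bytes in_kernel")  # kind: 'inst' or 'data'

trace = [
    Record("inst", 4, True),    # Alpha instructions are 4 bytes
    Record("data", 8, True),
    Record("inst", 4, False),
    Record("data", 8, False),
    Record("data", 8, True),
]

def bandwidth_breakdown(trace, seconds):
    """Bytes per second of memory traffic, split OS vs. application."""
    os_bytes  = sum(r.bytes for r in trace if r.in_kernel)
    app_bytes = sum(r.bytes for r in trace if not r.in_kernel)
    return os_bytes / seconds, app_bytes / seconds

os_bw, app_bw = bandwidth_breakdown(trace, seconds=2.0)
print(f"OS: {os_bw} B/s, app: {app_bw} B/s")
```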
{"title":"Studies of Windows NT performance using dynamic execution traces","authors":"Sharon E. Perl, R. L. Sites","doi":"10.1145/238721.238773","DOIUrl":"https://doi.org/10.1145/238721.238773","url":null,"abstract":"We studied two aspects of the performance of Windows NT: processor bandwidth requirements for memory accesses in a uniprocessor system running benchmark and commercial applications, and locking behavior of a commercial database on a small-scale multiprocessor. Our studies are based on full dynamic execution traces of the systems, which include all instructions executed by the operating system and applications over periods of a few seconds (enough time to allow for significant computation). The traces were obtained on Alpha PCs, using a new software tool called PatchWrx that takes advantage of the Alpha architecture’s PAL-code layer to implement efficient, comprehensive system tracing. Because the Alpha version of Windows NT uses substantially the same code base as other versions, and therefore executes nearly the same sequence of calls, basic blocks, and data structure accesses, we believe our conclusions are relevant for non-Alpha systems as well. This paper describes our performance studies and interesting aspects of PatchWrx. We conclude from our studies that processor bandwidth can be a first-order bottleneck to achieving good performance. This is particularly apparent when studying commercial benchmarks. Operating system code and data structures contribute disproportionately to the memory access load. We also found that operating system software lock contention was a factor preventing the database benchmark from scaling up on the small multiprocessor, and that the cache coherence protocol employed by the machine introduced more cache interference than necessary.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"97 1","pages":"169-183"},"PeriodicalIF":0.0,"publicationDate":"1996-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86359931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 99
Performance evaluation of two home-based lazy release consistency protocols for shared virtual memory systems
Yuanyuan Zhou, L. Iftode, Kai Li
This paper investigates the performance of shared virtual memory protocols on large-scale multicomputers. Using experiments on a 64-node Paragon, we show that the traditional Lazy Release Consistency (LRC) protocol does not scale well, because of the large number of messages it requires, the large amount of memory it consumes for protocol overhead data, and because of the difficulty of garbage collecting that data. To achieve more scalable performance, we introduce and evaluate two new protocols. The first, Home-based LRC (HLRC), is based on the Automatic Update Release Consistency (AURC) protocol. Like AURC, HLRC maintains a home for each page to which all updates are propagated and from which all copies are derived. Unlike AURC, HLRC requires no specialized hardware support. We find that the use of homes provides substantial improvements in performance and scalability over LRC. Our second protocol, called Overlapped Home-based LRC (OHLRC), takes advantage of the communication processor found on each node of the Paragon to offload some of the protocol overhead of HLRC from the critical path followed by the compute processor. We find that OHLRC provides modest improvements over HLRC. We also apply overlapping to the base LRC protocol, with similar results. Our experiments were done using five of the Splash-2 benchmarks. We report overall execution times, as well as detailed breakdowns of elapsed time, message traffic, and memory use for each of the protocols.
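A minimal sketch of the home-based idea, with the protocol heavily simplified and all names hypothetical: writers diff against a twin at release time and push the diff to the page's home, and any later reader fetches the whole up-to-date page from the home in a single exchange:

```python
# Sketch of home-based lazy release consistency (protocol simplified):
# every page has a home node; writers send diffs there at release, and
# readers fetch the full page from the home instead of gathering diffs
# from all past writers.

class HomeNode:
    def __init__(self, page_size=8):
        self.page = bytearray(page_size)    # master copy lives at the home

    def apply_diff(self, diff):             # diff: {offset: byte}
        for off, val in diff.items():
            self.page[off] = val

    def fetch(self):
        return bytes(self.page)             # one message, whole page

class Writer:
    def __init__(self, home):
        self.home = home
        self.twin = home.fetch()            # pristine copy for diffing
        self.copy = bytearray(self.twin)

    def write(self, off, val):
        self.copy[off] = val

    def release(self):                      # lazy: diff only at release
        diff = {i: self.copy[i] for i in range(len(self.copy))
                if self.copy[i] != self.twin[i]}
        self.home.apply_diff(diff)

home = HomeNode()
w = Writer(home)
w.write(3, 7)
w.release()
assert home.fetch()[3] == 7                 # reader sees the update
```

Diffs can be discarded once applied at the home, which is what shrinks the protocol's memory footprint relative to plain LRC.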
{"title":"Performance evaluation of two home-based lazy release consistency protocols for shared virtual memory systems","authors":"Yuanyuan Zhou, L. Iftode, Kai Li","doi":"10.1145/238721.238763","DOIUrl":"https://doi.org/10.1145/238721.238763","url":null,"abstract":"This paper investigates the performance of shared virtual memory protocols on large-scale multicomputers. Using experiments on a 64-node Paragon, we show that the traditional Lazy Release Consistency (LRC) protocol does not scale well, because of the large number of messages it requires, the large amount of memory it consumes for protocol overhead data, and because of the diÆculty of garbage collecting that data. To achieve more scalable performance, we introduce and evaluate two new protocols. The rst, Home-based LRC (HLRC), is based on the Automatic Update Release Consistency (AURC) protocol. Like AURC, HLRC maintains a home for each page to which all updates are propagated and from which all copies are derived. Unlike AURC, HLRC requires no specialized hardware support. We nd that the use of homes provides substantial improvements in performance and scalability over LRC. Our second protocol, called Overlapped Home-based LRC (OHLRC), takes advantage of the communication processor found on each node of the Paragon to o oad some of the protocol overhead of HLRC from the critical path followed by the compute processor. We nd that OHLRC provides modest improvements over HLRC. We also apply overlapping to the base LRC protocol, with similar results. Our experiments were done using ve of the Splash-2 benchmarks. We report overall execution times, as well as detailed breakdowns of elapsed time, message traÆc, and memory use for each of the protocols.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"35 1","pages":"75-88"},"PeriodicalIF":0.0,"publicationDate":"1996-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88907164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 236
A trace-driven comparison of algorithms for parallel prefetching and caching
T. Kimbrel, A. Tomkins, R. H. Patterson, B. Bershad, P. Cao, E. Felten, Garth A. Gibson, Anna R. Karlin, Kai Li
High-performance I/O systems depend on prefetching and caching in order to deliver good performance to applications. These two techniques have generally been considered in isolation, even though there are significant interactions between them; a block prefetched too early reduces the effectiveness of the cache, while a block cached too long reduces the effectiveness of prefetching. In this paper we study the effects of several combined prefetching and caching strategies for systems with multiple disks. Using disk-accurate trace-driven simulation, we explore the performance characteristics of each of the algorithms in cases in which applications provide full advance knowledge of accesses using hints. Some of the strategies have been published with theoretical performance bounds, and some are components of systems that have been built. One is a new algorithm that combines the desirable characteristics of the others. We find that when performance is limited by I/O stalls, aggressive prefetching helps to alleviate the problem; that more conservative prefetching is appropriate when significant I/O stalls are not present; and that a single, simple strategy is capable of doing both.
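As a toy illustration of the prefetch-timing tradeoff the paper studies (this is an invented model, not the paper's disk-accurate simulator), consider a small FIFO cache driven by a hinted access list, where prefetching too little stalls and prefetching too far ahead evicts blocks that are still needed:

```python
# Toy model of the prefetch/cache interaction: count demand misses for a
# FIFO cache when each access also prefetches up to `lookahead` future
# blocks from the application's hint list.

def stalls(accesses, cache_size, lookahead):
    cache, stall_count = [], 0
    for i, blk in enumerate(accesses):
        if blk not in cache:
            stall_count += 1              # demand miss: wait for the disk
            cache.append(blk)
        for nxt in accesses[i + 1: i + 1 + lookahead]:
            if nxt not in cache:          # overlap this fetch with compute
                cache.append(nxt)
        while len(cache) > cache_size:    # FIFO eviction, for simplicity
            cache.pop(0)
    return stall_count

trace = [0, 1, 2, 0, 3, 4, 0, 5, 6, 0]
for la in (0, 2, 8):                      # none, conservative, aggressive
    print(f"lookahead={la}: {stalls(trace, cache_size=3, lookahead=la)} stalls")
```

Running it prints 8 stalls with no prefetching, 2 with the conservative lookahead, and 5 with the aggressive one, which overruns the three-block cache and evicts blocks before they are reused.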
{"title":"A trace-driven comparison of algorithms for parallel prefetching and caching","authors":"T. Kimbrel, A. Tomkins, R. H. Patterson, B. Bershad, P. Cao, E. Felten, Garth A. Gibson, Anna R. Karlin, Kai Li","doi":"10.1145/238721.238737","DOIUrl":"https://doi.org/10.1145/238721.238737","url":null,"abstract":"High-performance I/O systems depend on prefetching and caching in order to deliver good performance to applications. These two techniques have generally been considered in isolation, even though there are signi cant interactions between them; a block prefetched too early reduces the e ectiveness of the cache, while a block cached too long reduces the effectiveness of prefetching. In this paper we study the effects of several combined prefetching and caching strategies for systems with multiple disks. Using disk-accurate tracedriven simulation, we explore the performance characteristics of each of the algorithms in cases in which applications provide full advance knowledge of accesses using hints. Some of the strategies have been published with theoretical performance bounds, and some are components of systems that have been built. One is a new algorithm that combines the desirable characteristics of the others. We nd that when performance is limited by I/O stalls, aggressive prefetching helps to alleviate the problem; that more conservative prefetching is appropriate when signi cant I/O stalls are not present; and that a single, simple strategy is capable of doing both.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"11 1","pages":"19-34"},"PeriodicalIF":0.0,"publicationDate":"1996-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83509316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 144
Lightweight logging for lazy release consistent distributed shared memory
Manuel Costa, P. Guedes, M. Sequeira, N. Neves, M. Castro
This paper presents a new logging and recovery algorithm for lazy release consistent distributed shared memory (DSM). The new algorithm tolerates single node failures by maintaining a distributed log of data dependencies in the volatile memory of processes. The algorithm adds very little overhead to the memory consistency protocol: it sends no additional messages during failure-free periods; it adds only a minimal amount of data to one of the DSM protocol messages; it introduces no forced rollbacks of non-faulty processes; and it performs no communication-induced accesses to stable storage. Furthermore, the algorithm logs only a very small amount of data, because it uses the log of memory accesses already maintained by the memory consistency protocol. The algorithm was implemented in TreadMarks, a state-of-the-art DSM system. Experimental results show that the algorithm has near zero time overhead and very low space overhead during failure-free execution, thus refuting the common belief that logging overhead is necessarily high in recoverable DSM systems.
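A sketch of the piggybacking idea (message and record formats are invented for illustration): each node keeps its dependency log in volatile memory and rides a small record on a consistency message the protocol was sending anyway, so the failure-free path adds no messages and no stable-storage writes:

```python
# Sketch of dependency logging piggybacked on existing DSM messages:
# the log lives in volatile memory, and the only cost on the
# failure-free path is a tiny record attached to a message that the
# consistency protocol already sends.

class Node:
    def __init__(self, nid):
        self.nid = nid
        self.vclock = 0
        self.log = []                       # volatile dependency log

    def send_release(self, diff):
        self.vclock += 1
        record = (self.nid, self.vclock)    # tiny record rides along
        self.log.append(record)
        return {"diff": diff, "dep": record}   # one message, as before

    def recv_release(self, msg):
        self.log.append(msg["dep"])         # remember what we now depend on

a, b = Node("A"), Node("B")
b.recv_release(a.send_release(diff={3: 7}))
print(b.log)   # [('A', 1)], enough to replay dependencies on recovery
```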
{"title":"Lightweight logging for lazy release consistent distributed shared memory","authors":"Manuel Costa, P. Guedes, M. Sequeira, N. Neves, M. Castro","doi":"10.1145/238721.238762","DOIUrl":"https://doi.org/10.1145/238721.238762","url":null,"abstract":"This paper presents a new logging and recovery algorithm for lazy release consistent distributed shared memory (DSM). The new algorithm tolerates single node failures by maintaining a distributed log of data dependencies in the volatile memory of processes. The algorithm adds very little overhead to the memory consistency protocol: it sends no additional messages during failure-free periods; it adds only a minimal amount of data to one of the DSM protocol messages; it introduces no forced rollbacks of non-faulty processes; and it performs no communication-induced accesses to stable storage. Furthermore, the algorithm logs only a very small amount of data, because it uses the log of memory accesses already maintained by the memory consistency protocol. The algorithm was implemented in TreadMarks, a state-of-the-art DSM system. Experimental results show that the algorithm has near zero time overhead and very low space overhead during failure-free execution, thus refuting the common belief that logging overhead is necessarily high in recoverable DSM systems.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"65 1","pages":"59-73"},"PeriodicalIF":0.0,"publicationDate":"1996-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73789977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 58
Safe kernel extensions without run-time checking
G. Necula, Peter Lee
This paper describes a mechanism by which an operating system kernel can determine with certainty that it is safe to execute a binary supplied by an untrusted source. The kernel first defines a safety policy and makes it public. Then, using this policy, an application can provide binaries in a special form called proof-carrying code, or simply PCC. Each PCC binary contains, in addition to the native code, a formal proof that the code obeys the safety policy. The kernel can easily validate the proof without using cryptography and without consulting any external trusted entities. If the validation succeeds, the code is guaranteed to respect the safety policy without relying on run-time checks. The main practical difficulty of PCC is in generating the safety proofs. In order to gain some preliminary experience with this, we have written several network packet filters in hand-tuned DEC Alpha assembly language, and then generated PCC binaries for them using a special prototype assembler. The PCC binaries can be executed with no run-time overhead, beyond a one-time cost of 1 to 3 milliseconds for validating the enclosed proofs. The net result is that our packet filters are formally guaranteed to be safe and are faster than packet filters created using Berkeley Packet Filters, Software Fault Isolation, or safe languages such as Modula-3.
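A toy analogue of the workflow (this is not Necula and Lee's proof logic or binary format): untrusted code arrives with a certificate, the kernel validates it in one cheap pass against its published policy, and accepted code then runs with no per-access checks:

```python
# Toy analogue of the PCC workflow: validate once against a published
# safety policy, then execute with no run-time checks. The "certificate"
# here is a simple claimed-offset list, not a real logical proof.

PACKET_LEN = 64          # safety policy: loads must stay inside the packet

def validate(code, certificate):
    """One linear pass, no cryptography: the certificate must claim every
    offset the code loads, and every claimed offset must obey the policy."""
    claimed = set(certificate)
    for op, arg in code:
        if op == "load" and (arg not in claimed or not 0 <= arg < PACKET_LEN):
            return False
    return True

def execute(code, packet):
    acc = 0
    for op, arg in code:                  # no bounds checks here by design
        if op == "load":
            acc += packet[arg]
    return acc

filter_code = [("load", 12), ("load", 13)]   # e.g. inspect two header bytes
cert = [12, 13]
packet = bytes(range(PACKET_LEN))
if validate(filter_code, cert):
    print(execute(filter_code, packet))      # 25, run at full speed
```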
{"title":"Safe kernel extensions without run-time checking","authors":"G. Necula, Peter Lee","doi":"10.1145/238721.238781","DOIUrl":"https://doi.org/10.1145/238721.238781","url":null,"abstract":"Abstract : This paper describes a mechanism by which an operating system kernel can determine with certainty that it is safe to execute a binary supplied by an untrusted source. The kernel first defines a safety policy and makes it public. Then, using this policy, an application can provide binaries in a special form called proof-carrying code, or simply PCC. Each PCC binary contains, in addition to the native code, a formal proof that the code obeys the safety policy. The kernel can easily validate the proof without using cryptography and without consulting any external trusted entities. If the validation succeeds, the code is guaranteed to respect the safety policy without relying on run-time checks. The main practical difficulty of PCC is in generating the safety proofs. In order to gain some preliminary experience with this, we have written several network packet filters in hand-tuned DEC Alpha assembly language, and then generated PCC binaries for them using a special prototype assembler. The PCC binaries can be executed with no run-time over-head, beyond a one-time cost of 1 to 3 milliseconds for validating the enclosed proofs. The net result is that our packet filters are formally guaranteed to be safe and are faster than packet filters created using Berkeley Packet Filters, Software Fault Isolation, or safe languages such as Modula-3.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"50 1","pages":"229-243"},"PeriodicalIF":0.0,"publicationDate":"1996-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84546377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 609
A hierarchial CPU scheduler for multimedia operating systems
P. Goyal, Xingang Guo, H. Vin
The need for supporting a variety of hard and soft real-time as well as best-effort applications in a multimedia computing environment requires an operating system framework that: (1) enables different schedulers to be employed for different application classes, and (2) provides protection between the various classes of applications. We argue that these objectives can be achieved by hierarchical partitioning of CPU bandwidth, in which an operating system partitions the CPU bandwidth among various application classes, and each application class, in turn, partitions its allocation (potentially using a different scheduling algorithm) among its sub-classes or applications. We present the Start-time Fair Queuing (SFQ) algorithm, which enables such hierarchical partitioning. We have implemented a hierarchical scheduler in Solaris 2.4. We describe our implementation, and demonstrate its suitability for multimedia operating systems.
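Start-time Fair Queuing itself is compact enough to sketch. Each request is stamped with a start tag equal to the maximum of the current virtual time and its class's previous finish tag; finish tags advance by cost divided by class weight, virtual time tracks the start tag of the request in service, and dispatch order is by increasing start tag. A minimal single-level version follows (class names and numbers are illustrative):

```python
# Minimal Start-time Fair Queuing (SFQ): classes with larger weights see
# their tags advance more slowly, so they are dispatched proportionally
# more often per unit of virtual time.

import heapq

class SFQ:
    def __init__(self, weights):
        self.w = weights                   # class -> weight (CPU share)
        self.finish = {c: 0.0 for c in weights}
        self.vtime = 0.0
        self.queue = []                    # (start_tag, seq, class, cost)
        self.seq = 0                       # seq breaks ties FIFO

    def enqueue(self, cls, cost):
        start = max(self.vtime, self.finish[cls])
        self.finish[cls] = start + cost / self.w[cls]
        heapq.heappush(self.queue, (start, self.seq, cls, cost))
        self.seq += 1

    def dispatch(self):
        start, _, cls, cost = heapq.heappop(self.queue)
        self.vtime = start                 # v(t) = start tag of request in service
        return cls

sched = SFQ({"realtime": 3, "besteffort": 1})
for _ in range(4):
    sched.enqueue("realtime", 1)
    sched.enqueue("besteffort", 1)
print([sched.dispatch() for _ in range(8)])
```

With a 3:1 weight ratio, realtime start tags advance a third as fast, so realtime is dispatched three times per unit of virtual time for every best-effort dispatch while both are backlogged; the paper's hierarchical scheduler applies this partitioning recursively.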
{"title":"A hierarchial CPU scheduler for multimedia operating systems","authors":"P. Goyal, Xingang Guo, H. Vin","doi":"10.1145/238721.238766","DOIUrl":"https://doi.org/10.1145/238721.238766","url":null,"abstract":"The need for supporting variety of hard and soft real-time as well as best effort applications in a multimedia computing environment requires an operating system framework that: (1) enables different schedulers to be employed for different application classes, and (2) provides protection between the various classes of applications. We argue that these objectives can be achieved by hierarchical partitioning of CPU bandwidth, in which an operating system partitions the CPU bandwidth among various application classes, and each application class, in turn, partitions its allocation (potentially using a different scheduling algorithm) among its sub-classes or applications. We present Start-time Fair Queuing (SFQ) algorithm, which enables such hierarchical partitioning. We have implemented a hierarchical scheduler in Solaris 2.4. We describe our implementation, and demonstrate its suitability for multimedia operating systems.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"1 1","pages":"107-121"},"PeriodicalIF":0.0,"publicationDate":"1996-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87463472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 419
Efficient cooperative caching using hints
P. Sarkar, J. Hartman
We present a very low-overhead decentralized algorithm for cooperative caching that provides performance comparable to that of existing centralized algorithms. Unlike existing algorithms that rely on centralized control of cache functions, our algorithm uses hints (i.e. inexact information) to allow clients to perform these functions in a decentralized fashion. This paper shows that a hint-based system performs as well as a more tightly-coordinated system while requiring less overhead. Simulations show that the block access times of our system are as good as those of the existing tightly-coordinated algorithms, while reducing manager load by more than a factor of 15, block lookup traffic by nearly a factor of two-thirds, and replacement traffic by more than a factor of 5.
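A sketch of hint-based lookup (simplified; the class names are invented): each client remembers a possibly stale hint about which peer caches a block, forwards the request there first, and falls back to the server only when the hint misses, so no central manager sits on the common path:

```python
# Sketch of hint-based cooperative caching: hints are inexact, so a
# lookup tries the hinted peer first and falls back to the server,
# trading occasional extra hops for the absence of a central manager.

class Server:
    def __init__(self, disk):
        self.disk = disk

    def read(self, blk):
        return self.disk[blk]              # authoritative but slow path

class Client:
    def __init__(self, name, server):
        self.name, self.server = name, server
        self.cache = {}                    # blk -> data
        self.hints = {}                    # blk -> Client believed to hold it

    def lookup(self, blk):
        if blk in self.cache:
            return self.cache[blk], "local"
        peer = self.hints.get(blk)
        if peer is not None and blk in peer.cache:
            data, src = peer.cache[blk], f"peer:{peer.name}"   # hint was right
        else:
            data, src = self.server.read(blk), "server"        # stale or no hint
        self.cache[blk] = data
        return data, src

srv = Server({7: b"block7"})
a, b = Client("A", srv), Client("B", srv)
a.lookup(7)                  # A faults block 7 in from the server
b.hints[7] = a               # B learns, inexactly, that A caches it
print(b.lookup(7))           # (b'block7', 'peer:A'), no server involvement
```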
{"title":"Efficient cooperative caching using hints","authors":"P. Sarkar, J. Hartman","doi":"10.1145/238721.238741","DOIUrl":"https://doi.org/10.1145/238721.238741","url":null,"abstract":"We present a very low-overhead decentralized algorithm for cooperative caching that provides performance comparable to that of existing centralized algorithms. Unlike existing algorithms that rely on centralized control of cache functions, our algorithm uses hints (i.e. inexact information) to allow clients to perform these functions in a decentralized fashion. This paper shows that a hint-based system performs as well as a more tightlycoordinated system while requiring less overhead. Simulations show that the block access times of our system are as good as those of the existing tightly-coordinated algorithms, while reducing manager load by more than a factor of 15, block lookup traffic by nearly a factor of two-thirds, and replacement traffic by more than a factor of 5.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"29 1","pages":"35-46"},"PeriodicalIF":0.0,"publicationDate":"1996-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81996007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 134
An implementation of the Hamlyn sender-managed interface architecture
G. Buzzard, D. Jacobson, M. Mackey, Scott B. Marovich, J. Wilkes
Keywords: interconnects, clusters, sender-based, Hamlyn, Myrinet. As the latency and bandwidth of multicomputer interconnection fabrics improve, there is a growing need for an interface between them and host processors that does not hide these gains behind software overhead. The Hamlyn interface architecture does this. It uses sender-based memory management to eliminate receiver buffer overruns, provides applications with direct hardware access to minimize latency, supports adaptive routing networks to allow higher throughput, and offers full protection between applications so that it can be used in a general-purpose computing environment. To test these claims we built a prototype Hamlyn interface for a Myrinet network connected to a standard HP workstation and report here on its design and performance. Our interface delivers an application-to-application round trip time of 28 μs for short messages and a one-way time of 17.4 μs + 32.6 ns/byte (30.7 MB/s) for longer ones, while requiring fewer CPU cycles than an aggressive implementation of Active Messages on the CM-5.
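A sketch of the sender-managed idea (the interface shown is hypothetical and far simpler than Hamlyn's): the receiver grants the sender a region of its memory up front, and the sender, not the receiver, decides where each message lands, which rules out receive-buffer overruns by construction:

```python
# Sketch of sender-based memory management: the sender tracks the free
# space in a pre-granted receiver region and picks the landing offset
# itself, so the receiver can never be overrun.

class ReceiveRegion:
    def __init__(self, size):
        self.mem = bytearray(size)         # memory the receiver granted

class Sender:
    def __init__(self, region):
        self.region = region
        self.next_off = 0                  # sender manages free space itself

    def send(self, payload):
        if self.next_off + len(payload) > len(self.region.mem):
            raise RuntimeError("out of granted space; wait for more credit")
        off = self.next_off
        # models the NIC writing straight into pre-granted receiver memory
        self.region.mem[off:off + len(payload)] = payload
        self.next_off += len(payload)
        return off                         # receiver is told where to look

region = ReceiveRegion(32)
tx = Sender(region)
off = tx.send(b"hello")
print(region.mem[off:off + 5])             # b'hello', no receiver-side copy
```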
{"title":"An implementation of the Hamlyn sender-managed interface architecture","authors":"G. Buzzard, D. Jacobson, M. Mackey, Scott B. Marovich, J. Wilkes","doi":"10.1145/238721.238784","DOIUrl":"https://doi.org/10.1145/238721.238784","url":null,"abstract":"interconnects, clusters, sender-based, Hamlyn, Myrinet As the latency and bandwidth of multicomputer interconnection fabrics improve, there is a growing need for an interface between them and host processors that does not hide these gains behind software overhead. The Hamlyn interface architecture does this. It uses sender-based memory management to eliminate receiver buffer overruns, provides applications with direct hardware access to minimize latency, supports adaptive routing networks to allow higher throughput, and offers full protection between applications so that it can be used in a general-purpose computing environment. To test these claims we built a prototype Hamlyn interface for a Myrinet network connected to a standard HP workstation and report here on its design and performance. Our interface delivers an application-to-application round trip time of 28 ~s for short messages and a one way time of 17.4~s + 32.6nslbyte (30.7 MB/s) for longer ones, while requiring fewer CPU cycles than an aggressive implementation of Active Messages on the CM-5.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"2 1","pages":"245-259"},"PeriodicalIF":0.0,"publicationDate":"1996-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72579536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 123
Automatic compiler-inserted I/O prefetching for out-of-core applications
T. Mowry, Angela K. Demke, O. Krieger
Current operating systems offer poor performance when a numeric application’s working set does not fit in main memory. As a result, programmers who wish to solve “out-of-core” problems efficiently are typically faced with the onerous task of rewriting an application to use explicit I/O operations (e.g., read/write). In this paper, we propose and evaluate a fully-automatic technique which liberates the programmer from this task, provides high performance, and requires only minimal changes to current operating systems. In our scheme, the compiler provides the crucial information on future access patterns without burdening the programmer, the operating system supports non-binding prefetch and release hints for managing I/O, and the operating system cooperates with a run-time layer to accelerate performance by adapting to dynamic behavior and minimizing prefetch overhead. This approach maintains the abstraction of unlimited virtual memory for the programmer, gives the compiler the flexibility to aggressively move prefetches back ahead of references, and gives the operating system the flexibility to arbitrate between the competing resource demands of multiple applications. We have implemented our scheme using the SUIF compiler and the Hurricane operating system. Our experimental results demonstrate that our fully-automatic scheme effectively hides the I/O latency in out-ofcore versions of the entire NAS Parallel benchmark suite, thus resulting in speedups of roughly twofold for five of the eight applications, with one application speeding up by over threefold.
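The effect of the transformation can be sketched as follows (the hint calls are invented stand-ins for the OS's non-binding prefetch/release interface): the compiler rewrites a blocked loop so block i+K is requested while block i is being computed on, and finished blocks are released for eviction:

```python
# Sketch of what the compiler-transformed loop looks like: prefetch hints
# run K blocks ahead of the computation, and release hints tell the OS
# which blocks will not be reused. Hint functions are hypothetical.

PREFETCH_DISTANCE = 2      # K: how far ahead to issue hints

def prefetch(block_id):    # non-binding: the OS is free to ignore it
    print(f"  hint: prefetch block {block_id}")

def release(block_id):     # this block won't be reused; evict it first
    print(f"  hint: release block {block_id}")

def process(blocks):
    n = len(blocks)
    for k in range(min(PREFETCH_DISTANCE, n)):
        prefetch(k)                        # prologue: warm the pipeline
    total = 0
    for i in range(n):
        if i + PREFETCH_DISTANCE < n:
            prefetch(i + PREFETCH_DISTANCE)
        total += sum(blocks[i])            # compute overlaps outstanding I/O
        release(i)
    return total

print(process([[1, 2], [3, 4], [5, 6]]))
```

Because the hints are non-binding, the run-time layer can drop or delay them when memory is scarce, which is what lets the OS arbitrate among competing applications.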
{"title":"Automatic compiler-inserted I/O prefetching for out-of-core applications","authors":"T. Mowry, Angela K. Demke, O. Krieger","doi":"10.1145/238721.238734","DOIUrl":"https://doi.org/10.1145/238721.238734","url":null,"abstract":"Current operating systems offer poor performance when a numeric application’s working set does not fit in main memory. As a result, programmers who wish to solve “out-of-core” problems efficiently are typically faced with the onerous task of rewriting an application to use explicit I/O operations (e.g., read/write). In this paper, we propose and evaluate a fully-automatic technique which liberates the programmer from this task, provides high performance, and requires only minimal changes to current operating systems. In our scheme, the compiler provides the crucial information on future access patterns without burdening the programmer, the operating system supports non-binding prefetch and release hints for managing I/O, and the operating system cooperates with a run-time layer to accelerate performance by adapting to dynamic behavior and minimizing prefetch overhead. This approach maintains the abstraction of unlimited virtual memory for the programmer, gives the compiler the flexibility to aggressively move prefetches back ahead of references, and gives the operating system the flexibility to arbitrate between the competing resource demands of multiple applications. We have implemented our scheme using the SUIF compiler and the Hurricane operating system. Our experimental results demonstrate that our fully-automatic scheme effectively hides the I/O latency in out-ofcore versions of the entire NAS Parallel benchmark suite, thus resulting in speedups of roughly twofold for five of the eight applications, with one application speeding up by over threefold.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"38 1","pages":"3-17"},"PeriodicalIF":0.0,"publicationDate":"1996-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80951034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 225