
Workshop on Memory System Performance and Correctness — Latest Publications

Deconstructing process isolation
Pub Date : 2006-10-22 DOI: 10.1145/1178597.1178599
Mark Aiken, Manuel Fähndrich, C. Hawblitzel, G. Hunt, J. Larus
Most operating systems enforce process isolation through hardware protection mechanisms such as memory segmentation, page mapping, and differentiated user and kernel instructions. Singularity is a new operating system that uses software mechanisms to enforce process isolation. A software isolated process (SIP) is a process whose boundaries are established by language safety rules and enforced by static type checking. SIPs provide a low-cost isolation mechanism that provides failure isolation and fast inter-process communication. To compare the performance of Singularity's SIPs against traditional isolation techniques, we implemented an optional hardware isolation mechanism. Protection domains are hardware-enforced address spaces, which can contain one or more SIPs. Domains can either run at the kernel's privilege level or be fully isolated from the kernel and run at the normal application privilege level. With protection domains, we can construct Singularity configurations that are similar to micro-kernel and monolithic kernel systems. We found that hardware-based isolation incurs non-trivial performance costs (up to 25--33%) and complicates system implementation. Software isolation has less than 5% overhead on these benchmarks. The lower run-time cost of SIPs makes their use feasible at a finer granularity than conventional processes. However, hardware isolation remains valuable as a defense-in-depth against potential failures in software isolation mechanisms. Singularity's ability to employ hardware isolation selectively enables careful balancing of the costs and benefits of each isolation technique.
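The SIP idea — isolation guaranteed by language rules rather than page tables — can be illustrated with a toy sketch. The classes and names below are ours, not Singularity's API; they model the invariant that each process's heap is reachable only from itself, and that inter-process messages are transferred by ownership rather than shared:

```python
# Toy sketch (not Singularity's actual API): "software isolated processes"
# that communicate only by transferring ownership of a message over a
# channel, so no object is ever reachable from two processes at once.
class SIP:
    """A software-isolated process: its heap is reachable only from itself."""
    def __init__(self, name):
        self.name = name
        self.heap = []

    def allocate(self, value):
        obj = {"value": value}
        self.heap.append(obj)
        return obj

class Channel:
    def __init__(self):
        self._queue = []

    def send(self, sender, msg):
        # Ownership transfer: the sender gives up its reference to the message.
        sender.heap.remove(msg)
        self._queue.append(msg)

    def receive(self, receiver):
        msg = self._queue.pop(0)
        receiver.heap.append(msg)   # the receiver now owns the message
        return msg

a, b = SIP("a"), SIP("b")
ch = Channel()
msg = a.allocate(42)
ch.send(a, msg)
out = ch.receive(b)
assert msg not in a.heap and out in b.heap   # the isolation invariant holds
```

In a real SIP this invariant is enforced statically by the type checker, so no runtime bookkeeping (and no hardware protection) is needed on the fast path.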
Citations: 79
Efficient pattern mining on shared memory systems: implications for chip multiprocessor architectures
Pub Date : 2006-10-22 DOI: 10.1145/1178597.1178603
G. Buehrer, Yen-kuang Chen, S. Parthasarathy, A. Nguyen, A. Ghoting, Daehyun Kim
Frequent pattern mining is a fundamental data mining process which has practical applications ranging from market basket data analysis to web link analysis. In this work, we show that state-of-the-art frequent pattern mining applications are inefficient when executing on a shared memory multiprocessor system, due primarily to poor utilization of the memory hierarchy. To improve the efficiency of these applications, we explore memory performance improvements, task partitioning strategies, and task queuing models designed to maximize the scalability of pattern mining on SMP systems. Empirically, we show that the proposed strategies afford significantly improved performance. We also discuss implications of this work in light of recent trends in micro-architecture design, particularly chip multiprocessors (CMPs).
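The task-partitioning idea the abstract refers to can be sketched concretely. This is an illustrative simplification, not the paper's system: transactions are split into partitions (a stand-in for per-thread task queues), pairwise itemset support is counted locally in each partition, and the partial counts are merged:

```python
# Illustrative sketch of frequent-pair support counting with partitioned
# transactions -- the kind of task decomposition whose memory behavior
# the paper studies on SMP and CMP machines.
from collections import Counter
from itertools import combinations

def count_pairs(transactions):
    """Support counts for all item pairs within one partition."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    return counts

def frequent_pairs(transactions, min_support, n_partitions=2):
    # Partition the transactions, count locally, then merge partial counts;
    # good partitioning keeps each worker's counts hot in its own cache.
    step = (len(transactions) + n_partitions - 1) // n_partitions
    total = Counter()
    for i in range(0, len(transactions), step):
        total.update(count_pairs(transactions[i:i + step]))
    return {p: c for p, c in total.items() if c >= min_support}

txns = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
print(frequent_pairs(txns, min_support=3))
```

The merge step is associative, so partitions can be processed by independent threads; the paper's contribution lies in choosing partitions and queueing tasks so that the memory hierarchy is used well.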
Citations: 2
Smarter garbage collection with simplifiers
Pub Date : 2006-10-22 DOI: 10.1145/1178597.1178601
Melissa E. O'Neill
We introduce a method for providing lightweight daemons, called simplifiers, that attach themselves to program data. If a data item has a simplifier, the simplifier may be run automatically from time to time, seeking an opportunity to "simplify" the object in some way that improves the program's time or space performance. It is not uncommon for programs to improve their data structures as they traverse them, but these improvements must wait until such a traversal occurs. Simplifiers provide an alternative mechanism for making improvements that is not tied to the vagaries of normal control flow. Tracing garbage collectors can both support the simplifier abstraction and benefit from it. Because tracing collectors traverse program data structures, they can trigger simplifiers as part of the tracing process. (In fact, it is possible to view simplifiers as analogous to finalizers; whereas an object can have a finalizer that is run automatically when the object is found to be dead, a simplifier can be run when the object is found to be live.) Simplifiers can aid efficient collection by simplifying objects before they are traced, thereby eliminating some data that would otherwise have been traced and saved by the collector. We present performance data to show that appropriately chosen simplifiers can lead to tangible space and speed benefits in practice. Different variations of simplifiers are possible, depending on the triggering mechanism and the synchronization policy. Some kinds of simplifier are already in use in mainstream systems in the form of ad-hoc garbage-collector extensions. For one kind of simplifier we include a complete and portable Java implementation that is less than thirty lines long.
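A toy illustration of the collector-triggered variant (our own sketch, not O'Neill's implementation): a mark phase that runs an object's simplifier the first time it finds the object live. Here the simplifier collapses a chain of forwarding cells, so the intermediate cells are never traced or retained:

```python
# Toy mark phase that invokes a simplifier on each object when it is first
# discovered live. The simplifier path-compresses forwarding chains, so
# the collector neither traces nor retains the intermediate cells.
class Cell:
    def __init__(self, value=None, next=None):
        self.value = value
        self.next = next        # forwarding pointer, if any
        self.marked = False

def simplify(cell):
    # Skip over intermediate forwarders, then absorb the final value.
    while cell.next is not None and cell.next.next is not None:
        cell.next = cell.next.next
    if cell.next is not None:
        cell.value, cell.next = cell.next.value, None

def mark(roots):
    live = []
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj.marked:
            continue
        obj.marked = True
        simplify(obj)           # simplifier runs on first visit, while live
        if obj.next is not None:
            stack.append(obj.next)
        live.append(obj)
    return live

end = Cell(value=7)
chain = Cell(next=Cell(next=Cell(next=end)))
live = mark([chain])
assert chain.value == 7 and chain.next is None
assert len(live) == 1   # the forwarding cells were never traced
```

This mirrors the abstract's point: simplifying before tracing shrinks the set of objects the collector must visit and copy.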
Citations: 8
Seven at one stroke: results from a cache-oblivious paradigm for scalable matrix algorithms
Pub Date : 2006-10-22 DOI: 10.1145/1178597.1178604
Michael D. Adams, David S. Wise
A blossoming paradigm for block-recursive matrix algorithms is presented that, at once, attains excellent performance measured by time, TLB misses, L1 misses, L2 misses, paging to disk, scaling on distributed processors, and portability to multiple platforms. It provides a philosophy and tools that allow the programmer to deal with the memory hierarchy invisibly, from L1 and L2 to the TLB, paging, and interprocessor communication. Used together, they provide a cache-oblivious style of programming. Plots are presented to support these claims on an implementation of Cholesky factorization crafted directly from the paradigm in C with a few intrinsic calls. The results in this paper focus on low-level performance, including the new Morton-hybrid representation to take advantage of hardware and compiler optimizations. In particular, this code beats Intel's Math Kernel Library and matches AMD's Core Math Library, losing a bit on L1 misses while winning decisively on TLB misses.
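The layout underlying the Morton-hybrid representation can be sketched in a few lines. This is a hedged illustration of plain Morton (Z-order) indexing, the bit-interleaved layout that block-recursive algorithms build on so that nearby matrix blocks stay nearby in memory at every level of the hierarchy:

```python
# Morton (Z-order) index: interleave the bits of (row, col) so that the
# elements of any aligned 2^k x 2^k block occupy one contiguous index range.
def morton_index(row, col, bits=16):
    """Interleave row bits (odd positions) with col bits (even positions)."""
    z = 0
    for i in range(bits):
        z |= ((row >> i) & 1) << (2 * i + 1)
        z |= ((col >> i) & 1) << (2 * i)
    return z

# The four elements of the top-left 2x2 block are contiguous (0, 1, 2, 3):
assert [morton_index(r, c) for r in (0, 1) for c in (0, 1)] == [0, 1, 2, 3]
```

Because every block at every recursion level is contiguous, a recursive Cholesky written over this layout is automatically friendly to L1, L2, the TLB, and paging alike — the "cache-oblivious" property the abstract claims.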
Citations: 20
A flexible data to L2 cache mapping approach for future multicore processors
Pub Date : 2006-10-22 DOI: 10.1145/1178597.1178613
Lei Jin, Hyunjin Lee, Sangyeun Cho
This paper proposes and studies a distributed L2 cache management approach that maps page-level data to cache slices in a future processor chip comprising many cores. L2 cache management is a crucial multicore processor design aspect to overcome non-uniform cache access latency for high program performance and to reduce on-chip network traffic and related power consumption. Unlike previously studied "pure" hardware-based private and shared cache designs, the proposed OS-microarchitecture approach allows mimicking a wide spectrum of L2 caching policies without complex hardware support. Moreover, processors and cache slices can be isolated from each other without hardware modifications, resulting in improved chip reliability characteristics. We discuss the key design issues and implementation strategies of the proposed approach, and present experimental results showing its promise.
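The core mechanism — the OS picking a cache slice per page — can be sketched as a single mapping function. The names and policies below are our illustration, not the paper's interface; the point is that one software-controlled function can mimic a shared cache (stripe pages across all slices) or a private cache (keep a core's pages in its local slice):

```python
# Illustrative sketch: OS-controlled page-to-slice mapping. Changing the
# policy argument mimics different L2 organizations with no hardware change.
PAGE_SHIFT = 12          # 4 KiB pages

def slice_for_page(vaddr, home_core, num_slices, policy="shared"):
    page = vaddr >> PAGE_SHIFT
    if policy == "shared":
        return page % num_slices    # stripe pages across all slices
    elif policy == "private":
        return home_core            # keep data in the requesting core's slice
    raise ValueError(policy)

# Every address within one page lands in the same slice under either policy:
addrs = [0x1000, 0x1800, 0x1FFF]
assert len({slice_for_page(a, home_core=2, num_slices=8) for a in addrs}) == 1
```

Because the granularity is a page, the decision rides along with ordinary virtual-memory management, which is what lets the OS blend shared and private behavior per page rather than committing the whole chip to one design.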
Citations: 28