
Workshop on Memory System Performance and Correctness: Latest Publications

There is nothing wrong with out-of-thin-air: compiler optimization and memory models
Pub Date : 2011-06-05 DOI: 10.1145/1988915.1988917
Clark Verbrugge, Allan Kielstra, Yi Zhang
Memory models are used in concurrent systems to specify visibility properties of shared data. A practical memory model, however, must permit code optimization as well as provide a useful semantics for programmers. Here we extend recent observations that the current Java memory model imposes significant restrictions on the ability to optimize code. Beyond the known and potentially correctable proof concerns illustrated by others we show that major constraints on code generation and optimization can in fact be derived from fundamental properties and guarantees provided by the memory model. To address this and accommodate a better balance between programmability and optimization we present ideas for a simple concurrency semantics for Java that avoids basic problems at a cost of backward compatibility.
Citations: 2
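The out-of-thin-air problem named in the title is usually framed with a classic litmus test. The sketch below (the names `x`, `y`, `r1`, `r2` are the conventional ones, not taken from the paper) enumerates every sequentially consistent interleaving to confirm that only the all-zero outcome is reachable, which is why a model that could "invent" `r1 == r2 == 42` would be unacceptable:

```java
// Classic "out-of-thin-air" litmus test. With x == y == 0 initially, the
// memory model must forbid r1 == r2 == 42, even though a circular
// justification could "invent" it:
//
//   Thread 1:            Thread 2:
//     r1 = x;              r2 = y;
//     y  = r1;             x  = r2;
//
// Under sequential consistency only r1 == r2 == 0 is possible.
import java.util.LinkedHashSet;
import java.util.Set;

public class OutOfThinAir {
    /** Enumerate all interleavings of the two threads; collect (r1, r2) outcomes. */
    static Set<String> scOutcomes() {
        Set<String> outcomes = new LinkedHashSet<>();
        // Each thread has 2 steps, so there are C(4,2) = 6 interleavings.
        int[][] schedules = {
            {1,1,2,2}, {1,2,1,2}, {1,2,2,1}, {2,1,1,2}, {2,1,2,1}, {2,2,1,1}
        };
        for (int[] sched : schedules) {
            int x = 0, y = 0, r1 = 0, r2 = 0;
            int p1 = 0, p2 = 0;          // program counters of thread 1 / 2
            for (int t : sched) {
                if (t == 1) { if (p1++ == 0) r1 = x; else y = r1; }
                else        { if (p2++ == 0) r2 = y; else x = r2; }
            }
            outcomes.add(r1 + "," + r2);
        }
        return outcomes;
    }

    public static void main(String[] args) {
        System.out.println(scOutcomes());   // prints [0,0]
    }
}
```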
Extended sequential reasoning for data-race-free programs
Pub Date : 2011-06-05 DOI: 10.1145/1988915.1988922
Laura Effinger-Dean, H. Boehm, Dhruva R. Chakrabarti, P. Joisha
Most multithreaded programming languages prohibit or discourage data races. By avoiding data races, we are guaranteed that variables accessed within a synchronization-free code region cannot be modified by other threads, allowing us to reason about such code regions as though they were single-threaded. However, such single-threaded reasoning is not limited to synchronization-free regions. We present a simple characterization of extended interference-free regions in which variables cannot be modified by other threads. This characterization shows that, in the absence of data races, important code analysis problems often have surprisingly easy answers. For instance, we can use local analysis to determine when lock and unlock calls refer to the same mutex. Our characterization can be derived from prior work on safe compiler transformations, but it can also be simply derived from first principles, and justified in a very broad context. In addition, systematic reasoning about overlapping interference-free regions yields insight about optimization opportunities that were not previously apparent. We give preliminary results for a prototype implementation of interference-free regions in the LLVM compiler infrastructure. We also discuss other potential applications for interference-free regions.
Citations: 26
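The lock/unlock example from the abstract can be sketched as follows. This is an illustration of the reasoning, not code from the paper: because the region between the two critical sections is synchronization-free, race-freedom guarantees no other thread writes the `lock` field there, so a purely local analysis can conclude both `lock()` calls acquire the same mutex:

```java
import java.util.concurrent.locks.ReentrantLock;

public class InterferenceFree {
    static class Data {
        ReentrantLock lock = new ReentrantLock();
        int counter = 0;          // protected by lock
    }

    /** Two critical sections separated by synchronization-free code. */
    static int update(Data d) {
        d.lock.lock();            // first read of the field d.lock
        try { d.counter++; } finally { d.lock.unlock(); }
        // Synchronization-free region: in a data-race-free program no other
        // thread may write d.lock here, so the next read of the field must
        // yield the same mutex -- a purely local conclusion.
        d.lock.lock();            // provably the same lock as above
        try { d.counter++; } finally { d.lock.unlock(); }
        return d.counter;
    }

    public static void main(String[] args) {
        System.out.println(update(new Data()));   // prints 2
    }
}
```

An optimizer armed with this fact could, for example, fuse the two critical sections into one.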
Data-race exceptions have benefits beyond the memory model
Pub Date : 2011-06-05 DOI: 10.1145/1988915.1988923
Benjamin P. Wood, L. Ceze, D. Grossman
Proposals to treat data races as exceptions provide simplified semantics for shared-memory multithreaded programming languages and memory models by guaranteeing that execution remains data-race-free and sequentially consistent or an exception is raised. However, the high cost of precise race detection has kept the cost-to-benefit ratio of data-race exceptions too high for widespread adoption. Most research to improve this ratio focuses on lowering performance cost. In this position paper, we argue that with small changes in how we view data races, data-race exceptions enable a broad class of benefits beyond the memory model, including performance and simplicity in applications at the runtime system level. When attempted (but exception-raising) racy accesses are treated as legal --- but exceptional --- behavior, applications can exploit the guarantees of the data-race exception mechanism by performing potentially racy accesses and guiding execution based on whether these potential races manifest as exceptions. We apply these insights to concurrent garbage collection, optimistic synchronization elision, and best-effort automatic recovery from exceptions due to sequential-consistency-violating races.
Citations: 4
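The "racy access as legal but exceptional behavior" pattern might look like the sketch below. The `DataRaceException` type is hypothetical (defined here as a stub; no current JVM raises it): the fast path reads optimistically, and if the runtime were to detect a race and raise the exception, the application falls back to a properly synchronized path:

```java
public class OptimisticAccess {
    /** Hypothetical exception a race-detecting runtime would raise (stub). */
    static class DataRaceException extends RuntimeException {}

    static int sharedField = 42;
    static final Object lock = new Object();

    static int read() {
        try {
            return sharedField;            // optimistic, potentially racy read
        } catch (DataRaceException e) {
            synchronized (lock) {          // race manifested: synchronized fallback
                return sharedField;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(read());   // prints 42
    }
}
```

The same shape underlies the abstract's use cases: optimistic synchronization elision runs the unsynchronized path and treats an exception as the signal to retry with synchronization.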
Performance implications of fence-based memory models
Pub Date : 2011-06-05 DOI: 10.1145/1988915.1988919
H. Boehm
Most mainstream shared-memory parallel programming languages are converging to a memory model, or shared variable semantics, centered on providing sequential consistency for most data-race-free programs. OpenMP, along with a small number of other languages, defines its memory model in terms of implicit fence (e.g. OpenMP flush) operations that force memory accesses to become visible to other threads in order. Synchronization operations provided by the language implicitly include such fences. In the simplest cases this is equivalent to a promise of sequential consistency for data-race-free programs. However, real languages typically also provide atomic operations with weak memory ordering constraints, such as the OpenMP atomic directives. These break the above equivalence, making the fence-based model stronger in ways that are observable, but not generally useful. As a result, conventional lock implementations are often accidentally prohibited, adding significant overhead for uncontended locks. We show that this problem affects both OpenMP and, in a more subtle way, UPC. We have been working with the OpenMP ARB to resolve these issues in future versions of OpenMP.
Citations: 2
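A minimal sketch of a fence-based lock, using Java's explicit fences (`java.lang.invoke.VarHandle`) as a stand-in for operations like OpenMP `flush` (the paper discusses OpenMP and UPC, not Java; this is only an analogy). The release fence orders the critical section's writes before the unlocking store, and the acquire fence orders the locking operation before subsequent reads; a weakly ordered atomic on `held` without such fences would not provide these guarantees, which is the kind of accidental prohibition the abstract describes:

```java
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicBoolean;

public class FenceLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    public void lock() {
        while (!held.compareAndSet(false, true)) {
            Thread.onSpinWait();        // spin until acquired
        }
        VarHandle.acquireFence();       // later accesses may not move above this
    }

    public void unlock() {
        VarHandle.releaseFence();       // earlier accesses may not move below this
        held.set(false);
    }

    public boolean isHeld() { return held.get(); }

    public static void main(String[] args) {
        FenceLock l = new FenceLock();
        l.lock();
        System.out.println(l.isHeld());   // prints true
        l.unlock();
        System.out.println(l.isHeld());   // prints false
    }
}
```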
Minor memory references matter in collaborative caching
Pub Date : 2011-06-05 DOI: 10.1145/1988915.1988927
Xiaoming Gu
Collaborative caching uses different caching methods, e.g., LRU and MRU, for data with good or poor locality. Poor-locality data are evicted by MRU quickly, leaving most cache space to hold good-locality data by LRU. In our previous study, we selected static memory references with poor locality to use MRU but neglected minor references, which are memory instructions that contribute no more than 0.1% of total memory accesses. After removing this restriction, we found that three SPEC CPU benchmarks have on average 6.2 times fewer miss reduction or 9.8% reduction in absolute miss ratio.
Citations: 0
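The LRU/MRU split can be modeled with a toy cache in which each access carries a locality hint (the class and trace below are illustrative, not from the paper). LRU-hinted blocks are promoted to the protected end of the recency stack; MRU-hinted blocks are parked at the eviction end, so streaming data is evicted first and cannot flush the good-locality working set:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class CollabCache {
    private final int capacity;
    private final Deque<String> stack = new ArrayDeque<>(); // head = evict next

    CollabCache(int capacity) { this.capacity = capacity; }

    /** Access a block; hintLru = true for good-locality references. Returns hit? */
    boolean access(String block, boolean hintLru) {
        boolean hit = stack.remove(block);
        if (!hit && stack.size() == capacity) stack.pollFirst(); // evict head
        if (hintLru) stack.addLast(block);   // most-protected position (LRU)
        else         stack.addFirst(block);  // evicted before everything else (MRU)
        return hit;
    }

    public static void main(String[] args) {
        CollabCache c = new CollabCache(2);
        c.access("A", true);       // good-locality data
        c.access("s1", false);     // streaming data, parked at evict end
        c.access("s2", false);     // evicts s1, not A
        System.out.println(c.access("A", true));   // prints true: A survived
    }
}
```

Under plain LRU the same trace would evict `A` (it is the least recently used of the three), and the final access would miss; the hint is what keeps it resident.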
Garbage collection for multicore NUMA machines
Pub Date : 2011-05-12 DOI: 10.1145/1988915.1988929
Sven Auhagen, Lars Bergstrom, M. Fluet, John H. Reppy
Modern high-end machines feature multiple processor packages, each of which contains multiple independent cores and integrated memory controllers connected directly to dedicated physical RAM. These packages are connected via a shared bus, creating a system with a heterogeneous memory hierarchy. Since this shared bus has less bandwidth than the sum of the links to memory, aggregate memory bandwidth is higher when parallel threads all access memory local to their processor package than when they access memory attached to a remote package. This bandwidth limitation has traditionally limited the scalability of modern functional language implementations, which seldom scale well past 8 cores, even on small benchmarks. This work presents a garbage collector integrated with our strict, parallel functional language implementation, Manticore, and shows that it scales effectively on both a 48-core AMD Opteron machine and a 32-core Intel Xeon machine.
Citations: 26
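The bandwidth argument in the abstract reduces to simple arithmetic: with P packages each owning a local memory link of bandwidth B_local, and a shared interconnect of bandwidth B_bus < P × B_local, all-local traffic scales with P while all-remote traffic is capped by the bus. A back-of-the-envelope model (the figures below are illustrative, not measurements from the paper):

```java
public class NumaBandwidth {
    /** Aggregate achievable bandwidth (GB/s) under the simple two-level model. */
    static double aggregate(int packages, double localGBs, double busGBs,
                            boolean allLocal) {
        double demand = packages * localGBs;        // what the threads request
        return allLocal ? demand                    // local links serve in parallel
                        : Math.min(demand, busGBs); // remote traffic shares the bus
    }

    public static void main(String[] args) {
        // 4 packages, 10 GB/s per local link, 16 GB/s shared bus (assumed figures)
        System.out.println(aggregate(4, 10.0, 16.0, true));    // prints 40.0
        System.out.println(aggregate(4, 10.0, 16.0, false));   // prints 16.0
    }
}
```

This 2.5x gap is why a NUMA-aware collector that keeps each core's allocation and collection traffic on its own package can scale where a NUMA-oblivious one stalls on the interconnect.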
The case for simple, visible cache coherency
Pub Date : 2008-03-02 DOI: 10.1145/1353522.1353532
R. Kunz, M. Horowitz
The shared memory research community has proposed many complex communication protocols that aim to eliminate specific performance bottlenecks, while still providing an easy-to-use communication interface. Although tailored protocols can eliminate some bottlenecks that arise in real applications, removing the cause of the bottleneck through software optimizations and bug fixes is cheaper to implement, faster to fix (once found), and requires no additional support by the hardware beyond a simple shared memory interface. In fact, in our experience, the choice of coherence protocol is much less important than providing an efficient hardware feedback that identifies the source of the problem. Future cache-coherence research should focus efforts on illuminating memory system behavior, providing smarter tools to identify bottlenecks, and helping to eliminate them through software optimizations.
Citations: 2
GC assertions: using the garbage collector to check heap properties
Pub Date : 2008-03-02 DOI: 10.1145/1353522.1353533
E. Aftandilian, Samuel Z. Guyer
This paper introduces GC assertions, a system interface that programmers can use to check for errors, such as data structure invariant violations, and to diagnose performance problems, such as memory leaks. GC assertions are checked by the garbage collector, which is in a unique position to gather information and answer questions about the lifetime and connectivity of objects in the heap. We introduce several kinds of GC assertions, and we describe how they are implemented in the collector. We also describe our reporting mechanism, which provides a complete path through the heap to the offending objects. We show results for one type of assertion that allows the programmer to indicate that an object should be reclaimed at the next GC. We find that using this assertion we can quickly identify a memory leak and its cause with negligible overhead.
Citations: 26
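The assert-dead check described in the abstract can be sketched over a toy object graph (the heap here is a plain adjacency map, not a real JVM heap, and the method names are invented for illustration): at collection time the collector marks everything reachable from the roots, and the assertion fails if an object the programmer declared dead was marked, with the retaining path explaining why:

```java
import java.util.*;

public class GcAssert {
    /** Mark phase: everything transitively reachable from the roots. */
    static Set<String> reachable(Map<String, List<String>> heap, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (marked.add(obj))                       // first visit only
                work.addAll(heap.getOrDefault(obj, List.of()));
        }
        return marked;
    }

    /** GC assertion: true iff `obj` is unreachable, i.e. will be reclaimed. */
    static boolean assertDead(Map<String, List<String>> heap,
                              List<String> roots, String obj) {
        return !reachable(heap, roots).contains(obj);
    }

    /** Sample heap in which "a" accidentally retains "leak". */
    static Map<String, List<String>> sampleHeap() {
        return Map.of("root", List.of("a"),
                      "a",    List.of("leak"));
    }

    public static void main(String[] args) {
        // The assertion fails; the path root -> a -> leak pinpoints the leak.
        System.out.println(assertDead(sampleHeap(), List.of("root"), "leak")); // false
    }
}
```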
Reasoning about the ARM weakly consistent memory model
Pub Date : 2008-03-02 DOI: 10.1145/1353522.1353528
Nathan Chong, Samin S. Ishtiaq
This paper describes a formalization of the ARM weakly consistent memory model: the architectural contract between parallel programs and shared memory multiprocessor implementations. We claim that a clean, unambiguous, and mechanically verifiable specification is a valuable resource for architects, micro-architects and programmers; it allows implementors to forge aggressive static (compiler) and dynamic (JIT, micro-architecture) machines to run code. We discuss the key construct of the ARM memory model, observability -- the order in which memory accesses become visible to processors in a shared memory multiprocessor system -- and examine its use in litmus tests.
Citations: 38
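A representative litmus test for observability is the standard message-passing (MP) shape, shown here as an illustration (the variable names are the conventional ones, not taken from the paper). Under sequential consistency, observing `flag == 1` implies observing `x == 1`; the ARM model additionally permits `r1 == 1, r2 == 0` unless barriers order the accesses. Enumerating the SC interleavings confirms which outcome a weakly consistent machine must go beyond SC to produce:

```java
// Message-passing (MP) litmus test, x == flag == 0 initially:
//
//   Thread 1:            Thread 2:
//     x    = 1;            r1 = flag;
//     flag = 1;            r2 = x;
import java.util.LinkedHashSet;
import java.util.Set;

public class MpLitmus {
    static Set<String> scOutcomes() {
        Set<String> outcomes = new LinkedHashSet<>();
        int[][] schedules = {          // interleavings of two 2-step threads
            {1,1,2,2}, {1,2,1,2}, {1,2,2,1}, {2,1,1,2}, {2,1,2,1}, {2,2,1,1}
        };
        for (int[] sched : schedules) {
            int x = 0, flag = 0, r1 = 0, r2 = 0, p1 = 0, p2 = 0;
            for (int t : sched) {
                if (t == 1) { if (p1++ == 0) x = 1;     else flag = 1; }
                else        { if (p2++ == 0) r1 = flag; else r2 = x;   }
            }
            outcomes.add(r1 + "," + r2);
        }
        return outcomes;
    }

    public static void main(String[] args) {
        // "1,0" never appears under SC; ARM allows it without barriers.
        System.out.println(scOutcomes().contains("1,0"));   // prints false
    }
}
```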
What can performance counters do for memory subsystem analysis?
Pub Date : 2008-03-02 DOI: 10.1145/1353522.1353531
S. Eranian
Nowadays, all major processors provide a set of performance counters which capture micro-architectural level information, such as the number of elapsed cycles, cache misses, or instructions executed. Counters can be found in processor cores, processor die, chipsets, or in I/O cards. They can provide a wealth of information as to how the hardware is being used by software. Many processors now support events to measure, precisely and with very limited overhead, the traffic between a core and the memory subsystem. It is possible to compute average load latency and bus bandwidth utilization. This valuable information can be used to improve code quality and placement of threads to maximize hardware utilization. We postulate that performance counters are the key hardware resource to locate and understand issues related to the memory subsystem. In this paper we illustrate our position by showing how certain key memory performance metrics can be gathered easily on today's hardware.
Citations: 73
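The two derived metrics the abstract mentions come directly from raw event counts. A sketch of the arithmetic (the counter names and numbers are generic placeholders; the actual events and their semantics differ per processor):

```java
public class CounterMetrics {
    /** Average load latency in cycles = cycles attributed to loads / load count. */
    static double avgLoadLatency(long loadCycles, long loads) {
        return (double) loadCycles / loads;
    }

    /** Bus bandwidth utilization = observed bytes/s over the peak bytes/s. */
    static double busUtilization(long busBytes, double seconds,
                                 double peakBytesPerSec) {
        return (busBytes / seconds) / peakBytesPerSec;
    }

    public static void main(String[] args) {
        // Illustrative counter readings, not measurements.
        System.out.println(avgLoadLatency(600_000, 10_000));   // prints 60.0
        System.out.println(busUtilization(3_200_000_000L, 1.0,
                                          6_400_000_000.0));   // prints 0.5
    }
}
```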