FGSCM:事务锁省略的细粒度方法

2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2017-10-01 DOI:10.1109/SBAC-PAD.2017.22

Gustavo José de Sousa, A. Baldassin

{"title":"FGSCM:事务锁省略的细粒度方法","authors":"Gustavo José de Sousa, A. Baldassin","doi":"10.1109/SBAC-PAD.2017.22","DOIUrl":null,"url":null,"abstract":"Speculative Lock Elision (SLE) is a technique that allows critical sections to be executed optimistically by eliding the lock operation and enabling multiple threads to execute concurrently. In case of inconsistencies, the hardware automatically rolls back the execution and pessimistically acquires the original lock during runtime. The decision to elide the lock in SLE is performed transparently at the microarchitecture level and, although being convenient, it may sometimes hurt performance. To avoid that case, researchers have investigated Transactional Lock Elision (TLE), in which software-controlled hardware transactions are used instead, allowing the creation of policies and heuristics to manage lock elision. Typical implementations of TLE make use of a single lock to serialize the execution in case the original lock cannot be elided, which can potentially degrade performance. In order to improve on such cases, this paper proposes the Fine-Grained Software-assisted Conflict Management (FGSCM) scheme, a TLE technique that employs multiple locks so as to avoid unnecessary serialization of the code. The main idea of FGSCM is that not all threads that conflict inside a critical section are acessing the same region of shared memory. By automatically assigning distinct locks to these threads according to the memory section they access, the level of concurrency can be increased. In this paper we formalize FGSCM and provide an in-depth performance evaluation using a microbenchmark to stress several conflict behaviors. Our initial results with a prototype implementation using Intels Restricted Transactional Memory (RTM) are encouraging. With a quadcore machine, we observed an average performance gain of 11% compared to the single-auxiliary-lock SCM and 36% compared to a standard lock scheme, both for typical read-dominated workloads.","PeriodicalId":187204,"journal":{"name":"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"FGSCM: A Fine-Grained Approach to Transactional Lock Elision\",\"authors\":\"Gustavo José de Sousa, A. Baldassin\",\"doi\":\"10.1109/SBAC-PAD.2017.22\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Speculative Lock Elision (SLE) is a technique that allows critical sections to be executed optimistically by eliding the lock operation and enabling multiple threads to execute concurrently. In case of inconsistencies, the hardware automatically rolls back the execution and pessimistically acquires the original lock during runtime. The decision to elide the lock in SLE is performed transparently at the microarchitecture level and, although being convenient, it may sometimes hurt performance. To avoid that case, researchers have investigated Transactional Lock Elision (TLE), in which software-controlled hardware transactions are used instead, allowing the creation of policies and heuristics to manage lock elision. Typical implementations of TLE make use of a single lock to serialize the execution in case the original lock cannot be elided, which can potentially degrade performance. In order to improve on such cases, this paper proposes the Fine-Grained Software-assisted Conflict Management (FGSCM) scheme, a TLE technique that employs multiple locks so as to avoid unnecessary serialization of the code. The main idea of FGSCM is that not all threads that conflict inside a critical section are acessing the same region of shared memory. By automatically assigning distinct locks to these threads according to the memory section they access, the level of concurrency can be increased. In this paper we formalize FGSCM and provide an in-depth performance evaluation using a microbenchmark to stress several conflict behaviors. Our initial results with a prototype implementation using Intels Restricted Transactional Memory (RTM) are encouraging. With a quadcore machine, we observed an average performance gain of 11% compared to the single-auxiliary-lock SCM and 36% compared to a standard lock scheme, both for typical read-dominated workloads.\",\"PeriodicalId\":187204,\"journal\":{\"name\":\"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SBAC-PAD.2017.22\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SBAC-PAD.2017.22","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

推测锁省略(Speculative Lock省略，SLE)是一种技术，它通过省略锁操作并允许多个线程并发执行，从而允许乐观地执行临界区。如果不一致，硬件会自动回滚执行，并在运行时悲观地获取原始锁。在SLE中省略锁的决定是在微体系结构级别透明地执行的，尽管这很方便，但有时可能会损害性能。为了避免这种情况，研究人员研究了事务性锁省略(TLE)，其中使用软件控制的硬件事务，允许创建策略和启发式方法来管理锁省略。TLE的典型实现使用单个锁来序列化执行，以防无法省略原始锁，这可能会降低性能。为了改善这种情况，本文提出了细粒度软件辅助冲突管理(FGSCM)方案，这是一种使用多个锁以避免不必要的代码序列化的TLE技术。FGSCM的主要思想是，并不是所有在临界区内发生冲突的线程都访问共享内存的同一区域。通过根据这些线程访问的内存区自动为它们分配不同的锁，可以提高并发级别。在本文中，我们形式化了FGSCM，并提供了一个深入的性能评估，使用微基准来强调几种冲突行为。我们使用英特尔受限事务性内存(RTM)的原型实现的初步结果令人鼓舞。在四核机器上，我们观察到与单辅助锁SCM相比，它的平均性能提高了11%，与标准锁方案相比，性能提高了36%，这两种方案都适用于典型的以读为主的工作负载。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

FGSCM: A Fine-Grained Approach to Transactional Lock Elision

Speculative Lock Elision (SLE) is a technique that allows critical sections to be executed optimistically by eliding the lock operation and enabling multiple threads to execute concurrently. In case of inconsistencies, the hardware automatically rolls back the execution and pessimistically acquires the original lock during runtime. The decision to elide the lock in SLE is performed transparently at the microarchitecture level and, although being convenient, it may sometimes hurt performance. To avoid that case, researchers have investigated Transactional Lock Elision (TLE), in which software-controlled hardware transactions are used instead, allowing the creation of policies and heuristics to manage lock elision. Typical implementations of TLE make use of a single lock to serialize the execution in case the original lock cannot be elided, which can potentially degrade performance. In order to improve on such cases, this paper proposes the Fine-Grained Software-assisted Conflict Management (FGSCM) scheme, a TLE technique that employs multiple locks so as to avoid unnecessary serialization of the code. The main idea of FGSCM is that not all threads that conflict inside a critical section are acessing the same region of shared memory. By automatically assigning distinct locks to these threads according to the memory section they access, the level of concurrency can be increased. In this paper we formalize FGSCM and provide an in-depth performance evaluation using a microbenchmark to stress several conflict behaviors. Our initial results with a prototype implementation using Intels Restricted Transactional Memory (RTM) are encouraging. With a quadcore machine, we observed an average performance gain of 11% compared to the single-auxiliary-lock SCM and 36% compared to a standard lock scheme, both for typical read-dominated workloads.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

自引率

0.00%

发文量

期刊最新文献

Resource-Management Study in HPC Runtime-Stacking Context Cloud Workload Prediction and Generation Models GC-CR: A Decentralized Garbage Collector Component for Checkpointing in Clouds Overcoming Memory-Capacity Constraints in the Use of ILUPACK on Graphics Processors Beyond the Fog: Bringing Cross-Platform Code Execution to Constrained IoT Devices