On the interactions between ILP and TLP with hardware transactional memory

IF 2.6 4区计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Microprocessors and Microsystems Pub Date : 2024-02-01 Epub Date: 2023-11-19 DOI:10.1016/j.micpro.2023.104975

Víctor Nicolás-Conesa, Rubén Titos-Gil, Ricardo Fernández-Pascual, Alberto Ros, Manuel E. Acacio

{"title":"On the interactions between ILP and TLP with hardware transactional memory","authors":"Víctor Nicolás-Conesa, Rubén Titos-Gil, Ricardo Fernández-Pascual, Alberto Ros, Manuel E. Acacio","doi":"10.1016/j.micpro.2023.104975","DOIUrl":null,"url":null,"abstract":"<div><p>Hardware implementations of Transactional Memory (HTM) are designed to facilitate efficient thread synchronization in parallel programs, encouraging the use of larger critical sections. By employing optimistic concurrency control to execute transactions speculatively, HTM systems promise to deliver the performance benefits typically associated with fine-grained locks. In doing so, HTM systems must deal with transaction aborts. While under certain conditions aborts may be caused by the inherent limitations of hardware structures employed to implement TM (e.g., caches), conflicting concurrent accesses to shared memory locations are generally the prevailing cause for squashing the work done by a transaction</p><p>In this study, we present what we believe to be, to the best of our knowledge, the first characterization of how the aggressiveness of processor cores, particularly their ability to exploit instruction-level parallelism (ILP), interacts with the support for optimistic thread-level speculation offered by HTM systems. We have observed that by adjusting the size of structures that facilitate out-of-order and speculative execution, the number of aborts in the execution of transactional workloads can be altered in best-effort HTM implementations. Our findings indicate that in scenarios with high contention, a smaller number of powerful cores is more suitable, whereas in low contention scenarios, using a larger number of less aggressive cores is preferable. In addition, HTM systems that employ lazy detection and those employing eager detection with requester-stalls resolution, benefit from using simpler cores. In conclusion, abort ratios can be reduced with a careful choice of both processor aggressiveness and design aspects for each application depending on its contention.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"104 ","pages":"Article 104975"},"PeriodicalIF":2.6000,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S014193312300220X/pdfft?md5=ce105b99f7f43d90376360a92db4669c&pid=1-s2.0-S014193312300220X-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Microprocessors and Microsystems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S014193312300220X","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/11/19 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

Abstract

Hardware implementations of Transactional Memory (HTM) are designed to facilitate efficient thread synchronization in parallel programs, encouraging the use of larger critical sections. By employing optimistic concurrency control to execute transactions speculatively, HTM systems promise to deliver the performance benefits typically associated with fine-grained locks. In doing so, HTM systems must deal with transaction aborts. While under certain conditions aborts may be caused by the inherent limitations of hardware structures employed to implement TM (e.g., caches), conflicting concurrent accesses to shared memory locations are generally the prevailing cause for squashing the work done by a transaction

In this study, we present what we believe to be, to the best of our knowledge, the first characterization of how the aggressiveness of processor cores, particularly their ability to exploit instruction-level parallelism (ILP), interacts with the support for optimistic thread-level speculation offered by HTM systems. We have observed that by adjusting the size of structures that facilitate out-of-order and speculative execution, the number of aborts in the execution of transactional workloads can be altered in best-effort HTM implementations. Our findings indicate that in scenarios with high contention, a smaller number of powerful cores is more suitable, whereas in low contention scenarios, using a larger number of less aggressive cores is preferable. In addition, HTM systems that employ lazy detection and those employing eager detection with requester-stalls resolution, benefit from using simpler cores. In conclusion, abort ratios can be reduced with a careful choice of both processor aggressiveness and design aspects for each application depending on its contention.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

ILP和TLP与硬件事务性内存之间的相互作用

事务性内存(HTM)的硬件实现旨在促进并行程序中的高效线程同步，鼓励使用更大的临界区。通过采用乐观并发控制来推测地执行事务，HTM系统承诺提供通常与细粒度锁相关的性能优势。为此，HTM系统必须处理事务中止。虽然在某些情况下，中断可能是由用于实现TM的硬件结构的固有限制(例如，缓存)引起的，但对共享内存位置的冲突并发访问通常是挤占事务完成工作的主要原因。在本研究中，我们提出了我们认为的，据我们所知，处理器内核的侵略性如何的第一个特征，特别是它们利用指令级并行性(ILP)的能力，与HTM系统提供的乐观线程级推测的支持相互作用。我们观察到，通过调整有利于乱序执行和推测执行的结构的大小，可以在尽力而为的HTM实现中改变事务性工作负载执行中的中止数量。我们的研究结果表明，在高争用的场景中，较少数量的强大核心更合适，而在低争用的场景中，使用较多数量的不那么激进的核心更可取。此外，采用惰性检测的HTM系统和采用具有请求者延迟解析的渴望检测的HTM系统都受益于使用更简单的内核。总之，根据每个应用程序的争用情况，仔细选择处理器侵略性和设计方面，可以减少中断比率。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Microprocessors and Microsystems 工程技术-工程：电子与电气

CiteScore

6.90

自引率

3.80%

发文量

204

审稿时长

172 days

期刊介绍： Microprocessors and Microsystems: Embedded Hardware Design (MICPRO) is a journal covering all design and architectural aspects related to embedded systems hardware. This includes different embedded system hardware platforms ranging from custom hardware via reconfigurable systems and application specific processors to general purpose embedded processors. Special emphasis is put on novel complex embedded architectures, such as systems on chip (SoC), systems on a programmable/reconfigurable chip (SoPC) and multi-processor systems on a chip (MPSoC), as well as, their memory and communication methods and structures, such as network-on-chip (NoC). Design automation of such systems including methodologies, techniques, flows and tools for their design, as well as, novel designs of hardware components fall within the scope of this journal. Novel cyber-physical applications that use embedded systems are also central in this journal. While software is not in the main focus of this journal, methods of hardware/software co-design, as well as, application restructuring and mapping to embedded hardware platforms, that consider interplay between software and hardware components with emphasis on hardware, are also in the journal scope.

期刊最新文献

CGR-AI Engine: A scalable CGRA-based processing platform for Artificial Intelligence in space applications The TEXTAROSSA project: Cool all the Way Down to the Hardware Analyzing the impact of functional approximation on the resilience of Deep Neural Networks Efficient associative processing in FPGA LoLiPoP-IoT: Advancing the energy-efficient Internet of Things