Performance Comparison of Speculative Taskloop and OpenMP-for-Loop Thread-Level Speculation on Hardware Transactional Memory

2022 21st International Symposium on Parallel and Distributed Computing (ISPDC). Pub Date: 2022-07-01. DOI: 10.1109/ISPDC55340.2022.00021

Juan Salamanca

Abstract

Speculative Taskloop (STL) is a loop parallelization technique that combines the strengths of task-based parallelism and Thread-Level Speculation (TLS) to speed up loops with may (that is, possible) loop-carried dependencies, which were previously difficult for compilers to parallelize. Previous studies demonstrated the efficiency of STL when implemented on Hardware Transactional Memory (HTM) and its advantages over a typical DOACROSS technique such as OpenMP ordered. This paper presents a performance comparison between STL and a previously proposed technique that implements TLS in the OpenMP for worksharing construct (FOR-TLS), over a set of loops from the cbench and SPEC2006 benchmarks. The results show that which technique is more appropriate depends on the characteristics of the evaluated loop. Experimental results reveal that, with both techniques implemented on top of HTM, speed-ups of up to 2.41× can be obtained for STL and up to 2× for FOR-TLS.
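To make the terms in the abstract concrete, the following is a minimal sketch of a loop with a may loop-carried dependence and of the OpenMP ordered (DOACROSS) baseline the abstract mentions. It is not the paper's STL or FOR-TLS implementation. The functions work, update_seq, and update_ordered, the index array idx, and the update formula are all hypothetical names and choices made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the independent, expensive part of an iteration. */
static long work(size_t i) {
    long v = (long)i;
    for (int k = 0; k < 100; k++) v = (v * 31 + 7) % 1000003;
    return v;
}

/* Sequential reference. Iteration i updates acc[idx[i]], so two iterations
   conflict only when their idx values coincide. The compiler cannot know
   this statically: it is a "may" loop-carried dependence. */
void update_seq(long *acc, const size_t *idx, size_t n) {
    for (size_t i = 0; i < n; i++) {
        assert(idx[i] < n);
        acc[idx[i]] = (acc[idx[i]] * 3 + work(i)) % 1000003;
    }
}

/* DOACROSS-style version (assumed illustration): the independent part runs
   in parallel, while the possibly dependent update is executed in iteration
   order inside an ordered region. */
void update_ordered(long *acc, const size_t *idx, size_t n) {
    #pragma omp parallel for ordered schedule(static, 1)
    for (size_t i = 0; i < n; i++) {
        long v = work(i);
        #pragma omp ordered
        {
            assert(idx[i] < n);
            acc[idx[i]] = (acc[idx[i]] * 3 + v) % 1000003;
        }
    }
}
```

With GCC the pragmas take effect when compiling with -fopenmp; without that flag they are ignored and update_ordered simply runs sequentially. The ordered region serializes every update, even when no two iterations touch the same element. A TLS scheme instead executes iterations speculatively and re-executes only those that actually conflict, which is the gap techniques such as STL and FOR-TLS aim to exploit.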
