Exploiting locality in the run-time parallelization of irregular loops

Proceedings International Conference on Parallel Processing. Pub Date: 2002-08-18. DOI: 10.1109/ICPP.2002.1040856

María J. Martín, D. E. Singh, J. Touriño, F. F. Rivera

Citations: 14

Abstract

The goal of this work is the efficient parallel execution of loops with indirect array accesses, so that it can be embedded in a parallelizing compiler framework. In this kind of loop pattern, dependences cannot always be determined at compile time because, in many cases, they involve input data that are only known at run time and/or the access pattern is too complex to be analyzed. In this paper we propose run-time strategies for the parallelization of these loops. Our approaches focus not only on extracting parallelism among iterations of the loop, but also on exploiting data access locality to improve memory hierarchy behavior and, thus, the overall program speedup. Two strategies are proposed: one based on graph partitioning techniques and the other based on a block-cyclic distribution. Experimental results show that the two strategies are complementary and that the choice of the best alternative depends on some features of the loop pattern.
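As a rough illustration of the kind of loop the abstract describes, the following Python sketch takes an irregular reduction of the form a[f[i]] += contrib[i] and schedules its iterations at run time according to a block-cyclic distribution of the array a. This is not the authors' algorithm, which the abstract does not detail. The function names, the block_size and num_procs parameters, and the owner-based scheduling rule are all assumptions made for this example, and the processors are simulated sequentially.

```python
def block_cyclic_owner(index, block_size, num_procs):
    """Processor that owns array element `index` under a block-cyclic layout (assumed rule)."""
    return (index // block_size) % num_procs


def schedule_by_owner(f, block_size, num_procs):
    """Group loop iterations by the owner of the element each iteration writes.

    `f` is the indirection array, known only at run time.
    """
    schedule = [[] for _ in range(num_procs)]
    for i, target in enumerate(f):
        schedule[block_cyclic_owner(target, block_size, num_procs)].append(i)
    return schedule


def run_irregular_loop(a, f, contrib, block_size, num_procs):
    """Execute a[f[i]] += contrib[i], one simulated processor at a time."""
    result = list(a)
    for iterations in schedule_by_owner(f, block_size, num_procs):
        # Each list writes only to elements owned by one processor, so in a
        # real parallel run the lists could execute concurrently without
        # write conflicts on `result`.
        for i in iterations:
            result[f[i]] += contrib[i]
    return result


if __name__ == "__main__":
    f = [5, 0, 3, 0, 6, 1]
    print(schedule_by_owner(f, block_size=2, num_procs=2))
    print(run_irregular_loop([0] * 8, f, [1, 2, 3, 4, 5, 6], 2, 2))
```

In this sketch, iterations that write to the same block of a land on the same processor, which is one simple way to combine parallelism with data locality. A graph partitioning approach would instead derive the assignment from the actual access pattern in f.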


