Evaluation of vectorization potential of Graph500 on Intel's Xeon Phi

2014 International Conference on High Performance Computing & Simulation (HPCS), Pub Date: 2014-07-21, DOI: 10.1109/HPCSim.2014.6903668

Milan Stanic, Oscar Palomar, Ivan Ratković, M. Duric, O. Unsal, A. Cristal, M. Valero

Cited by: 11

Abstract

Graph500 is a data-intensive application for high performance computing, and it is an increasingly important workload because graphs are a core part of most analytics applications. So far, no work has examined whether Graph500 is suitable for vectorization, mostly due to a lack of vector memory instructions for irregular memory accesses. The Xeon Phi is a massively parallel processor recently released by Intel with new features such as a wide 512-bit vector unit and vector scatter/gather instructions. The Xeon Phi therefore makes it possible to combine the parallelization of Graph500 with vectorization more efficiently. In this paper we vectorize Graph500 and analyze the impact of vectorization and prefetching on the Xeon Phi. We also show that the combination of parallelization, vectorization and prefetching yields a speedup of 27% over a parallel version with prefetching that does not leverage the vector capabilities of the Xeon Phi.
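The abstract does not describe the authors' kernel, so the following is only a rough illustration of where gather and scatter arise in Graph500's breadth-first search (BFS). It is a minimal sketch, assuming a hypothetical CSR graph layout (`csr_graph`, `bfs_level` and `bfs` are illustrative names, not from the paper) and adjacency lists without duplicate edges. The neighbor loop of a top-down BFS level is processed 16 lanes at a time, matching a 512-bit register of 32-bit integers, and each lane loop emulates one vector operation in plain scalar C: a gather of `parent[]` at irregular indices, a compare, a masked scatter, and a compress into the next frontier. Prefetching, which the paper also studies, is not shown.

```c
#include <assert.h>

#define VLEN 16 /* 512-bit register / 32-bit ints = 16 lanes */

/* Hypothetical CSR graph: neighbors of u are adj[row[u] .. row[u+1]-1]. */
typedef struct {
    int n;
    const int *row;
    const int *adj;
} csr_graph;

/* One top-down BFS level; returns the size of the next frontier.
   Each lane loop below stands in for a single vector instruction. */
int bfs_level(const csr_graph *g, int *parent,
              const int *frontier, int nf, int *next)
{
    int nn = 0;
    for (int i = 0; i < nf; i++) {
        int u = frontier[i];
        for (int e = g->row[u]; e < g->row[u + 1]; e += VLEN) {
            int lanes = g->row[u + 1] - e;
            if (lanes > VLEN) lanes = VLEN; /* tail chunk uses fewer lanes */
            int idx[VLEN], par[VLEN], mask[VLEN];
            /* contiguous load of neighbor ids */
            for (int l = 0; l < lanes; l++) idx[l] = g->adj[e + l];
            /* gather: irregular reads parent[idx[l]] */
            for (int l = 0; l < lanes; l++) par[l] = parent[idx[l]];
            /* compare: lanes whose neighbor is still unvisited */
            for (int l = 0; l < lanes; l++) mask[l] = (par[l] == -1);
            /* masked scatter: irregular writes parent[idx[l]] = u */
            for (int l = 0; l < lanes; l++) if (mask[l]) parent[idx[l]] = u;
            /* compress: append newly visited vertices to next frontier */
            for (int l = 0; l < lanes; l++) if (mask[l]) next[nn++] = idx[l];
        }
    }
    return nn;
}

/* Level-synchronous BFS; buf_a and buf_b each need room for g->n ints. */
void bfs(const csr_graph *g, int root, int *parent, int *buf_a, int *buf_b)
{
    assert(root >= 0 && root < g->n);
    for (int v = 0; v < g->n; v++) parent[v] = -1;
    parent[root] = root;
    buf_a[0] = root;
    int nf = 1;
    int *cur = buf_a, *nxt = buf_b;
    while (nf > 0) {
        nf = bfs_level(g, parent, cur, nf, nxt);
        int *t = cur; cur = nxt; nxt = t; /* swap frontier buffers */
    }
}
```

On a processor with vector gather/scatter, each lane loop could map to one vector instruction; in this scalar form a compiler may or may not vectorize them. Conflicts between lanes (two lanes writing the same vertex) are what make such scatters tricky in general; the no-duplicate-edges assumption sidesteps them here.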
