OpenMP-based parallel implementation of matrix-matrix multiplication on the Intel Knights Landing

Proceedings of Workshops of HPC Asia. Pub Date: 2018-01-31. DOI: 10.1145/3176364.3176374

Roktaek Lim, Yeongha Lee, Raehyun Kim, Jaeyoung Choi


Citations: 9


Abstract

The second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL), has emerged with a 2D tile mesh architecture. Implementing general matrix-matrix multiplication on a new architecture is an important exercise. To date, parallel implementations of general matrix-matrix multiplication have not been described in sufficient detail. In this study, we describe a parallel implementation of double-precision general matrix-matrix multiplication (DGEMM) with OpenMP on the KNL. The implementation is based on blocked matrix-matrix multiplication. We propose a method for choosing the cache block sizes and discuss the parallelism within the DGEMM implementation. We show that DGEMM performance varies with the thread affinity environment variables. We conducted performance experiments on the Intel Xeon Phi 7210 and 7250, and the experiments validate our method.
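To make the idea of a blocked, OpenMP-parallel DGEMM concrete, here is a minimal sketch in C. It is not the authors' implementation: the function name `dgemm_blocked`, the row-major layout, the loop order, and the block sizes `MB`, `NB`, and `KB` are all assumptions chosen for illustration, and the block sizes in particular are placeholders rather than values produced by the paper's selection method.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache block sizes, chosen only for illustration.
   They are NOT the values derived in the paper. */
enum { MB = 32, NB = 32, KB = 64 };

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Computes C += A * B for row-major A (m x k), B (k x n), C (m x n). */
void dgemm_blocked(size_t m, size_t n, size_t k,
                   const double *A, const double *B, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);

    /* Parallelize over tiles of C: each (ii, jj) tile is updated by
       exactly one thread, so no synchronization on C is needed.
       Without OpenMP enabled the pragma is ignored and the loops run
       serially. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (size_t ii = 0; ii < m; ii += MB) {
        for (size_t jj = 0; jj < n; jj += NB) {
            const size_t i_end = min_sz(ii + MB, m);
            const size_t j_end = min_sz(jj + NB, n);

            /* Walk the shared dimension in KB-sized panels so the
               working set of A and B stays small. */
            for (size_t kk = 0; kk < k; kk += KB) {
                const size_t p_end = min_sz(kk + KB, k);

                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t p = kk; p < p_end; ++p) {
                        const double a = A[i * k + p];
                        /* Unit-stride access to B and C in the inner loop. */
                        for (size_t j = jj; j < j_end; ++j) {
                            C[i * n + j] += a * B[p * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

With GCC this can be built using `-fopenmp`. The abstract does not name the thread affinity variables it studies; `OMP_PROC_BIND` and `OMP_PLACES` are the standard OpenMP ones, and `KMP_AFFINITY` is specific to the Intel OpenMP runtime. A production kernel would also pack the blocks and use a vectorized micro-kernel, which this sketch leaves out.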

