Characterizing Memory-Latency Sensitivity of Sparse Matrix Kernels

2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). Pub Date: 2018-03-21. DOI: 10.1109/PDP2018.2018.00042

N. Tanabe, Toshio Endo


Citation count: 0

Abstract

Intel has announced that in 2018 it will launch a Xeon processor with high-latency main memory based on 3D XPoint. This paper presents a performance evaluation of sparse matrix kernels on future supercomputers with high-latency main memory such as 3D XPoint. The authors propose a high-throughput evaluation methodology for exhaustive experiments that use the University of Florida sparse matrix collection and/or LIS (a Library of Iterative Solvers for linear systems), among others. The proposed methodology is very simple to use, highly flexible with respect to the environment, and high-throughput. Using this methodology, the latency sensitivity of sparse matrix-vector multiplication (SpMV) was measured for 208 sparse matrices and ten storage formats in only two days, a task that would take about ten years with conventional simulators. We obtained several interesting findings about latency-sensitive kernels, sparse matrices, storage formats, and preconditioners. We observed notable latency sensitivity in some applications, namely Graph500, HPCG, and some of the preconditioners of iterative solvers. We found that the latency sensitivity of SpMV is high for matrices larger than the capacity of the last-level cache. This suggests that main memory using 3D XPoint must be combined with a large DRAM cache.
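To make the last finding concrete, the following is a minimal sketch of SpMV over the CSR storage format, together with a rough estimate of the data one SpMV touches. It is not the authors' evaluation methodology. The function names, the 8-byte values and 4-byte indices, and the choice of CSR as one representative storage format are all assumptions made for illustration. The point is the indirect load `x[col_idx[k]]`: once the working set no longer fits in the last-level cache, such loads are likely to go to main memory and expose its latency.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect, data-dependent load of x: the address is only known
            # after col_idx[k] has been read.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def csr_footprint_bytes(n_rows, n_cols, nnz, value_bytes=8, index_bytes=4):
    """Rough bytes touched by one CSR SpMV (assumed element sizes)."""
    matrix = nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes
    vectors = (n_rows + n_cols) * value_bytes  # y and x
    return matrix + vectors


# Tiny 3x3 example: [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
y = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
footprint = csr_footprint_bytes(3, 3, len(values))
```

Comparing `csr_footprint_bytes(...)` for a given matrix against the last-level cache capacity of the target machine gives a first, coarse indication of whether that matrix falls into the latency-sensitive regime described in the abstract.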


