编写快速，运行快速:稀疏深度神经网络在20分钟的开发时间通过SuiteSparse:GraphBLAS

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI:10.1109/HPEC.2019.8916550

T. Davis, M. Aznaveh, Scott P. Kolodziej

{"title":"编写快速，运行快速:稀疏深度神经网络在20分钟的开发时间通过SuiteSparse:GraphBLAS","authors":"T. Davis, M. Aznaveh, Scott P. Kolodziej","doi":"10.1109/HPEC.2019.8916550","DOIUrl":null,"url":null,"abstract":"SuiteSparse:GraphBLAS is a full implementation of the GraphBLAS standard, which provides a powerful and expressive framework for creating graph algorithms based on the elegant mathematics of sparse matrix operations on a semiring. Algorithms written in GraphBLAS achieve high performance with minimal development time. Using GraphBLAS, it took a mere 20 minutes to write a first-cut computational kernel that solves the Sparse Deep Neural Network Graph Challenge. Understanding the problem description and file format, writing code to read in the files that define the problem, and comparing our results with the reference solution took a full day. The kernel consists of a single for-loop around 4 lines of code, all of which are calls to GraphBLAS, and it worked perfectly the first time it was compiled. The sequential performance of the GraphBLAS solution is 3x to 5x faster than the MATLAB reference implementation. OpenMP parallelism gives an additional 10x to 15x speedup on a 20-core Intel processor, 17x on an IBM Power8 system, and 20x on a Power9 system, for the largest problems. Since SuiteSparse:GraphBLAS does not yet employ MPI, this was added at the application level, a development effort that took one week, primarily because of difficulties in resolving a load-balancing issue in the MPI-based parallel algorithm.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":"{\"title\":\"Write Quick, Run Fast: Sparse Deep Neural Network in 20 Minutes of Development Time via SuiteSparse:GraphBLAS\",\"authors\":\"T. Davis, M. Aznaveh, Scott P. Kolodziej\",\"doi\":\"10.1109/HPEC.2019.8916550\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"SuiteSparse:GraphBLAS is a full implementation of the GraphBLAS standard, which provides a powerful and expressive framework for creating graph algorithms based on the elegant mathematics of sparse matrix operations on a semiring. Algorithms written in GraphBLAS achieve high performance with minimal development time. Using GraphBLAS, it took a mere 20 minutes to write a first-cut computational kernel that solves the Sparse Deep Neural Network Graph Challenge. Understanding the problem description and file format, writing code to read in the files that define the problem, and comparing our results with the reference solution took a full day. The kernel consists of a single for-loop around 4 lines of code, all of which are calls to GraphBLAS, and it worked perfectly the first time it was compiled. The sequential performance of the GraphBLAS solution is 3x to 5x faster than the MATLAB reference implementation. OpenMP parallelism gives an additional 10x to 15x speedup on a 20-core Intel processor, 17x on an IBM Power8 system, and 20x on a Power9 system, for the largest problems. Since SuiteSparse:GraphBLAS does not yet employ MPI, this was added at the application level, a development effort that took one week, primarily because of difficulties in resolving a load-balancing issue in the MPI-based parallel algorithm.\",\"PeriodicalId\":184253,\"journal\":{\"name\":\"2019 IEEE High Performance Extreme Computing Conference (HPEC)\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"25\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE High Performance Extreme Computing Conference (HPEC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPEC.2019.8916550\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC.2019.8916550","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 25

摘要

GraphBLAS是GraphBLAS标准的完整实现，它提供了一个强大而富有表现力的框架，用于创建基于半环上稀疏矩阵运算的优雅数学的图算法。用GraphBLAS编写的算法以最少的开发时间实现高性能。使用GraphBLAS，只花了20分钟就写出了解决稀疏深度神经网络图挑战的第一个计算内核。理解问题描述和文件格式，编写代码以读取定义问题的文件，并将我们的结果与参考解决方案进行比较，这需要一整天的时间。内核由一个大约4行代码的for循环组成，所有这些代码都是对GraphBLAS的调用，它在第一次编译时工作得很好。GraphBLAS解决方案的顺序性能比MATLAB参考实现快3到5倍。对于最大的问题，OpenMP并行性在20核英特尔处理器上提供了10到15倍的额外加速，在IBM Power8系统上提供了17倍的加速，在Power9系统上提供了20倍的加速。由于SuiteSparse:GraphBLAS还没有使用MPI，所以这是在应用程序级别添加的，这一开发工作花费了一周的时间，主要是因为在解决基于MPI的并行算法中的负载平衡问题时遇到了困难。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Write Quick, Run Fast: Sparse Deep Neural Network in 20 Minutes of Development Time via SuiteSparse:GraphBLAS

SuiteSparse:GraphBLAS is a full implementation of the GraphBLAS standard, which provides a powerful and expressive framework for creating graph algorithms based on the elegant mathematics of sparse matrix operations on a semiring. Algorithms written in GraphBLAS achieve high performance with minimal development time. Using GraphBLAS, it took a mere 20 minutes to write a first-cut computational kernel that solves the Sparse Deep Neural Network Graph Challenge. Understanding the problem description and file format, writing code to read in the files that define the problem, and comparing our results with the reference solution took a full day. The kernel consists of a single for-loop around 4 lines of code, all of which are calls to GraphBLAS, and it worked perfectly the first time it was compiled. The sequential performance of the GraphBLAS solution is 3x to 5x faster than the MATLAB reference implementation. OpenMP parallelism gives an additional 10x to 15x speedup on a 20-core Intel processor, 17x on an IBM Power8 system, and 20x on a Power9 system, for the largest problems. Since SuiteSparse:GraphBLAS does not yet employ MPI, this was added at the application level, a development effort that took one week, primarily because of difficulties in resolving a load-balancing issue in the MPI-based parallel algorithm.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE High Performance Extreme Computing Conference (HPEC)

自引率

0.00%

发文量