NAS Parallel Benchmark Kernels with Python: A performance and programming effort analysis focusing on GPUs

2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). Publication date: 2022-03-01. DOI: 10.1109/pdp55904.2022.00013

D. D. Domenico, G. H. Cavalheiro, J. F. Lima


Citations: 4

Abstract

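GPU devices are currently seen as one of the trending topics in parallel computing. GPU applications are commonly developed with programming tools based on compiled languages, such as C/C++ and Fortran. This paper presents a performance and programming effort analysis that uses the high-level Python language to implement the NAS Parallel Benchmark (NPB) kernels for GPUs. We used the Numba environment to enable CUDA support in Python; Numba allows a GPU application to be implemented in pure Python code. Our experimental results showed that the Python applications reached performance similar to that of C++ programs using CUDA and better than that of C++ programs using OpenACC for most NPB kernels. Furthermore, the Python codes required fewer operations related to the GPU framework than the CUDA codes, mainly because Python needs fewer statements to manage memory allocations and data transfers. However, our Python versions required more operations than the OpenACC implementations.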





Latest articles from this venue

Some Experiments on High Performance Anomaly Detection

Advancing Database System Operators with Near-Data Processing

A Parallel Approximation Algorithm for the Steiner Forest Problem

NoaSci: A Numerical Object Array Library for I/O of Scientific Applications on Object Storage

Load Balancing of the Parallel Execution of Two Dimensional Partitioned Cellular Automata