OpenMP Offloading in the Jetson Nano Platform

Workshop Proceedings of the 51st International Conference on Parallel Processing
Pub date: 2022-08-29
DOI: 10.1145/3547276.3548517

Ilias K. Kasmeridis, V. Dimakopoulos


Citations: 0

Abstract

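The NVIDIA Jetson Nano is a very popular system-on-module and developer kit that brings high-performance specs to a small, power-efficient embedded platform. Integrating a 128-core GPU and a quad-core CPU, it provides enough capability to support computationally demanding applications such as AI inference, deep learning and computer vision. While the Jetson Nano family supports a number of APIs and libraries out of the box, comprehensive support for OpenMP, one of the most popular APIs, is not readily available. In this work we present the implementation of an OpenMP infrastructure that can harness both the CPU and the GPU of a Jetson Nano board using the offload facilities of recent versions of the OpenMP specification. We discuss the compiler-side transformations of key constructs, the generation of CUDA-based code, and how runtime support is provided. We also provide experimental results for a number of applications, exhibiting performance comparable to their pure CUDA versions.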
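The "offload facilities" mentioned in the abstract are the OpenMP `target` family of constructs (OpenMP 4.0 and later), which let a program mark a region for execution on an accelerator and describe data movement with `map` clauses. The following is a minimal sketch of what such code looks like from the programmer's side, assuming a standard OpenMP compiler. The function names `saxpy_offload` and `offload_device_count` are hypothetical illustrations, not taken from the paper, and the sketch does not reproduce the authors' compiler transformations, CUDA code generation or runtime.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical example: y = a*x + y, requested to run on the default
 * target device (e.g. a GPU) if the toolchain and runtime provide one.
 * map(to:)     copies x to the device before the region.
 * map(tofrom:) copies y to the device and back after the region.
 * If no device is available, or the code is built without OpenMP,
 * the loop simply executes on the host CPU. */
void saxpy_offload(int n, float a, const float *x, float *y)
{
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical helper: number of non-host devices visible to the
 * OpenMP runtime; 0 if none, or if built without OpenMP support. */
int offload_device_count(void)
{
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}
```

To actually offload, the code has to be built with a compiler configured for OpenMP offloading to NVIDIA GPUs (for example Clang with `-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda`); whether a given off-the-shelf toolchain supports the Jetson Nano's GPU is an assumption to verify, since the abstract notes that such support is not readily available. Because OpenMP falls back to host execution, the same source remains correct on the quad-core CPU alone.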



