A Novel Set of Directives for Multi-device Programming with OpenMP
Raul Torres, R. Ferrer, Xavier Teruel
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), published 2022-05-01
DOI: 10.1109/IPDPSW55747.2022.00075 (https://doi.org/10.1109/IPDPSW55747.2022.00075)
Abstract
Recent versions of OpenMP support offloading execution to the accelerator devices found in a variety of heterogeneous architectures via the target directives. However, these directives can refer to only one device at a time, which makes multi-device programming an explicit and tedious task. In this work, we present an OpenMP extension in the form of a new set of directives (the target spread directives) that offers direct support for multiple devices and allows data and/or workload to be distributed among them without explicit programming. This results in an additional level of parallelism between the host and the devices. The target spread directives were evaluated using the Somier micro-app on a PowerPC cluster node with up to four Nvidia Tesla V100 GPUs. The results showed a speedup of approximately 2x using four GPUs and the new directive set, compared with the baseline implementation, which used one GPU and the existing target directive set.
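
To make the "explicit and tedious" baseline concrete, the following is a minimal sketch of how work is typically split across several devices by hand using only the existing target directives. It is not taken from the paper: the helper names `device_count` and `scale_multi_device`, the even chunking scheme, and the scaling kernel are all illustrative assumptions. The syntax of the proposed target spread directives is not given in the abstract, so it is deliberately not reproduced here.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: number of offload devices, or 1 when none are
   available (or when compiled without OpenMP), so that the loop below
   still executes, falling back to the host. */
static int device_count(void) {
#ifdef _OPENMP
    int n = omp_get_num_devices();
    return n > 0 ? n : 1;
#else
    return 1;
#endif
}

/* Hypothetical example kernel: scale x[0..n) by factor, manually
   splitting the index range into one contiguous chunk per device. */
static void scale_multi_device(double *x, int n, double factor) {
    int ndev = device_count();
    int chunk = (n + ndev - 1) / ndev;  /* ceiling division */

    for (int d = 0; d < ndev; ++d) {
        int lo = d * chunk;
        int len = (lo + chunk <= n) ? chunk : n - lo;
        if (len <= 0)
            break;

        /* Each target construct names exactly one device. The programmer
           must compute the chunk bounds and the matching map() section;
           nowait defers the region so the chunks can overlap in time. */
        #pragma omp target teams distribute parallel for \
                device(d) map(tofrom: x[lo:len]) nowait
        for (int i = lo; i < lo + len; ++i)
            x[i] *= factor;
    }

    /* Wait for every deferred target region before the host reads x. */
    #pragma omp taskwait
}
```

Calling `scale_multi_device(v, n, 2.0)` doubles every element of `v` regardless of how many devices are present. The per-device loop, the chunk arithmetic, and the hand-written array sections are exactly the bookkeeping that the abstract says the new directives are intended to remove.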