The Minos Computing Library: efficient parallel programming for extremely heterogeneous systems

Proceedings of the 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit. Pub Date: 2020-02-23. DOI: 10.1145/3366428.3380770

R. Gioiosa, B. O. Mutlu, Seyong Lee, J. Vetter, Giulio Picierro, M. Cesati


Citations: 9

Abstract

Hardware specialization has become the silver bullet for achieving efficient high performance, from Systems-on-Chip, where hardware specialization can be "extreme", to large-scale HPC systems. As the complexity of these systems increases, so does the complexity of programming such architectures in a portable way. This work introduces the Minos Computing Library (MCL), a system software, programming model, and programming model runtime that facilitates programming extremely heterogeneous systems. MCL supports the execution of several multi-threaded applications within the same compute node, executes application tasks asynchronously, efficiently balances computation across hardware resources, and provides performance portability. We show that code developed on a personal desktop automatically scales up to fully utilize powerful workstations with 8 GPUs and down to power-efficient embedded systems. MCL provides up to 17.5x speedup over OpenCL on NVIDIA DGX-1 systems and up to 1.88x speedup on single-GPU systems. In multi-application workloads, MCL's dynamic resource allocation provides up to 2.43x performance improvement over manual, static resource allocation.
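The difference between static and dynamic resource allocation mentioned in the abstract can be illustrated with a toy model. The sketch below is not MCL's API or its scheduler: the function names, the greedy "earliest free device" policy, and the task costs are all assumptions made for illustration. It compares the finish time of two applications when each is pinned to a fixed share of devices against the finish time when both draw from one shared pool.

```python
import heapq


def greedy_makespan(tasks, n_devices):
    """Assign each task, in order, to the device that becomes free first.

    Returns the time at which the last device finishes.
    """
    free_at = [0.0] * n_devices
    heapq.heapify(free_at)
    for cost in tasks:
        start = heapq.heappop(free_at)          # earliest available device
        heapq.heappush(free_at, start + cost)   # device is busy until then
    return max(free_at)


def static_makespan(apps, devices_per_app):
    """Each application may only use its own fixed share of devices."""
    return max(
        greedy_makespan(tasks, n_dev)
        for tasks, n_dev in zip(apps, devices_per_app)
    )


def dynamic_makespan(apps, total_devices):
    """All applications submit tasks to one shared pool of devices."""
    merged = [cost for tasks in apps for cost in tasks]
    return greedy_makespan(merged, total_devices)


# Hypothetical workload: one heavy application and one light application.
apps = [[4.0] * 8, [1.0] * 2]
static_time = static_makespan(apps, devices_per_app=[2, 2])
dynamic_time = dynamic_makespan(apps, total_devices=4)
print(static_time, dynamic_time)
```

In this made-up workload the light application leaves its two devices idle under the static split, so the shared pool finishes sooner. The numbers are illustrative only and say nothing about the speedups reported in the paper.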
