Reducing overhead in the Uintah framework to support short-lived tasks on GPU-heterogeneous architectures

B. Peterson, H. Dasari, A. Humphrey, J. Sutherland, T. Saad, M. Berzins
DOI: 10.1145/2830018.2830023 (https://doi.org/10.1145/2830018.2830023)
Published: 2015-11-15
Venue (as listed): 高性能计算技术 (High Performance Computing Technology), Vol. 23, No. 1
Citations: 9

Abstract

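The Uintah computational framework is used to solve partial differential equations in parallel on adaptive mesh refinement grids on modern supercomputers. Uintah consists of an application layer and a separate runtime system. The Uintah runtime system is based on a distributed directed acyclic graph (DAG) of computational tasks, with a task scheduler that efficiently schedules and executes these tasks on both CPU cores and on-node accelerators. The runtime system identifies task dependencies, creates a taskgraph from these dependencies before an iteration begins, prepares data for tasks, automatically generates MPI message tags, and manages data after task computation. Managing tasks for accelerators poses significant challenges compared with CPU tasks, owing to the need to support more memory regions, API call latency, memory bandwidth concerns, and added development complexity. These challenges are greatest for tasks that compute within a few milliseconds, especially those that perform stencil-based computations involving halo data, have little data reuse, and/or require many computational variables. Current and emerging heterogeneous architectures make it necessary to address these challenges within Uintah. This work is not designed to improve the performance of existing tasks, but rather to reduce runtime overhead so that developers writing short-lived computational tasks can use Uintah in a heterogeneous environment. This work analyzes an initial approach for managing accelerator tasks alongside existing CPU tasks within Uintah. Its principal contributions are to identify and address inefficiencies that arise when mapping tasks onto the GPU, to implement new schemes that reduce runtime system overhead, to introduce new features that allow more tasks to leverage on-node accelerators, and to show the overhead reductions achieved by these improvements.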
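The abstract describes a runtime that derives task dependencies, builds a taskgraph before an iteration, and then runs each task on a CPU core or an accelerator once its inputs are available. The following is a minimal single-process Python sketch of that idea, assuming each task declares the variables it reads (`requires`) and writes (`computes`). The `Task` class, its fields, the `device` flag, and the functions `build_taskgraph` and `schedule` are hypothetical illustrations, not Uintah's actual C++ API. The sketch deliberately omits what makes the real system hard (distributed patches, halo exchange, MPI message tags, GPU memory management) and only records which device each task would be dispatched to.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task record: the variables it reads and writes, and a target device."""
    name: str
    requires: set[str] = field(default_factory=set)  # variables read by this task
    computes: set[str] = field(default_factory=set)  # variables produced by this task
    device: str = "cpu"  # assumed flag: "cpu" or "gpu"


def build_taskgraph(tasks: list[Task]) -> dict[str, set[str]]:
    """Map each task name to the names of the tasks that produce the variables it requires."""
    producer = {var: t.name for t in tasks for var in t.computes}
    deps: dict[str, set[str]] = {t.name: set() for t in tasks}
    for t in tasks:
        for var in t.requires:
            src = producer.get(var)
            if src is not None and src != t.name:
                deps[t.name].add(src)
    return deps


def schedule(tasks: list[Task]) -> list[tuple[str, str]]:
    """Return (task, device) pairs in an order that respects every dependency (Kahn's algorithm)."""
    devices = {t.name: t.device for t in tasks}
    deps = build_taskgraph(tasks)
    # Count unsatisfied dependencies and record which tasks wait on each producer.
    pending = {name: len(d) for name, d in deps.items()}
    waiters: dict[str, list[str]] = {name: [] for name in deps}
    for name, d in deps.items():
        for src in d:
            waiters[src].append(name)

    ready = deque(sorted(name for name, n in pending.items() if n == 0))
    order: list[tuple[str, str]] = []
    while ready:
        name = ready.popleft()
        # A real runtime would launch the task on a CPU thread or GPU stream here;
        # this dispatch step is where per-task runtime overhead accumulates.
        order.append((name, devices[name]))
        for child in waiters[name]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(order) != len(tasks):
        raise ValueError("task graph contains a cycle")
    return order
```

For example, with a CPU task `initialize` that computes `phi`, a GPU task `stencil` that requires `phi` and computes `phi_new`, and a CPU task `reduce` that requires `phi_new`, `schedule` returns `[("initialize", "cpu"), ("stencil", "gpu"), ("reduce", "cpu")]`. Kahn's algorithm is used here because it releases a task exactly when its last dependency has completed, matching the idea of executing a task only once its inputs are ready; a cycle leaves tasks unreleased and is reported as an error. When each task computes for only a few milliseconds, the fixed cost paid at the dispatch step becomes a large fraction of total runtime, which is the overhead this work targets.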