ELF:通过协调的warp和fetch调度最大化gpu的内存级并行性

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2015-11-15 DOI:10.1145/2807591.2807598

Jason Jong Kyu Park, Yongjun Park, S. Mahlke

{"title":"ELF:通过协调的warp和fetch调度最大化gpu的内存级并行性","authors":"Jason Jong Kyu Park, Yongjun Park, S. Mahlke","doi":"10.1145/2807591.2807598","DOIUrl":null,"url":null,"abstract":"Graphics processing units (GPUs) are increasingly utilized as throughput engines in the modern computer systems. GPUs rely on fast context switching between thousands of threads to hide long latency operations, however, they still stall due to the memory operations. To minimize the stalls, memory operations should be overlapped with other operations as much as possible to maximize memory-level parallelism (MLP). In this paper, we propose Earliest Load First (ELF) warp scheduling, which maximizes the MLP by giving higher priority to the warps that have the fewest instructions to the next memory load. ELF utilizes the same warp priority for the fetch scheduling so that both are coordinated. We also show that ELF reveals its full benefits when there are fewer memory conflicts and fetch stalls. Evaluations show that ELF can improve the performance by 4.1% and achieve total improvement of 11.9% when used with other techniques over commonly-used greedy-then-oldest scheduling.","PeriodicalId":117494,"journal":{"name":"SC15: International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"180 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"ELF: maximizing memory-level parallelism for GPUs with coordinated warp and fetch scheduling\",\"authors\":\"Jason Jong Kyu Park, Yongjun Park, S. Mahlke\",\"doi\":\"10.1145/2807591.2807598\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Graphics processing units (GPUs) are increasingly utilized as throughput engines in the modern computer systems. GPUs rely on fast context switching between thousands of threads to hide long latency operations, however, they still stall due to the memory operations. To minimize the stalls, memory operations should be overlapped with other operations as much as possible to maximize memory-level parallelism (MLP). In this paper, we propose Earliest Load First (ELF) warp scheduling, which maximizes the MLP by giving higher priority to the warps that have the fewest instructions to the next memory load. ELF utilizes the same warp priority for the fetch scheduling so that both are coordinated. We also show that ELF reveals its full benefits when there are fewer memory conflicts and fetch stalls. Evaluations show that ELF can improve the performance by 4.1% and achieve total improvement of 11.9% when used with other techniques over commonly-used greedy-then-oldest scheduling.\",\"PeriodicalId\":117494,\"journal\":{\"name\":\"SC15: International Conference for High Performance Computing, Networking, Storage and Analysis\",\"volume\":\"180 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-11-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"SC15: International Conference for High Performance Computing, Networking, Storage and Analysis\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2807591.2807598\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"SC15: International Conference for High Performance Computing, Networking, Storage and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2807591.2807598","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 11

摘要

图形处理单元(gpu)在现代计算机系统中越来越多地用作吞吐量引擎。gpu依赖于数千个线程之间的快速上下文切换来隐藏长延迟操作，然而，由于内存操作，它们仍然会停机。为了最小化延迟，内存操作应该尽可能地与其他操作重叠，以最大化内存级并行性(MLP)。在本文中，我们提出了最早加载优先(ELF)的warp调度，它通过给指令最少的warp更高的优先级来最大化MLP。ELF对读取调度使用相同的翘曲优先级，因此两者是协调的。我们还展示了ELF在内存冲突和获取延迟减少的情况下的全部优势。评估表明，与常用的“先贪后老”调度相比，ELF可以将性能提高4.1%，与其他技术一起使用时，可以实现11.9%的总改进。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

ELF: maximizing memory-level parallelism for GPUs with coordinated warp and fetch scheduling

Graphics processing units (GPUs) are increasingly utilized as throughput engines in the modern computer systems. GPUs rely on fast context switching between thousands of threads to hide long latency operations, however, they still stall due to the memory operations. To minimize the stalls, memory operations should be overlapped with other operations as much as possible to maximize memory-level parallelism (MLP). In this paper, we propose Earliest Load First (ELF) warp scheduling, which maximizes the MLP by giving higher priority to the warps that have the fewest instructions to the next memory load. ELF utilizes the same warp priority for the fetch scheduling so that both are coordinated. We also show that ELF reveals its full benefits when there are fewer memory conflicts and fetch stalls. Evaluations show that ELF can improve the performance by 4.1% and achieve total improvement of 11.9% when used with other techniques over commonly-used greedy-then-oldest scheduling.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis

自引率

0.00%

发文量