优化空间计算的内存访问

International Symposium on Code Generation and Optimization, 2003. CGO 2003. Pub Date : 2003-03-23 DOI:10.1109/CGO.2003.1191547

M. Budiu, S. Goldstein

{"title":"优化空间计算的内存访问","authors":"M. Budiu, S. Goldstein","doi":"10.1109/CGO.2003.1191547","DOIUrl":null,"url":null,"abstract":"We present the internal representation and optimizations used by the CASH compiler for improving the memory parallelism of pointer-based programs. CASH uses an SSA-based representation for memory, which compactly summarizes both control-flow- and dependence information. In CASH, memory optimization is a four-step process: (1) first an initial, relatively coarse representation of memory dependences is built; (2) next, unnecessary memory dependences are removed using dependence tests; (3) third, redundant memory operations are removed (4) finally, parallelism is increased by pipelining memory accesses in loops. While the first three steps above are very general, the loop pipelining transformations are particularly applicable for spatial computation, which is the primary target of CASH. The redundant memory removal optimizations presented are: load/store hoisting (subsuming partial redundancy elimination and common-subexpression elimination), load-after-store removal, store-before-store removal (dead store removal) and loop-invariant load motion. One of our loop pipelining transformations is a new form of loop parallelization, called loop decoupling. This transformation separates independent memory accesses within a loop body into several independent loops, which are allowed dynamically to slip with respect to each other A new computational primitive, a token generator is used to dynamically control the amount of slip, allowing maximum freedom, while guaranteeing that no memory dependences are violated.","PeriodicalId":277590,"journal":{"name":"International Symposium on Code Generation and Optimization, 2003. CGO 2003.","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2003-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":"{\"title\":\"Optimizing memory accesses for spatial computation\",\"authors\":\"M. Budiu, S. Goldstein\",\"doi\":\"10.1109/CGO.2003.1191547\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present the internal representation and optimizations used by the CASH compiler for improving the memory parallelism of pointer-based programs. CASH uses an SSA-based representation for memory, which compactly summarizes both control-flow- and dependence information. In CASH, memory optimization is a four-step process: (1) first an initial, relatively coarse representation of memory dependences is built; (2) next, unnecessary memory dependences are removed using dependence tests; (3) third, redundant memory operations are removed (4) finally, parallelism is increased by pipelining memory accesses in loops. While the first three steps above are very general, the loop pipelining transformations are particularly applicable for spatial computation, which is the primary target of CASH. The redundant memory removal optimizations presented are: load/store hoisting (subsuming partial redundancy elimination and common-subexpression elimination), load-after-store removal, store-before-store removal (dead store removal) and loop-invariant load motion. One of our loop pipelining transformations is a new form of loop parallelization, called loop decoupling. This transformation separates independent memory accesses within a loop body into several independent loops, which are allowed dynamically to slip with respect to each other A new computational primitive, a token generator is used to dynamically control the amount of slip, allowing maximum freedom, while guaranteeing that no memory dependences are violated.\",\"PeriodicalId\":277590,\"journal\":{\"name\":\"International Symposium on Code Generation and Optimization, 2003. CGO 2003.\",\"volume\":\"42 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2003-03-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"29\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Symposium on Code Generation and Optimization, 2003. CGO 2003.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CGO.2003.1191547\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Symposium on Code Generation and Optimization, 2003. CGO 2003.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CGO.2003.1191547","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 29

摘要

我们给出了CASH编译器用于改进基于指针的程序的内存并行性的内部表示和优化。CASH使用基于ssa的内存表示，它简洁地总结了控制流和依赖信息。在CASH中，内存优化是一个四步过程:(1)首先建立一个初始的、相对粗糙的内存依赖表示;(2)接下来，使用依赖性测试去除不必要的内存依赖性;(3)第三，删除冗余的内存操作(4)最后，通过循环管道内存访问来增加并行性。虽然上面的前三个步骤非常通用，但是循环流水线转换特别适用于空间计算，这是CASH的主要目标。提出的冗余内存删除优化包括:负载/存储提升(包括部分冗余消除和公共子表达式消除)，存储后负载移除，存储前存储移除(死存储移除)和循环不变负载运动。我们的循环流水线转换之一是循环并行化的一种新形式，称为循环解耦。这种转换将循环体中的独立内存访问分离为几个独立的循环，这些循环允许动态地相对于彼此滑动。一个新的计算原语，一个令牌生成器用于动态控制滑动的数量，允许最大的自由度，同时保证不违反内存依赖。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Optimizing memory accesses for spatial computation

We present the internal representation and optimizations used by the CASH compiler for improving the memory parallelism of pointer-based programs. CASH uses an SSA-based representation for memory, which compactly summarizes both control-flow- and dependence information. In CASH, memory optimization is a four-step process: (1) first an initial, relatively coarse representation of memory dependences is built; (2) next, unnecessary memory dependences are removed using dependence tests; (3) third, redundant memory operations are removed (4) finally, parallelism is increased by pipelining memory accesses in loops. While the first three steps above are very general, the loop pipelining transformations are particularly applicable for spatial computation, which is the primary target of CASH. The redundant memory removal optimizations presented are: load/store hoisting (subsuming partial redundancy elimination and common-subexpression elimination), load-after-store removal, store-before-store removal (dead store removal) and loop-invariant load motion. One of our loop pipelining transformations is a new form of loop parallelization, called loop decoupling. This transformation separates independent memory accesses within a loop body into several independent loops, which are allowed dynamically to slip with respect to each other A new computational primitive, a token generator is used to dynamically control the amount of slip, allowing maximum freedom, while guaranteeing that no memory dependences are violated.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Symposium on Code Generation and Optimization, 2003. CGO 2003.

自引率

0.00%

发文量