The effect on RISC performance of register set size and structure versus code generation strategy

David G. Bradlee, S. Eggers, R. Henry
Published in: [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture
Publication date: 1991-04-01
DOI: 10.1145/115953.115985 (https://doi.org/10.1145/115953.115985)
Citations: 19

Abstract

This paper examines the effect of code generation strategy and register set size and structure on the performance of RISC processors. We vary the number of registers from 16 to 128, in both split and shared organizations, and use three different code generation strategies that differ in the way their instruction schedulers and register allocators cooperate in utilizing registers. The architectures used in the experiments incorporate features of the Motorola 88000 and the MIPS R2000. We observed three things. First, more sophisticated code generation strategies require fewer registers. In our experiments, more than 32 registers yielded only marginal performance improvement over 32. Using a simpler strategy, the point of diminishing returns appeared after 64 registers. Second, given a small number of registers (e.g., 16), a machine with a shared register organization executes faster than one with a split organization; given a larger number of registers, the write-back bus to the shared register set becomes the bottleneck, and a split organization is better. Third, a machine with a floating point coprocessor does not always execute faster than one with a slower on-chip implementation, if the coprocessor does not perform expensive integer operations as well. The problem can be solved by transferring operands to the floating point unit, doing a multiply or divide there, and then shipping the data back to the CPU.
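The trade-off behind the third observation can be sketched as simple cycle arithmetic. The sketch below is an illustration only, not the paper's model. The function names and every cycle count are hypothetical, and it assumes the transfers and the floating point operation are not overlapped with other instructions.

```python
def offload_cycles(transfer_cycles: int, fpu_op_cycles: int, operands: int = 2) -> int:
    """Cycles to ship the operands to the FPU, run the operation there,
    and ship one result back to the CPU (no overlap assumed)."""
    to_fpu = operands * transfer_cycles   # one transfer per operand
    back_to_cpu = transfer_cycles         # one transfer for the result
    return to_fpu + fpu_op_cycles + back_to_cpu


def should_offload(cpu_op_cycles: int, transfer_cycles: int,
                   fpu_op_cycles: int, operands: int = 2) -> bool:
    """True when routing an integer multiply or divide through the FPU
    costs fewer cycles than doing it on the integer side."""
    return offload_cycles(transfer_cycles, fpu_op_cycles, operands) < cpu_op_cycles


# Hypothetical numbers: a 35-cycle integer multiply on the CPU side,
# 2-cycle register transfers, and a 6-cycle multiply in the FPU.
print(offload_cycles(transfer_cycles=2, fpu_op_cycles=6))                     # 12
print(should_offload(cpu_op_cycles=35, transfer_cycles=2, fpu_op_cycles=6))   # True
print(should_offload(cpu_op_cycles=10, transfer_cycles=2, fpu_op_cycles=6))   # False
```

Under these made-up costs, offloading wins only when the integer-side operation is expensive relative to the transfers, which matches the abstract's remark that the benefit depends on the coprocessor handling the expensive integer operations.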