The effect on RISC performance of register set size and structure versus code generation strategy

[1991] Proceedings. The 18th Annual International Symposium on Computer Architecture Pub Date : 1991-04-01 DOI:10.1145/115953.115985

David G. Bradlee, S. Eggers, R. Henry

Citations: 19

The effect on RISC performance of register set size and structure versus code generation strategy

This paper examines the effect of code generation strategy and register set size and structure on the performance of RISC processors. We vary the number of registers from 16 to 128, in both split and shared organizations, and use three different code generation strategies that differ in the way their instruction schedulers and register allocators cooperate in utilizing registers. The architectures used in the experiments incorporate features of the Motorola 88000 and the MIPS R2000. We observed three things. First, more sophisticated code generation strategies require fewer registers. In our experiments, more than 32 registers yielded only marginal performance improvement over 32. Using a simpler strategy, the point of diminishing returns appeared after 64 registers. Second, given a small number of registers (e.g., 16), a machine with a shared register organization executes faster than one with a split organization; given a larger number of registers, the write-back bus to the shared register set becomes the bottleneck, and a split organization is better. Third, a machine with a floating point coprocessor does not always execute faster than one with a slower on-chip implementation, if the coprocessor does not perform expensive integer operations as well. The problem can be solved by transferring operands to the floating point unit, doing a multiply or divide there, and then shipping the data back to the CPU.
