The effect on RISC performance of register set size and structure versus code generation strategy
David G. Bradlee, S. Eggers, R. Henry
[1991] Proceedings. The 18th Annual International Symposium on Computer Architecture. Published 1991-04-01. DOI: 10.1145/115953.115985 (https://doi.org/10.1145/115953.115985)
This paper examines the effect of code generation strategy and register set size and structure on the performance of RISC processors. We vary the number of registers from 16 to 128, in both split and shared organizations, and use three code generation strategies that differ in how their instruction schedulers and register allocators cooperate in utilizing registers. The architectures used in the experiments incorporate features of the Motorola 88000 and the MIPS R2000. We observed three things. First, more sophisticated code generation strategies require fewer registers. In our experiments, more than 32 registers yielded only marginal performance improvement over 32. With a simpler strategy, the point of diminishing returns appeared after 64 registers. Second, given a small number of registers (e.g., 16), a machine with a shared register organization executes faster than one with a split organization; given a larger number of registers, the write-back bus to the shared register set becomes the bottleneck, and a split organization is better. Third, a machine with a floating-point coprocessor does not always execute faster than one with a slower on-chip implementation if the coprocessor does not also perform expensive integer operations. The problem can be solved by transferring operands to the floating-point unit, doing a multiply or divide there, and then shipping the data back to the CPU.
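The trade-off behind the third observation can be illustrated with a small break-even model. This is only a sketch under stated assumptions, not the authors' method: the names `OffloadCosts`, `offload_cycles`, and `should_offload` are hypothetical, and the cycle counts are placeholders rather than figures from the paper. It compares performing an integer multiply on the CPU against moving both operands to the floating-point unit, multiplying there, and moving the result back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffloadCosts:
    """Cycle costs for one integer multiply; every field is a hypothetical input."""
    cpu_software_cycles: int   # multiply performed on the CPU itself
    move_to_fpu_cycles: int    # cost of moving ONE operand to the floating-point unit
    fpu_multiply_cycles: int   # multiply executed in the floating-point unit
    move_to_cpu_cycles: int    # cost of moving the result back to the CPU


def offload_cycles(costs: OffloadCosts, operands: int = 2) -> int:
    """Total cycles when the multiply is shipped to the FPU and back."""
    return (operands * costs.move_to_fpu_cycles
            + costs.fpu_multiply_cycles
            + costs.move_to_cpu_cycles)


def should_offload(costs: OffloadCosts) -> bool:
    """Offload only when the round trip is strictly cheaper than staying on the CPU."""
    return offload_cycles(costs) < costs.cpu_software_cycles


# Placeholder numbers chosen only to exercise the model, not measured values.
example = OffloadCosts(cpu_software_cycles=40, move_to_fpu_cycles=2,
                       fpu_multiply_cycles=6, move_to_cpu_cycles=2)
print(offload_cycles(example), should_offload(example))  # prints "12 True"
```

With these placeholder inputs the round trip costs 12 cycles against 40 on the CPU, so offloading wins. Raising the transfer costs or lowering the CPU cost flips the decision, which is why the transfer overhead has to be counted explicitly.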