Performance Comparison of ILP Machines with Cycle Time Evaluation

23rd Annual International Symposium on Computer Architecture (ISCA'96). Pub Date: 1996-05-15. DOI: 10.1145/232973.232995

Tetsuya Hara, H. Ando, Chikako Nakanishi, M. Nakaya

Citations: 25

Abstract

Many studies have investigated performance improvement through exploiting instruction-level parallelism (ILP) with a particular architecture. Unfortunately, these studies indicate performance improvement using the number of cycles that are required to execute a program, but do not quantitatively estimate the penalty imposed on the cycle time from the architecture. Since the performance of a microprocessor must be measured by its execution time, a cycle time evaluation is required as well as a cycle count speedup evaluation. Currently, superscalar machines are widely accepted as the machines which achieve the highest performance. On the other hand, because of hardware simplicity and instruction scheduling sophistication, there is a perception that the next generation of microprocessors will be implemented with a VLIW architecture. A simple VLIW machine, however, has a serious weakness regarding speculative execution. Thus, it is a question whether a simple VLIW machine really outperforms a superscalar machine. We recently proposed a mechanism called predicating that supports speculative execution for the VLIW machine, and showed a significant cycle count speedup over a scalar machine. Although the mechanism is simple, it is unknown how much it imposes a penalty on the cycle time, and how much the performance is improved as a result. This paper evaluates both the cycle count speedup and the cycle time for three ILP machines: a superscalar machine, a simple VLIW machine, and the VLIW machine with predicating. The evaluation results show that the simple VLIW machine slightly outperforms the superscalar machine, while the VLIW machine with predicating achieves a significant speedup of 1.41x over the superscalar machine.
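
The abstract's central argument is that execution time is the product of cycle count and cycle time, so a cycle count speedup can be partly offset by a longer cycle. The following is a minimal sketch of that arithmetic. The function names and every numeric value are illustrative assumptions, not measurements from the paper.

```python
def execution_time(cycle_count: float, cycle_time_ns: float) -> float:
    """Execution time = number of cycles x time per cycle (in ns)."""
    return cycle_count * cycle_time_ns


def overall_speedup(base_cycles: float, base_cycle_time_ns: float,
                    new_cycles: float, new_cycle_time_ns: float) -> float:
    """Speedup of the new machine over the baseline in execution time."""
    base = execution_time(base_cycles, base_cycle_time_ns)
    new = execution_time(new_cycles, new_cycle_time_ns)
    return base / new


# Hypothetical machines (cycle count, cycle time in ns); NOT data from the paper.
baseline = (1_000_000, 10.0)   # assumed reference machine
wider = (600_000, 12.0)        # assumed: fewer cycles, but a slower clock

# Speedup judged by cycle count alone (about 1.67).
cycle_count_speedup = baseline[0] / wider[0]

# Speedup once the cycle time penalty is included (about 1.39).
speedup = overall_speedup(*baseline, *wider)

print(f"cycle count speedup: {cycle_count_speedup:.2f}")
print(f"execution time speedup: {speedup:.2f}")
```

With these assumed numbers, the 20% longer cycle reduces the apparent speedup from about 1.67 to about 1.39, which is why the paper evaluates cycle time alongside cycle count.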

Latest papers from this venue

Memory Bandwidth Limitations of Future Microprocessors
Missing the Memory Wall: The Case for Processor/Memory Integration
Instruction Prefetching of Systems Codes with Layout Optimized for Reduced Cache Misses
STiNG: A CC-NUMA Computer System for the Commercial Marketplace
High-Bandwidth Address Translation for Multiple-Issue Processors