Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor

23rd Annual International Symposium on Computer Architecture (ISCA'96). Pub Date: 1996-05-15. DOI: 10.1145/232973.232993

D. Tullsen, S. Eggers, J. Emer, H. Levy, J. Lo, R. Stamm


Citations: 848

Abstract

Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneous multithreading, based on a somewhat idealized model. In this paper we show that the throughput gains from simultaneous multithreading can be achieved without extensive changes to a conventional wide-issue superscalar, either in hardware structures or sizes. We present an architecture for simultaneous multithreading that achieves three goals: (1) it minimizes the architectural impact on the conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads. Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources. This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures: the ability to favor for fetch and issue those threads most efficiently using the processor each cycle, thereby providing the "best" instructions to the processor.
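
The abstract states the fetch-and-issue idea only in outline, so the following is a minimal sketch rather than the authors' mechanism. It assumes, purely for illustration, that "most efficiently using the processor" can be approximated by having the fewest instructions still waiting in the front end; the names `ThreadState`, `in_flight`, `pick_fetch_threads`, and `implied_baseline_ipc` are hypothetical. It also works out the baseline throughput implied by the two figures quoted above (5.4 IPC and a 2.5-fold improvement).

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    tid: int        # hardware thread id
    in_flight: int  # ASSUMPTION: count of this thread's not-yet-issued instructions


def pick_fetch_threads(threads, n):
    """Pick the n threads to fetch from this cycle.

    Illustrative heuristic (an assumption, not taken from the abstract):
    prefer threads with the fewest in-flight instructions, breaking ties
    by thread id so the result is deterministic.
    """
    ranked = sorted(threads, key=lambda t: (t.in_flight, t.tid))
    return [t.tid for t in ranked[:n]]


def implied_baseline_ipc(smt_ipc, speedup):
    """Baseline throughput implied by an SMT throughput and a speedup factor."""
    return smt_ipc / speedup


# Four hypothetical threads with made-up occupancy counts.
threads = [
    ThreadState(0, 12),
    ThreadState(1, 3),
    ThreadState(2, 7),
    ThreadState(3, 3),
]

chosen = pick_fetch_threads(threads, 2)    # threads 1 and 3 have the fewest in flight
baseline = implied_baseline_ipc(5.4, 2.5)  # figures quoted in the abstract

print("fetch from threads:", chosen)
print("implied superscalar IPC: %.2f" % baseline)
```

Under these assumptions the two least-congested threads are chosen each cycle, and the quoted figures imply that the unmodified superscalar sustains roughly 2.16 instructions per cycle.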


