Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor

23rd Annual International Symposium on Computer Architecture (ISCA'96). Published 1996-05-15. DOI: 10.1145/232973.232993

D. Tullsen, S. Eggers, J. Emer, H. Levy, J. Lo, R. Stamm


Citations: 848

Abstract

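Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneous multithreading, based on a somewhat idealized model. In this paper we show that the throughput gains from simultaneous multithreading can be achieved without extensive changes to a conventional wide-issue superscalar, either in hardware structures or sizes. We present an architecture for simultaneous multithreading that achieves three goals: (1) it minimizes the architectural impact on the conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads. Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources. This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures: the ability to favor, for fetch and issue, those threads most efficiently using the processor each cycle, thereby providing the "best" instructions to the processor.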
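The last sentence of the abstract describes choosing, each cycle, which threads to fetch from. The paper's count-based heuristic for this is known as ICOUNT, which favors threads with the fewest instructions in the front end and pipeline queues, and it is combined with fetching from up to two threads per cycle. The Python below is a minimal sketch of that kind of selection, not the paper's hardware: the per-thread `front_end_count` counter, the `fetch_blocked` flag, the `fetchable` limit per thread, and the defaults of 8 fetch slots and 2 threads are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ThreadState:
    tid: int
    front_end_count: int         # assumed counter: instructions in decode, rename, and issue queues
    fetch_blocked: bool = False  # hypothetical flag, e.g. thread waiting on an I-cache miss


def icount_select(threads: List[ThreadState], max_threads: int = 2) -> List[int]:
    """Pick up to max_threads unblocked threads with the fewest front-end instructions.

    Ties are broken by thread id so the result is deterministic.
    """
    eligible = [t for t in threads if not t.fetch_blocked]
    eligible.sort(key=lambda t: (t.front_end_count, t.tid))
    return [t.tid for t in eligible[:max_threads]]


def plan_fetch(threads: List[ThreadState],
               fetchable: Dict[int, int],
               fetch_width: int = 8,
               max_threads: int = 2) -> Dict[int, int]:
    """Split fetch_width slots among the selected threads in priority order.

    fetchable[tid] is how many instructions that thread could supply this
    cycle (assumed to be limited by, e.g., a taken branch or a cache-line end).
    The highest-priority thread fills as many slots as it can; the next
    thread fills whatever is left.
    """
    plan: Dict[int, int] = {}
    remaining = fetch_width
    for tid in icount_select(threads, max_threads):
        if remaining == 0:
            break
        n = min(remaining, fetchable.get(tid, 0))
        if n > 0:
            plan[tid] = n
            remaining -= n
    return plan


if __name__ == "__main__":
    threads = [
        ThreadState(0, front_end_count=20),
        ThreadState(1, front_end_count=3),
        ThreadState(2, front_end_count=7, fetch_blocked=True),
        ThreadState(3, front_end_count=5),
    ]
    print(icount_select(threads))                   # [1, 3]
    print(plan_fetch(threads, {0: 8, 1: 5, 3: 8}))  # {1: 5, 3: 3}
```

In this sketch, thread 0 is skipped even though it could fill all eight slots, because it already has many instructions waiting in the front end. Preferring threads with few in-flight instructions is one way to keep a single stalled thread from clogging the shared queues, which is the sense in which the abstract speaks of feeding the processor the "best" instructions.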


