一种用于并发多线程处理器的低复杂度、高性能读取单元

10th International Symposium on High Performance Computer Architecture (HPCA'04) Pub Date : 2004-02-14 DOI:10.1109/HPCA.2004.10003

Ayose Falcón, Alex Ramírez, M. Valero

{"title":"一种用于并发多线程处理器的低复杂度、高性能读取单元","authors":"Ayose Falcón, Alex Ramírez, M. Valero","doi":"10.1109/HPCA.2004.10003","DOIUrl":null,"url":null,"abstract":"Simultaneous multithreading (SMT) is an architectural technique that allows for the parallel execution of several threads simultaneously. Fetch performance has been identified as the most important bottleneck for SMT processors. The commonly adopted solution has been fetching from more than one thread each cycle. Recent studies have proposed a plethora of fetch policies to deal with fetch priority among threads, trying to increase fetch performance. We demonstrate that the simultaneous sharing of the fetch unit, apart from increasing the complexity of the fetch unit, can be counterproductive in terms of performance. We evaluate the use of high-performance fetch units in the context of SMT. Our new fetch architecture proposal allows us to feed an 8-way processor fetching from a single thread each cycle, reducing complexity, and increasing the usefulness of proposed fetch policies. Our results show that using new high-performance fetch units, like the FTB or the stream fetch, provides higher performance than fetching from two threads using common SMT fetch architectures. Furthermore, our results show that our design obtains better average performance for any kind of workloads (both ILP and memory bounded benchmarks), in contrast to previously proposed solutions.","PeriodicalId":145009,"journal":{"name":"10th International Symposium on High Performance Computer Architecture (HPCA'04)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"A low-complexity, high-performance fetch unit for simultaneous multithreading processors\",\"authors\":\"Ayose Falcón, Alex Ramírez, M. Valero\",\"doi\":\"10.1109/HPCA.2004.10003\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Simultaneous multithreading (SMT) is an architectural technique that allows for the parallel execution of several threads simultaneously. Fetch performance has been identified as the most important bottleneck for SMT processors. The commonly adopted solution has been fetching from more than one thread each cycle. Recent studies have proposed a plethora of fetch policies to deal with fetch priority among threads, trying to increase fetch performance. We demonstrate that the simultaneous sharing of the fetch unit, apart from increasing the complexity of the fetch unit, can be counterproductive in terms of performance. We evaluate the use of high-performance fetch units in the context of SMT. Our new fetch architecture proposal allows us to feed an 8-way processor fetching from a single thread each cycle, reducing complexity, and increasing the usefulness of proposed fetch policies. Our results show that using new high-performance fetch units, like the FTB or the stream fetch, provides higher performance than fetching from two threads using common SMT fetch architectures. Furthermore, our results show that our design obtains better average performance for any kind of workloads (both ILP and memory bounded benchmarks), in contrast to previously proposed solutions.\",\"PeriodicalId\":145009,\"journal\":{\"name\":\"10th International Symposium on High Performance Computer Architecture (HPCA'04)\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-02-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"10th International Symposium on High Performance Computer Architecture (HPCA'04)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2004.10003\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"10th International Symposium on High Performance Computer Architecture (HPCA'04)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2004.10003","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 15

摘要

同步多线程(SMT)是一种允许同时并行执行多个线程的体系结构技术。获取性能被认为是SMT处理器最重要的瓶颈。通常采用的解决方案是每个周期从多个线程获取。最近的研究提出了大量的读取策略来处理线程之间的读取优先级，试图提高读取性能。我们证明了同时共享获取单元，除了增加获取单元的复杂性之外，在性能方面可能会适得其反。我们评估了在SMT上下文中高性能获取单元的使用。我们的新获取架构建议允许我们每个周期从单个线程提供8路处理器获取，从而降低了复杂性，并增加了所建议的获取策略的实用性。我们的结果表明，使用新的高性能获取单元，如FTB或流获取，比使用普通SMT获取架构从两个线程中获取提供更高的性能。此外，我们的结果表明，与之前提出的解决方案相比，我们的设计在任何类型的工作负载(包括ILP和内存受限基准测试)上都获得了更好的平均性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A low-complexity, high-performance fetch unit for simultaneous multithreading processors

Simultaneous multithreading (SMT) is an architectural technique that allows for the parallel execution of several threads simultaneously. Fetch performance has been identified as the most important bottleneck for SMT processors. The commonly adopted solution has been fetching from more than one thread each cycle. Recent studies have proposed a plethora of fetch policies to deal with fetch priority among threads, trying to increase fetch performance. We demonstrate that the simultaneous sharing of the fetch unit, apart from increasing the complexity of the fetch unit, can be counterproductive in terms of performance. We evaluate the use of high-performance fetch units in the context of SMT. Our new fetch architecture proposal allows us to feed an 8-way processor fetching from a single thread each cycle, reducing complexity, and increasing the usefulness of proposed fetch policies. Our results show that using new high-performance fetch units, like the FTB or the stream fetch, provides higher performance than fetching from two threads using common SMT fetch architectures. Furthermore, our results show that our design obtains better average performance for any kind of workloads (both ILP and memory bounded benchmarks), in contrast to previously proposed solutions.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

10th International Symposium on High Performance Computer Architecture (HPCA'04)

自引率

0.00%

发文量