{"title":"一个多线程架构的设计和性能评估","authors":"R. Govindarajan, S. Nemawarkar, Philip LeNir","doi":"10.1109/HPCA.1995.386533","DOIUrl":null,"url":null,"abstract":"Multithreaded architectures have the ability to tolerate long memory latencies and unpredictable synchronization delays. We propose a multithreaded architecture that is capable of exploiting both coarse-grain parallelism, and fine-grain instruction level parallelism in a program. Instruction-level parallelism is exploited by grouping instructions from a number of active threads at runtime. The architecture supports multiple resident activations to improve the extent of locality exploited. Further, a distributed data structure cache organization is proposed to reduce both the network: traffic and the latency in accessing remote locations. Initial performance evaluation using discrete-event simulation indicates that the architecture is capable of achieving very high processor throughput. The introduction of the data structure cache reduces the network latency significantly. The impact of various cache organizations on the performance of the architecture is also discussed in this paper.<<ETX>>","PeriodicalId":330315,"journal":{"name":"Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture","volume":"69 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"40","resultStr":"{\"title\":\"Design and performance evaluation of a multithreaded architecture\",\"authors\":\"R. Govindarajan, S. Nemawarkar, Philip LeNir\",\"doi\":\"10.1109/HPCA.1995.386533\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multithreaded architectures have the ability to tolerate long memory latencies and unpredictable synchronization delays. We propose a multithreaded architecture that is capable of exploiting both coarse-grain parallelism, and fine-grain instruction level parallelism in a program. Instruction-level parallelism is exploited by grouping instructions from a number of active threads at runtime. The architecture supports multiple resident activations to improve the extent of locality exploited. Further, a distributed data structure cache organization is proposed to reduce both the network: traffic and the latency in accessing remote locations. Initial performance evaluation using discrete-event simulation indicates that the architecture is capable of achieving very high processor throughput. The introduction of the data structure cache reduces the network latency significantly. 
The impact of various cache organizations on the performance of the architecture is also discussed in this paper.<<ETX>>\",\"PeriodicalId\":330315,\"journal\":{\"name\":\"Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture\",\"volume\":\"69 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1995-01-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"40\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.1995.386533\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.1995.386533","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Design and performance evaluation of a multithreaded architecture
Multithreaded architectures can tolerate long memory latencies and unpredictable synchronization delays. We propose a multithreaded architecture capable of exploiting both coarse-grain parallelism and fine-grain instruction-level parallelism in a program. Instruction-level parallelism is exploited by grouping instructions from a number of active threads at runtime. The architecture supports multiple resident activations to increase the amount of locality that can be exploited. Further, a distributed data structure cache organization is proposed to reduce both the network traffic and the latency of accessing remote locations. Initial performance evaluation using discrete-event simulation indicates that the architecture can achieve very high processor throughput, and that introducing the data structure cache reduces network latency significantly. The impact of various cache organizations on the performance of the architecture is also discussed in this paper.
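To make the latency-tolerance argument concrete, below is a minimal, illustrative Python sketch of a processor that interleaves instructions from whichever threads are ready, so that one thread's long remote-memory access is overlapped with useful work from the others. The model and its parameters (RUN_BURST, MEMORY_LATENCY, TOTAL_INSTRUCTIONS) and the single-issue assumption are invented for illustration; they are not taken from the paper's architecture, simulator, or results.

```python
# Illustrative sketch only: interleaved multithreading hiding memory latency.
# All parameters are made up; this is not the paper's discrete-event simulator.

RUN_BURST = 4              # instructions a thread issues before it needs a remote access
MEMORY_LATENCY = 40        # cycles a thread is suspended waiting for remote memory
TOTAL_INSTRUCTIONS = 2000  # instructions each thread must execute


def simulate(num_threads: int) -> float:
    """Return instructions issued per cycle with num_threads interleaved threads."""
    remaining = [TOTAL_INSTRUCTIONS] * num_threads
    burst_left = [RUN_BURST] * num_threads
    ready_at = [0] * num_threads  # cycle at which each thread is next ready
    cycle = 0
    issued = 0
    while any(r > 0 for r in remaining):
        for t in range(num_threads):
            if remaining[t] > 0 and ready_at[t] <= cycle:
                # Issue one instruction from the first ready thread this cycle.
                remaining[t] -= 1
                burst_left[t] -= 1
                issued += 1
                if burst_left[t] == 0 and remaining[t] > 0:
                    # The thread stalls on a remote access and wakes up later;
                    # meanwhile, other ready threads keep the processor busy.
                    ready_at[t] = cycle + MEMORY_LATENCY
                    burst_left[t] = RUN_BURST
                break  # single-issue pipeline: at most one instruction per cycle
        cycle += 1
    return issued / cycle


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} threads: {simulate(n):.2f} instructions per cycle")
```

Running the sketch shows the issue rate climbing from well under 0.1 instructions per cycle with a single thread toward one instruction per cycle as more resident threads are added, which is the qualitative latency-tolerance effect the abstract describes; the paper's actual evaluation additionally models instruction grouping across threads, the data structure cache, and network traffic.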