Decoupled Architectures as a Low-Complexity Alternative to Out-of-order Execution
N. Crago, Sanjay J. Patel
2011 International Conference on Parallel Architectures and Compilation Techniques, 2011-10-10
DOI: 10.1109/PACT.2011.28 (https://doi.org/10.1109/PACT.2011.28)
In this paper we present OUTRIDERHP, a novel implementation of a decoupled architecture that approaches the performance of contemporary out-of-order processors on parallel benchmarks while maintaining low hardware complexity. OUTRIDERHP leverages the compiler to separate a single thread of execution into memory-accessing and memory-consuming streams, which we call strands, that can be executed concurrently. We identify loss-of-decoupling events that cripple performance on traditional decoupled architectures, and design OUTRIDERHP to enable extraction of multiple strands and control speculation, which together provide superior memory and functional unit latency tolerance. OUTRIDERHP outperforms a baseline in-order architecture by 26-220% and Decoupled Access/Execute by 7-172% when executing parallel benchmarks on an 8-core CMP configuration. OUTRIDERHP performs within 15% of higher-complexity out-of-order cores despite using no large physical register files, dynamic scheduling, or register renaming hardware.
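The general idea of splitting a thread into a memory-accessing strand and a memory-consuming strand can be illustrated with a toy cycle-count model. This is only a sketch of generic decoupled access/execute behavior, not OUTRIDERHP's actual design: the function names, the single FIFO queue, the one-load-per-cycle issue rate, and all latency parameters are assumptions made for illustration.

```python
from collections import deque


def in_order_cycles(n_iters, mem_latency, compute_latency):
    """Toy in-order core: each iteration waits for its load, then computes."""
    return n_iters * (mem_latency + compute_latency)


def decoupled_cycles(n_iters, mem_latency, compute_latency, queue_depth):
    """Toy decoupled core (hypothetical model).

    The access strand issues at most one load per cycle while the queue
    has a free slot. The execute strand consumes loaded values in order,
    one at a time, as soon as the value at the head of the queue arrives.
    """
    queue = deque()   # arrival cycle of each load still in the queue
    issued = 0        # loads issued by the access strand
    done = 0          # iterations started by the execute strand
    busy_until = 0    # cycle at which the execute strand becomes idle
    cycle = 0
    while done < n_iters:
        # Execute strand: consume the head entry if idle and data is ready.
        if cycle >= busy_until and queue and queue[0] <= cycle:
            queue.popleft()
            busy_until = cycle + compute_latency
            done += 1
        # Access strand: run ahead and issue the next load if there is room.
        if issued < n_iters and len(queue) < queue_depth:
            queue.append(cycle + mem_latency)
            issued += 1
        cycle += 1
    return busy_until


if __name__ == "__main__":
    print(in_order_cycles(4, 10, 2))       # 48
    print(decoupled_cycles(4, 10, 2, 8))   # 18
    print(decoupled_cycles(4, 10, 2, 1))   # 42
```

With a deep enough queue, the access strand runs ahead and the memory latency is paid roughly once instead of once per iteration. With a queue depth of 1 the two strands are nearly serialized, which shows in this simplified setting how limited run-ahead erodes the benefit of decoupling.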