{"title":"Recycling Data Slack in Out-of-Order Cores","authors":"Gokul Subramanian Ravi, Mikko H. Lipasti","doi":"10.1109/HPCA.2019.00065","DOIUrl":null,"url":null,"abstract":"In order to operate reliably and produce expected outputs, modern processors set timing margins conservatively at design time to support extreme variations in workload and environment, imposing a high cost in performance and energy efficiency. The relentless pressure to improve execution bandwidth has exacerbated this problem, requiring instructions with increasingly diverse semantics, leading to datapaths with a large gap between best-case and worst-case timing. In practice, data slack, the unutilized portion of the clock period due to inactive critical paths in a circuit, can often be as high as half of the clock period. In this paper we propose ReDSOC, which dynamically identifies data slack and aggressively recycles it, to improve performance on Out-Of-Order (OOO) cores. It is implemented via a transparent-flow based data bypass network between the execution units of the core. Further, ReDSOC performs slackaware OOO instruction scheduling aided by optimizations to the wakeup and select logic, to support this aggressive operation execution mechanism. ReDSOC is implemented atop OOO cores of different sizes and tested on a variety of general purpose and machine learning applications. The implementation achieves average speedups in the range of 5% to 25% across the different cores and application categories. Further, it is shown to be more efficient at improving performance in comparison to prior proposals. Keywords-clock cycle slack; out-of-order; scheduler; transparent dataflow;","PeriodicalId":102050,"journal":{"name":"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2019.00065","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
In order to operate reliably and produce expected outputs, modern processors set timing margins conservatively at design time to support extreme variations in workload and environment, imposing a high cost in performance and energy efficiency. The relentless pressure to improve execution bandwidth has exacerbated this problem, requiring instructions with increasingly diverse semantics, leading to datapaths with a large gap between best-case and worst-case timing. In practice, data slack, the unutilized portion of the clock period due to inactive critical paths in a circuit, can often be as high as half of the clock period. In this paper we propose ReDSOC, which dynamically identifies data slack and aggressively recycles it, to improve performance on Out-Of-Order (OOO) cores. It is implemented via a transparent-flow based data bypass network between the execution units of the core. Further, ReDSOC performs slackaware OOO instruction scheduling aided by optimizations to the wakeup and select logic, to support this aggressive operation execution mechanism. ReDSOC is implemented atop OOO cores of different sizes and tested on a variety of general purpose and machine learning applications. The implementation achieves average speedups in the range of 5% to 25% across the different cores and application categories. Further, it is shown to be more efficient at improving performance in comparison to prior proposals. Keywords-clock cycle slack; out-of-order; scheduler; transparent dataflow;