{"title":"Accelerating decoupled look-ahead via weak dependence removal: A metaheuristic approach","authors":"Raj Parihar, Michael C. Huang","doi":"10.1109/HPCA.2014.6835974","DOIUrl":null,"url":null,"abstract":"Despite the proliferation of multi-core and multi-threaded architectures, exploiting implicit parallelism for a single semantic thread is still a crucial component in achieving high performance. Look-ahead is a tried-and-true strategy in uncovering implicit parallelism, but a conventional, monolithic out-of-order core quickly becomes resource-inefficient when looking beyond a small distance. A more decoupled approach with an independent, dedicated look-ahead thread on a separate thread context can be a more flexible and effective implementation, especially in a multi-core environment. While capable of generating significant performance gains, the look-ahead agent often becomes the new speed limit. Fortunately, the look-ahead thread has no hard correctness constraints and presents new opportunities for optimizations. One such opportunity is to exploit “weak” dependences. Intuitively, not all dependences are equal. Some links in a dependence chain are weak enough that removing them in the look-ahead thread does not materially affect the quality of look-ahead but improves the speed. While there are some common patterns of weak dependences, they can not be generalized as heuristics in generating better code for the look-ahead thread. A primary reason is that removing a false weak dependence can be exceedingly costly. Nevertheless, a trial-and-error approach can reliably identify opportunities for improving the look-ahead thread and quantify the benefits. A framework based on genetic algorithm can help search for the right set of changes to the look-ahead thread. In the set of applications where the speed of look-ahead has become the new limit, this method is found to improve the overall system performance by up to 1.48x with a geometric mean of 1.14x over the baseline decoupled look-ahead system, while reducing energy consumption by 11%.","PeriodicalId":164587,"journal":{"name":"2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)","volume":"90 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2014.6835974","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7
Abstract
Despite the proliferation of multi-core and multi-threaded architectures, exploiting implicit parallelism for a single semantic thread is still a crucial component in achieving high performance. Look-ahead is a tried-and-true strategy in uncovering implicit parallelism, but a conventional, monolithic out-of-order core quickly becomes resource-inefficient when looking beyond a small distance. A more decoupled approach with an independent, dedicated look-ahead thread on a separate thread context can be a more flexible and effective implementation, especially in a multi-core environment. While capable of generating significant performance gains, the look-ahead agent often becomes the new speed limit. Fortunately, the look-ahead thread has no hard correctness constraints and presents new opportunities for optimizations. One such opportunity is to exploit “weak” dependences. Intuitively, not all dependences are equal. Some links in a dependence chain are weak enough that removing them in the look-ahead thread does not materially affect the quality of look-ahead but improves the speed. While there are some common patterns of weak dependences, they can not be generalized as heuristics in generating better code for the look-ahead thread. A primary reason is that removing a false weak dependence can be exceedingly costly. Nevertheless, a trial-and-error approach can reliably identify opportunities for improving the look-ahead thread and quantify the benefits. A framework based on genetic algorithm can help search for the right set of changes to the look-ahead thread. In the set of applications where the speed of look-ahead has become the new limit, this method is found to improve the overall system performance by up to 1.48x with a geometric mean of 1.14x over the baseline decoupled look-ahead system, while reducing energy consumption by 11%.