Arthur Perais, Rami Sheikh, Luke Yen, Michael McIlvaine, R. Clancy
Elastic Instruction Fetching
2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), February 2019
DOI: 10.1109/HPCA.2019.00059
Citations: 6
Abstract
Branch prediction (i.e., the generation of fetch addresses) and instruction cache accesses need not be tightly coupled. When the instruction fetch stage stalls because of an ICache miss or back-pressure, the branch predictor may run ahead and generate future fetch addresses that can be used for different optimizations, such as instruction prefetching and, more importantly, hiding taken-branch fetch bubbles. This approach is used in many commercially available high-performance designs. However, decoupling branch prediction from instruction retrieval has several drawbacks. First, it can increase the pipeline depth, leading to more expensive pipeline flushes. Second, it requires a large Branch Target Buffer (BTB) to store branch targets, allowing the branch predictor to follow taken branches without decoding instruction bytes; BTB misses also cause additional bubbles. In some classes of workloads, these drawbacks may significantly offset the benefits of decoupling. In this paper, we present ELastic Fetching (ELF), a hybrid mechanism that decouples branch prediction from instruction retrieval while minimizing additional bubbles on pipeline flushes and BTB misses. We present two different implementations that trade off complexity for additional performance. The two variants outperform a baseline decoupled fetcher design by up to 3.7% and 5.2%, respectively.
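Decoupled front ends are commonly organized around a queue of predicted fetch addresses, often called a fetch target queue (FTQ), placed between the branch predictor and the ICache fetch stage. The sketch below is a toy Python model of that baseline decoupled arrangement, not of ELF itself. The FTQ name, the 32-byte fetch block, the queue capacity, the one-push/one-pop-per-cycle timing, and the assumption that every block falls through sequentially (no taken branches) are all hypothetical simplifications. It shows the predictor running ahead while fetch is stalled, the queued addresses behind the stalled head becoming prefetch candidates, and a full queue eventually pushing back on the predictor.

```python
from collections import deque

BLOCK_BYTES = 32  # hypothetical fetch-block size in bytes


def simulate(cycles, stall_cycles, ftq_capacity=4):
    """Toy cycle loop of a decoupled front end (illustrative only).

    Each cycle, the branch predictor pushes one predicted fetch-block
    address into a fetch target queue (FTQ) if the queue has room, and the
    fetch stage pops the head address unless that cycle is in stall_cycles
    (e.g., an ICache miss). Returns (fetched, prefetch_candidates).
    """
    ftq = deque()
    next_pc = 0
    fetched = []
    prefetch_candidates = set()
    for cycle in range(cycles):
        # Predictor side: runs ahead while the FTQ has free entries.
        # A full FTQ applies back-pressure and the predictor idles.
        if len(ftq) < ftq_capacity:
            ftq.append(next_pc)
            next_pc += BLOCK_BYTES  # assumed sequential: no taken branches
        if cycle in stall_cycles:
            # Fetch side stalled on the head block; the addresses queued
            # behind it are already known and could be prefetched.
            prefetch_candidates.update(list(ftq)[1:])
        elif ftq:
            # Fetch side: consume the oldest predicted address.
            fetched.append(ftq.popleft())
    return fetched, sorted(prefetch_candidates)
```

For example, `simulate(8, {2, 3, 4})` returns `([0, 32, 64, 96, 128], [96, 128])`: during the three-cycle stall on block 64, the predictor keeps queuing blocks 96 and 128, which become prefetch candidates. With a longer stall, `simulate(8, set(range(2, 8)))` fetches only `[0, 32]` and the predictor stops once the four-entry queue is full. Modeling flushes or BTB misses, the drawbacks the abstract highlights, would require extra state that this sketch deliberately omits.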