{"title":"Machine Learning for Fine-Grained Hardware Prefetcher Control","authors":"Jason Hiebel, Laura E. Brown, Zhenlin Wang","doi":"10.1145/3337821.3337854","DOIUrl":null,"url":null,"abstract":"Modern architectures provide hardware memory prefetching capabilities which can be configured at runtime. While hardware prefetching can provide substantial performance improvements for many programs, prefetching can also increase contention for shared resources such as last-level cache and memory bandwidth. In turn, this contention can degrade performance in multi-core workloads. In this paper, we model fine-grained hardware prefetcher control as a contextual bandit, and propose a framework for learning prefetcher control policies which adjust hardware prefetching usage at runtime according to workload performance behavior. We train our policies on profiling data, wherein hardware memory prefetchers are enabled or disabled randomly at regular intervals over the course of a workload's execution. The learned prefetcher control policies provide up to a 4.3% average performance improvement over a set of memory bandwidth intensive workloads.","PeriodicalId":405273,"journal":{"name":"Proceedings of the 48th International Conference on Parallel Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 48th International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3337821.3337854","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7
Abstract
Modern architectures provide hardware memory prefetching capabilities which can be configured at runtime. While hardware prefetching can provide substantial performance improvements for many programs, prefetching can also increase contention for shared resources such as last-level cache and memory bandwidth. In turn, this contention can degrade performance in multi-core workloads. In this paper, we model fine-grained hardware prefetcher control as a contextual bandit, and propose a framework for learning prefetcher control policies which adjust hardware prefetching usage at runtime according to workload performance behavior. We train our policies on profiling data, wherein hardware memory prefetchers are enabled or disabled randomly at regular intervals over the course of a workload's execution. The learned prefetcher control policies provide up to a 4.3% average performance improvement over a set of memory bandwidth intensive workloads.