{"title":"Enhancing Data Reuse in Cache Contention Aware Thread Scheduling on GPGPU","authors":"Chin-Fu Lu, Hsien-Kai Kuo, B. Lai","doi":"10.1109/CISIS.2016.132","DOIUrl":null,"url":null,"abstract":"GPGPUs have been widely adopted as throughput processing platforms for modern big-data and cloud computing. Attaining a high performance design on a GPGPU requires careful tradeoffs among various design concerns. Data reuse, cache contention, and thread level parallelism, have been demonstrated as three imperative performance factors for a GPGPU. The correlated performance impacts of these factors pose non-trivial concerns when scheduling threads on GPGPUs. This paper proposes a three-staged scheduling scheme to coschedule the threads with consideration of the three factors. The experiment results on a set of irregular parallel applications, when compared with previous approaches, have demonstrated up to 70% execution time improvement.","PeriodicalId":249236,"journal":{"name":"2016 10th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 10th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CISIS.2016.132","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
GPGPUs have been widely adopted as throughput processing platforms for modern big-data and cloud computing. Attaining a high-performance design on a GPGPU requires careful tradeoffs among various design concerns. Data reuse, cache contention, and thread-level parallelism have been shown to be three critical performance factors for a GPGPU. Because the performance impacts of these factors are correlated, scheduling threads on GPGPUs is non-trivial. This paper proposes a three-stage scheduling scheme that co-schedules threads with all three factors taken into account. Experimental results on a set of irregular parallel applications show up to a 70% improvement in execution time compared with previous approaches.