POSTER: DaQueue: A Data-Aware Work-Queue Design for GPGPUs
Yashuai Lü, Libo Huang, Li Shen
2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2017-09-01
DOI: 10.1109/PACT.2017.22 (https://doi.org/10.1109/PACT.2017.22)
Citation count: 0
Abstract
Work-queues are an effective approach for mapping irregular-parallel workloads to GPGPUs. They can improve the utilization of SIMD units by processing only useful work items, which are generated dynamically during execution. Because current GPGPUs lack the necessary support for work-queues, software-based work-queue implementations often suffer from memory contention and load-balancing issues. We present a novel hardware work-queue design named DaQueue, which incorporates data-aware features to improve the efficiency of work-queues on GPGPUs. We evaluate our proposal on irregular-parallel workloads with a cycle-level simulator. Experimental results show that DaQueue significantly improves performance over software-based implementations for these workloads. Compared with an idealized hardware worklist approach, which is the state-of-the-art prior work, DaQueue achieves an average of 29.54% extra speedup.