{"title":"3D Workload Subsetting for GPU Architecture Pathfinding","authors":"V. George","doi":"10.1109/IISWC.2015.24","DOIUrl":null,"url":null,"abstract":"Growth of high-end 3D gaming, expansion of gaming to new devices like tablets and phones, and evolution of multiple Graphics APIs like Direct3D 10+, and OpenGL 3.0+ have led to an explosion in the number of workloads that need to be evaluated for GPU architecture path-finding. To decide on the optimal architecture configuration, the workloads need to be simulated on a wide range of architecture designs which incurs huge cost, both in terms of time and resources. In order to reduce the simulation cost of path-finding, extracting workload subsets from 3D workloads is essential. This paper presents a methodology to find representative workload subsets from 3D workloads by combining clustering and phase detection. In the first part, this paper presents a methodology to group draw-calls based on performance similarity by clustering on their micro architecture independent characteristics. Across 717 frames encompassing 828K draw-calls, the clustering solution obtained an average performance prediction error per frame of 1.0% at an average clustering efficiency of 65.8%. The clustering quality is additionally evaluated by calculating cluster outliers, which are clusters with intra cluster prediction error greater than 20%. The clustering quality, measured using cluster outliers, is an indication of the performance similarity of the individual clusters. Across the spectrum of frames, we found that on an average only 3.0% of the clusters are outliers which indicates a high clustering quality. In order to detect repetitive behavior in 3D workloads, we propose characterization of frame intervals using shader vectors and then using shader vector equality to extract the repeating patterns. We show that phases exist in each game in the Bio shock series enabling extraction of small representative subsets from the workloads. Performance improvement of the workload subsets, which are less than one percent of parent workload, with GPU frequency scaling has high correlation (correlation coefficient=99.7%+) to the performance improvement of its parent workload.","PeriodicalId":142698,"journal":{"name":"2015 IEEE International Symposium on Workload Characterization","volume":"81 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Symposium on Workload Characterization","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IISWC.2015.24","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
Growth of high-end 3D gaming, expansion of gaming to new devices like tablets and phones, and evolution of multiple Graphics APIs like Direct3D 10+, and OpenGL 3.0+ have led to an explosion in the number of workloads that need to be evaluated for GPU architecture path-finding. To decide on the optimal architecture configuration, the workloads need to be simulated on a wide range of architecture designs which incurs huge cost, both in terms of time and resources. In order to reduce the simulation cost of path-finding, extracting workload subsets from 3D workloads is essential. This paper presents a methodology to find representative workload subsets from 3D workloads by combining clustering and phase detection. In the first part, this paper presents a methodology to group draw-calls based on performance similarity by clustering on their micro architecture independent characteristics. Across 717 frames encompassing 828K draw-calls, the clustering solution obtained an average performance prediction error per frame of 1.0% at an average clustering efficiency of 65.8%. The clustering quality is additionally evaluated by calculating cluster outliers, which are clusters with intra cluster prediction error greater than 20%. The clustering quality, measured using cluster outliers, is an indication of the performance similarity of the individual clusters. Across the spectrum of frames, we found that on an average only 3.0% of the clusters are outliers which indicates a high clustering quality. In order to detect repetitive behavior in 3D workloads, we propose characterization of frame intervals using shader vectors and then using shader vector equality to extract the repeating patterns. We show that phases exist in each game in the Bio shock series enabling extraction of small representative subsets from the workloads. Performance improvement of the workload subsets, which are less than one percent of parent workload, with GPU frequency scaling has high correlation (correlation coefficient=99.7%+) to the performance improvement of its parent workload.