3D Workload Subsetting for GPU Architecture Pathfinding

2015 IEEE International Symposium on Workload Characterization. Pub Date: 2015-10-04. DOI: 10.1109/IISWC.2015.24

V. George


Citations: 3

Abstract

The growth of high-end 3D gaming, the expansion of gaming to new devices such as tablets and phones, and the evolution of multiple graphics APIs such as Direct3D 10+ and OpenGL 3.0+ have led to an explosion in the number of workloads that must be evaluated for GPU architecture pathfinding. To decide on the optimal architecture configuration, the workloads need to be simulated on a wide range of architecture designs, which incurs a huge cost in both time and resources. To reduce the simulation cost of pathfinding, extracting workload subsets from 3D workloads is essential. This paper presents a methodology for finding representative workload subsets from 3D workloads by combining clustering and phase detection. In the first part, the paper presents a methodology for grouping draw-calls by performance similarity, clustering on their microarchitecture-independent characteristics. Across 717 frames encompassing 828K draw-calls, the clustering solution obtained an average performance prediction error per frame of 1.0% at an average clustering efficiency of 65.8%. Clustering quality is additionally evaluated by counting cluster outliers, which are clusters with an intra-cluster prediction error greater than 20%; this measure indicates the performance similarity within individual clusters. Across the spectrum of frames, we found that on average only 3.0% of the clusters are outliers, which indicates high clustering quality. To detect repetitive behavior in 3D workloads, we propose characterizing frame intervals using shader vectors and then using shader vector equality to extract the repeating patterns. We show that phases exist in each game in the BioShock series, enabling extraction of small representative subsets from the workloads. Under GPU frequency scaling, the performance improvement of the workload subsets, which are less than one percent of the parent workload, is highly correlated (correlation coefficient = 99.7%+) with the performance improvement of the parent workload.
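The two ideas in the abstract, scoring a draw-call clustering and finding phases by shader vector equality, can be illustrated with a minimal Python sketch. The abstract does not define its metrics or data layout, so the following are assumptions made only for illustration: the first draw-call in each cluster acts as its representative, a frame's time is predicted as representative time multiplied by cluster size, clustering efficiency is taken as one minus the ratio of clusters to draw-calls, and a shader vector is treated as the set of shader IDs used in a frame interval. The function names `frame_metrics`, `find_phases`, and `representative_intervals` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def frame_metrics(times, labels, outlier_threshold=0.20):
    """Score one frame's draw-call clustering.

    times[i]  : measured time of draw-call i (positive, any consistent unit)
    labels[i] : cluster id assigned to draw-call i
    Assumptions (not from the paper): the first draw-call seen in a cluster
    is its representative, and efficiency = 1 - clusters / draw-calls.
    """
    clusters = defaultdict(list)
    for t, c in zip(times, labels):
        clusters[c].append(t)

    predicted = 0.0
    outliers = 0
    for members in clusters.values():
        estimate = members[0] * len(members)  # representative time x cluster size
        actual = sum(members)
        predicted += estimate
        # A cluster counts as an outlier when its own error exceeds the threshold.
        if abs(estimate - actual) / actual > outlier_threshold:
            outliers += 1

    total = sum(times)
    return {
        "prediction_error": abs(predicted - total) / total,
        "efficiency": 1.0 - len(clusters) / len(times),
        "outlier_fraction": outliers / len(clusters),
    }


def find_phases(interval_shaders):
    """Assign the same phase id to intervals whose shader vectors are equal.

    interval_shaders[i] : iterable of shader ids used in frame interval i.
    Assumption: a shader vector is the set of shaders an interval uses.
    Phase ids are numbered in order of first appearance.
    """
    phase_of = {}
    phases = []
    for shaders in interval_shaders:
        vector = frozenset(shaders)
        phases.append(phase_of.setdefault(vector, len(phase_of)))
    return phases


def representative_intervals(phases):
    """Return the index of the first interval of each phase."""
    first = {}
    for index, phase in enumerate(phases):
        first.setdefault(phase, index)
    return sorted(first.values())


# Toy data, invented for illustration only.
metrics = frame_metrics([1.0, 1.2, 0.8, 4.0, 8.0], [0, 0, 0, 1, 1])
phases = find_phases([["vs1", "ps1"], ["vs2", "ps2"], ["ps1", "vs1"], ["vs2", "ps2"]])
print(metrics)
print(phases, representative_intervals(phases))
```

In this sketch only the intervals returned by `representative_intervals` would need to be simulated, which is the sense in which repeated phases allow a small subset to stand in for the parent workload.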

