Peruse and Profit: Estimating the Accelerability of Loops

Proceedings of the 2016 International Conference on Supercomputing
Pub Date: 2016-06-01 | DOI: 10.1145/2925426.2926269

Snehasish Kumar, V. Srinivasan, A. Sharifian, Nick Sumner, Arrvindh Shriraman

Citations: 6

Abstract

Developers today can target a multitude of execution models, ranging from general-purpose processors to fixed-function hardware accelerators, with many variations in between. There is growing demand to assess the potential benefits of porting or rewriting an application for a target architecture in order to fully exploit the performance and/or energy efficiency that such targets offer. As a first step in this process, however, it is necessary to determine whether the application has characteristics suitable for acceleration. In this paper, we present Peruse, a tool that characterizes the features of loops in an application and helps the programmer understand how amenable those loops are to acceleration. We consider a diverse set of features, ranging from loop characteristics (e.g., loop exit points) and operation mixes (e.g., control vs. data operations) to wider code-region characteristics (e.g., idempotency, vectorizability). Peruse is language-, architecture-, and input-independent and performs its characterization on the compiler's intermediate representation. Because it relies on static analyses, Peruse scales to large applications, where it can identify and extract interesting loops suitable for acceleration. We show analysis results for unmodified applications from the SPEC CPU benchmark suite, Polybench, and HPC workloads. End users, however, would rather have an estimate of the potential speedup from acceleration. We therefore use Peruse's workload characterization results as features to develop a machine-learning-based model that predicts the potential speedup of a loop when offloaded to a fixed-function hardware accelerator. Using this model to predict the speedup of loops selected by Peruse, we achieve an accuracy of 79%.
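The abstract describes Peruse as deriving per-loop features, such as loop exit points and the mix of control versus data operations, from compiler IR and then using such features as inputs to a learned speedup model. The following is a minimal Python sketch of that kind of operation-mix feature extraction, assuming a toy loop representation; it is not Peruse's implementation. The `Loop` class, the `loop_features` function, and the opcode groupings (loosely named after LLVM IR opcodes) are all hypothetical, and the loop body is supplied as a flat opcode list rather than read from a real compiler.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field

# HYPOTHETICAL opcode groupings, loosely named after LLVM IR opcodes.
# Any opcode not listed here is counted as a data (compute) operation.
CONTROL_OPS = {"br", "switch", "icmp", "fcmp"}
MEMORY_OPS = {"load", "store", "getelementptr"}


@dataclass
class Loop:
    """Hypothetical toy loop: the opcodes of its body and its exit blocks."""
    name: str
    opcodes: list[str]
    exit_blocks: set[str] = field(default_factory=set)


def loop_features(loop: Loop) -> dict[str, float]:
    """Return an illustrative operation-mix feature vector for one loop."""
    counts = Counter(loop.opcodes)
    total = sum(counts.values())
    # Counter returns 0 for opcodes that never appear in the body.
    control = sum(counts[op] for op in CONTROL_OPS)
    memory = sum(counts[op] for op in MEMORY_OPS)
    compute = total - control - memory

    def frac(n: int) -> float:
        # Guard against an empty loop body.
        return n / total if total else 0.0

    return {
        "num_exits": float(len(loop.exit_blocks)),
        "control_frac": frac(control),
        "memory_frac": frac(memory),
        "compute_frac": frac(compute),
    }


# Usage: a saxpy-style body, y[i] = a * x[i] + y[i], with a single exit block.
saxpy = Loop(
    name="saxpy",
    opcodes=["phi", "getelementptr", "load", "fmul",
             "getelementptr", "load", "fadd", "getelementptr", "store",
             "add", "icmp", "br"],
    exit_blocks={"for.end"},
)
print(loop_features(saxpy))
```

For this body, 6 of the 12 opcodes are memory operations, so `memory_frac` is 0.5. Features such as idempotency or vectorizability, which the abstract also lists, would require real dependence and side-effect analysis over the IR, which this sketch does not attempt.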
