Towards Statistical Guarantees in Controlling Quality Tradeoffs for Approximate Acceleration

2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp. 66-77. Pub Date: 2016-06-18. DOI: 10.1145/3007787.3001144

Divya Mahajan, Amir Yazdanbakhsh, Jongse Park, Bradley Thwaites, H. Esmaeilzadeh

Citations: 48

Abstract

Conventionally, an approximate accelerator replaces every invocation of a frequently executed region of code without considering the final quality degradation. However, there is a vast decision space in which each invocation can either be delegated to the accelerator (improving performance and efficiency) or run on the precise core (maintaining quality). In this paper we introduce MITHRA, a co-designed hardware-software solution that navigates these tradeoffs to deliver high performance and efficiency while lowering the final quality loss. MITHRA seeks to identify whether each individual accelerator invocation will lead to an undesirable quality loss and, if so, directs the processor to run the original precise code. This identification is cast as a binary classification task that requires a cohesive co-design of hardware and software. The hardware component performs the classification at runtime and exposes a knob to the software mechanism to control quality tradeoffs. The software tunes this knob by solving a statistical optimization problem that maximizes the benefits of approximation while providing statistical guarantees that the final quality level will be met with high confidence. The software uses this knob to tune and train the hardware classifiers. We devise two distinct hardware classifiers, one table-based and one based on a neural network. To understand the efficacy of these mechanisms, we compare them with an ideal but infeasible design, the oracle. Results show that, with 95% confidence, the table-based design can restrict the final output quality loss to 5% for 90% of unseen input sets while providing 2.5× speedup and 2.6× energy-efficiency improvement. The neural design shows similar speedup but improves efficiency by 13%. Compared to the table-based design, the oracle improves speedup by 26% and efficiency by 36%. These results show that MITHRA performs close to the oracle and can effectively navigate the quality tradeoffs in approximate acceleration.
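The statistical guarantee described above (final quality loss within 5% for 90% of unseen input sets, with 95% confidence) can be illustrated with a minimal Python sketch. It assumes a one-sided binomial test over a collection of representative input sets, which is equivalent to requiring the one-sided Clopper-Pearson lower bound on the success rate to reach 90%. It also assumes a hypothetical quality model in which the knob is an error threshold and each invocation's true error stands in for the classifier's prediction. The names `final_quality_loss`, `tune_knob`, and the threshold semantics are illustrative assumptions, not MITHRA's actual interface or optimization procedure.

```python
from math import comb
from typing import Optional, Sequence


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def guarantee_holds(successes: int, trials: int,
                    target_rate: float = 0.90, confidence: float = 0.95) -> bool:
    """True if, at the given confidence, the true success rate is at least target_rate.

    Equivalent to the one-sided Clopper-Pearson lower bound being >= target_rate:
    observing this many successes must have probability <= 1 - confidence
    when the true rate is only target_rate.
    """
    return binomial_upper_tail(successes, trials, target_rate) <= 1 - confidence


def final_quality_loss(errors: Sequence[float], threshold: float) -> float:
    """Hypothetical quality model for one input set.

    Each entry is one invocation's error if it were approximated. Invocations whose
    error exceeds the knob (threshold) fall back to the precise core and contribute 0;
    the rest run on the accelerator. Loss is the mean error over all invocations.
    """
    return sum(e for e in errors if e <= threshold) / len(errors)


def tune_knob(input_sets: Sequence[Sequence[float]], candidates: Sequence[float],
              max_loss: float = 0.05, target_rate: float = 0.90,
              confidence: float = 0.95) -> Optional[float]:
    """Return the most permissive threshold (most approximation) that passes
    the statistical test, or None if no candidate does."""
    best = None
    for t in sorted(candidates):
        # Count input sets whose final quality loss stays within the bound.
        ok = sum(final_quality_loss(errs, t) <= max_loss for errs in input_sets)
        if guarantee_holds(ok, len(input_sets), target_rate, confidence):
            best = t  # candidates are visited in ascending order, so the largest passing one wins
    return best


# 30 hypothetical input sets, each with 90 low-error and 10 high-error invocations.
sets = [[0.01] * 90 + [0.5] * 10 for _ in range(30)]
print(tune_knob(sets, [0.0, 0.02, 0.6]))  # prints 0.02
```

With these defaults, at least 29 input sets are needed even when every one of them passes, because 0.9^29 ≈ 0.047 ≤ 0.05 while 0.9^28 ≈ 0.052. In the example, a threshold of 0.6 would approximate every invocation and push the loss to 5.9%, so the sketch settles on 0.02.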
