处理器自定义对工作负载的输入数据集有多敏感?

2011 IEEE 9th Symposium on Application Specific Processors (SASP) Pub Date : 2011-06-05 DOI:10.1109/SASP.2011.5941070

Maximilien Breughe, Zheng Li, Yang Chen, Stijn Eyerman, O. Temam, Chengyong Wu, L. Eeckhout

{"title":"处理器自定义对工作负载的输入数据集有多敏感?","authors":"Maximilien Breughe, Zheng Li, Yang Chen, Stijn Eyerman, O. Temam, Chengyong Wu, L. Eeckhout","doi":"10.1109/SASP.2011.5941070","DOIUrl":null,"url":null,"abstract":"Hardware customization is an effective approach for meeting application performance requirements while achieving high levels of energy efficiency. Application-specific processors achieve high performance at low energy by tailoring their designs towards a specific workload, i.e., an application or application domain of interest. A fundamental question that has remained unanswered so far though is to what extent processor customization is sensitive to the training workload's input datasets. Current practice is to consider a single or only a few input datasets per workload during the processor design cycle — the reason being that simulation is prohibitively time-consuming which excludes considering a large number of datasets. This paper addresses this fundamental question, for the first time. In order to perform the large number of runs required to address this question in a reasonable amount of time, we first propose a mechanistic analytical model, built from first principles, that is accurate within 3.6% on average across a broad design space. The analytical model is at least 4 orders of magnitude faster than detailed cycle-accurate simulation for design space exploration. Using the model, we are able to study the sensitivity of a workload's input dataset on the optimum customized processor architecture. Considering MiBench benchmarks and 1000 datasets per benchmark, we conclude that processor customization is largely dataset-insensitive. This has an important implication in practice: a single or only a few datasets are sufficient for determining the optimum processor architecture when designing application-specific processors.","PeriodicalId":375788,"journal":{"name":"2011 IEEE 9th Symposium on Application Specific Processors (SASP)","volume":"7 1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"How sensitive is processor customization to the workload's input datasets?\",\"authors\":\"Maximilien Breughe, Zheng Li, Yang Chen, Stijn Eyerman, O. Temam, Chengyong Wu, L. Eeckhout\",\"doi\":\"10.1109/SASP.2011.5941070\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hardware customization is an effective approach for meeting application performance requirements while achieving high levels of energy efficiency. Application-specific processors achieve high performance at low energy by tailoring their designs towards a specific workload, i.e., an application or application domain of interest. A fundamental question that has remained unanswered so far though is to what extent processor customization is sensitive to the training workload's input datasets. Current practice is to consider a single or only a few input datasets per workload during the processor design cycle — the reason being that simulation is prohibitively time-consuming which excludes considering a large number of datasets. This paper addresses this fundamental question, for the first time. In order to perform the large number of runs required to address this question in a reasonable amount of time, we first propose a mechanistic analytical model, built from first principles, that is accurate within 3.6% on average across a broad design space. The analytical model is at least 4 orders of magnitude faster than detailed cycle-accurate simulation for design space exploration. Using the model, we are able to study the sensitivity of a workload's input dataset on the optimum customized processor architecture. Considering MiBench benchmarks and 1000 datasets per benchmark, we conclude that processor customization is largely dataset-insensitive. This has an important implication in practice: a single or only a few datasets are sufficient for determining the optimum processor architecture when designing application-specific processors.\",\"PeriodicalId\":375788,\"journal\":{\"name\":\"2011 IEEE 9th Symposium on Application Specific Processors (SASP)\",\"volume\":\"7 1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-06-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 IEEE 9th Symposium on Application Specific Processors (SASP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SASP.2011.5941070\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE 9th Symposium on Application Specific Processors (SASP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SASP.2011.5941070","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 11

摘要

硬件定制是在满足应用程序性能需求的同时实现高能效的有效方法。特定于应用程序的处理器通过针对特定工作负载(即感兴趣的应用程序或应用程序领域)定制设计来实现低能耗的高性能。到目前为止，仍有一个基本问题没有得到解答，那就是处理器定制对训练工作量的输入数据集的敏感程度。目前的做法是在处理器设计周期中考虑每个工作负载的单个或仅几个输入数据集——原因是模拟非常耗时，不包括考虑大量数据集。本文首次解决了这个基本问题。为了在合理的时间内执行解决这个问题所需的大量运行，我们首先提出了一个基于第一性原理的机制分析模型，该模型在广泛的设计空间中平均精度在3.6%以内。分析模型比设计空间探索的详细周期精确仿真至少快4个数量级。利用该模型，我们能够研究工作负载输入数据集在最佳定制处理器架构上的敏感性。考虑到MiBench基准测试和每个基准测试1000个数据集，我们得出结论，处理器定制在很大程度上对数据集不敏感。这在实践中有一个重要的含义:在设计特定于应用程序的处理器时，单个或仅几个数据集就足以确定最佳的处理器架构。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

How sensitive is processor customization to the workload's input datasets?

Hardware customization is an effective approach for meeting application performance requirements while achieving high levels of energy efficiency. Application-specific processors achieve high performance at low energy by tailoring their designs towards a specific workload, i.e., an application or application domain of interest. A fundamental question that has remained unanswered so far though is to what extent processor customization is sensitive to the training workload's input datasets. Current practice is to consider a single or only a few input datasets per workload during the processor design cycle — the reason being that simulation is prohibitively time-consuming which excludes considering a large number of datasets. This paper addresses this fundamental question, for the first time. In order to perform the large number of runs required to address this question in a reasonable amount of time, we first propose a mechanistic analytical model, built from first principles, that is accurate within 3.6% on average across a broad design space. The analytical model is at least 4 orders of magnitude faster than detailed cycle-accurate simulation for design space exploration. Using the model, we are able to study the sensitivity of a workload's input dataset on the optimum customized processor architecture. Considering MiBench benchmarks and 1000 datasets per benchmark, we conclude that processor customization is largely dataset-insensitive. This has an important implication in practice: a single or only a few datasets are sufficient for determining the optimum processor architecture when designing application-specific processors.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2011 IEEE 9th Symposium on Application Specific Processors (SASP)

自引率

0.00%

发文量