Intra-vector SIMD instructions for core specialization

2009 IEEE International Conference on Computer Design, Pub Date: 2009-10-04, DOI: 10.1109/ICCD.2009.5413112

C. Meenderinck, B. Juurlink


Citations: 3



Abstract

Current research is mainly focusing on exploiting thread-level parallelism (TLP) to increase performance. Another avenue for achieving performance scalability, however, is specialization. In this paper we propose application-specific intra-vector instructions for two-dimensional signal processing kernels. Such kernels usually require significant data rearrangement overhead before the SIMD capabilities can be used. Intra-vector instructions avoid this overhead. We have implemented intra-vector instructions in the Cell SPU core and measured speedups of up to 2.06, with an average of 1.45.
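The abstract's central point, that intra-vector instructions remove the data rearrangement overhead SIMD code incurs on two-dimensional kernels, can be illustrated with a small Python sketch. It emulates 4-lane SIMD registers as lists and applies a 4-point Hadamard transform to every row of a 4×4 block stored one row per register. With only lane-wise operations, the block has to be transposed before and after the computation. A single operation that works across the elements of one register needs no transposes. The Hadamard kernel, the 4-lane width, and the `intra_vector_hadamard` operation are illustrative assumptions. They are not the instructions the paper adds to the Cell SPU.

```python
from typing import List

Vec = List[int]    # one emulated 4-lane SIMD register
Block = List[Vec]  # 4x4 block, one row per register


def vadd(x: Vec, y: Vec) -> Vec:
    """Lane-wise (inter-vector) addition."""
    return [a + b for a, b in zip(x, y)]


def vsub(x: Vec, y: Vec) -> Vec:
    """Lane-wise (inter-vector) subtraction."""
    return [a - b for a, b in zip(x, y)]


def transpose(rows: Block) -> Block:
    """Data rearrangement; on real SIMD hardware this costs a sequence of shuffles."""
    return [list(col) for col in zip(*rows)]


def hadamard_lanewise(v0: Vec, v1: Vec, v2: Vec, v3: Vec) -> Block:
    """4-point Hadamard butterfly computed across four registers.

    Lane j of the result holds the transform of (v0[j], v1[j], v2[j], v3[j]).
    """
    e0, e1 = vadd(v0, v3), vadd(v1, v2)
    e2, e3 = vsub(v1, v2), vsub(v0, v3)
    return [vadd(e0, e1), vadd(e3, e2), vsub(e0, e1), vsub(e3, e2)]


def horizontal_with_transpose(block: Block) -> Block:
    """Row-wise transform using only lane-wise ops: transpose, compute, transpose back."""
    cols = transpose(block)         # register k now holds column k
    out = hadamard_lanewise(*cols)  # lane j works on row j
    return transpose(out)           # restore the row-per-register layout


def intra_vector_hadamard(v: Vec) -> Vec:
    """HYPOTHETICAL intra-vector op: the whole butterfly inside one register."""
    a, b, c, d = v
    e0, e1, e2, e3 = a + d, b + c, b - c, a - d
    return [e0 + e1, e3 + e2, e0 - e1, e3 - e2]


def horizontal_intra_vector(block: Block) -> Block:
    """Row-wise transform with no rearrangement: one intra-vector op per row."""
    return [intra_vector_hadamard(row) for row in block]


if __name__ == "__main__":
    block = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    assert horizontal_with_transpose(block) == horizontal_intra_vector(block)
    print(horizontal_intra_vector(block)[0])  # [10, -4, 0, -2]
```

Both paths produce the same result. The difference is that `horizontal_with_transpose` spends two full transposes on data movement alone. On real SIMD hardware these become shuffle instructions, such as `shufb` on the SPU, whereas the hypothetical intra-vector path issues one operation per row.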

