Mixed-length SIMD code generation for VLIW architectures with multiple native vector-widths

2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP) Pub Date : 2015-07-27 DOI:10.1109/ASAP.2015.7245732

Erkan Diken, M. O'Riordan, Roel Jordans, L. Józwiak, H. Corporaal, D. Moloney

Pages: 181-188

Citations: 5

Abstract

The degree of data-level parallelism (DLP) in applications is not fixed; it varies with the computational characteristics of each application. In contrast, most processors today include single-width SIMD (vector) hardware to exploit DLP. Single-width SIMD architectures may not be optimal for applications with varying DLP, and they may cause performance and energy inefficiency. We propose the use of VLIW processors with multiple native vector widths to better serve applications with changing DLP. SHAVE is an example of such a VLIW processor and provides hardware support for native 32-bit and 128-bit wide vector operations. This paper investigates and implements mixed-length SIMD code generation support for the SHAVE processor. More specifically, we target generating 32-bit and 128/64-bit SIMD code for the native 32-bit and 128-bit wide vector units of the SHAVE processor. In this way, we improve the performance of compiler-generated SIMD code by reducing the number of overhead operations and by increasing SIMD hardware utilization. Experimental results demonstrate that our methodology, implemented in the compiler, improves the performance of synthetic benchmarks by up to 47%.
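To make the idea of mixed-length code generation concrete, the following is a minimal sketch of how a group of identical operations might be covered with vectors of more than one width. It is not the authors' algorithm: the function name `plan_vector_ops`, the greedy widest-first policy, and the treatment of leftovers as scalar operations are all assumptions made for illustration. Only the widths (128, 64 and 32 bits) come from the abstract.

```python
# Illustrative sketch only. The widths come from the abstract (128/64-bit code
# on the 128-bit unit, 32-bit code on the 32-bit unit); the greedy selection
# policy and all names below are hypothetical, not the paper's method.
VECTOR_WIDTHS_BITS = (128, 64, 32)


def plan_vector_ops(num_elements, element_bits, widths=VECTOR_WIDTHS_BITS):
    """Cover `num_elements` identical operations with the widest vectors that fit.

    Returns a pair (plan, scalar_ops):
      plan       -- list of (width_bits, lanes) tuples, one per vector operation
      scalar_ops -- number of operations left over for scalar execution
    """
    if num_elements < 0 or element_bits <= 0:
        raise ValueError("num_elements must be >= 0 and element_bits > 0")

    plan = []
    remaining = num_elements
    for width in sorted(widths, reverse=True):
        lanes = width // element_bits
        if lanes < 2:
            continue  # one lane is just a scalar operation
        while remaining >= lanes:
            plan.append((width, lanes))
            remaining -= lanes
    return plan, remaining


if __name__ == "__main__":
    # Six 16-bit operations: mixed widths versus a single 128-bit width.
    print(plan_vector_ops(6, 16))
    print(plan_vector_ops(6, 16, widths=(128,)))
```

Under these assumptions, six 16-bit operations are covered by one 64-bit and one 32-bit vector operation when all three widths are available. With only the 128-bit width, none of them fit a full vector, so the sketch leaves all six as scalar operations. A real single-width compiler might instead pad a 128-bit vector and pay for extra pack and unpack operations.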



