Mixed-length SIMD code generation for VLIW architectures with multiple native vector-widths

Erkan Diken, M. O'Riordan, Roel Jordans, L. Józwiak, H. Corporaal, D. Moloney
{"title":"Mixed-length SIMD code generation for VLIW architectures with multiple native vector-widths","authors":"Erkan Diken, M. O'Riordan, Roel Jordans, L. Józwiak, H. Corporaal, D. Moloney","doi":"10.1109/ASAP.2015.7245732","DOIUrl":null,"url":null,"abstract":"The degree of DLP parallelism in applications is not fixed and varies due to different computational characteristics of applications. On the contrary, most of the processors today include single-width SIMD (vector) hardware to exploit DLP. However, single-width SIMD architectures may not be optimal to serve applications with varying DLP and they may cause performance and energy inefficiency. We propose the usage of VLIW processors with multiple native vector-widths to better serve applications with changing DLP. SHAVE is an example of such VLIW processor and provides hardware support for the native 32-bit and 128-bit wide vector operations. This paper researches and implements the mixed-length SIMD code generation support for SHAVE processor. More specifically, we target generating 32-bit and 128/64-bit SIMD code for the native 32-bit and 128-bit wide vector units of SHAVE processor. In this way, we improved the performance of compiler generated SIMD code by reducing the number of overhead operations and by increasing the SIMD hardware utilization. Experimental results demonstrated that our methodology implemented in the compiler improves the performance of synthetic benchmarks up to 47%.","PeriodicalId":6642,"journal":{"name":"2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"9 1","pages":"181-188"},"PeriodicalIF":0.0000,"publicationDate":"2015-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASAP.2015.7245732","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

The degree of DLP parallelism in applications is not fixed and varies due to different computational characteristics of applications. On the contrary, most of the processors today include single-width SIMD (vector) hardware to exploit DLP. However, single-width SIMD architectures may not be optimal to serve applications with varying DLP and they may cause performance and energy inefficiency. We propose the usage of VLIW processors with multiple native vector-widths to better serve applications with changing DLP. SHAVE is an example of such VLIW processor and provides hardware support for the native 32-bit and 128-bit wide vector operations. This paper researches and implements the mixed-length SIMD code generation support for SHAVE processor. More specifically, we target generating 32-bit and 128/64-bit SIMD code for the native 32-bit and 128-bit wide vector units of SHAVE processor. In this way, we improved the performance of compiler generated SIMD code by reducing the number of overhead operations and by increasing the SIMD hardware utilization. Experimental results demonstrated that our methodology implemented in the compiler improves the performance of synthetic benchmarks up to 47%.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
具有多个本机矢量宽度的VLIW体系结构的混合长度SIMD代码生成
DLP在应用中的并行度并不是固定的,而是根据应用的不同计算特性而变化的。相反,今天的大多数处理器都包括单宽度SIMD(矢量)硬件来利用DLP。然而,单宽度SIMD架构可能不是服务具有不同DLP的应用程序的最佳选择,而且它们可能导致性能和能源效率低下。我们建议使用具有多个原生矢量宽度的VLIW处理器,以更好地服务于具有变化DLP的应用程序。SHAVE就是这种VLIW处理器的一个例子,它为本机32位和128位宽矢量操作提供硬件支持。本文研究并实现了面向剃须处理器的混合长度SIMD代码生成支持。更具体地说,我们的目标是为剃须处理器的本机32位和128位宽矢量单元生成32位和128/64位SIMD代码。通过这种方式,我们通过减少开销操作的数量和增加SIMD硬件利用率来提高编译器生成的SIMD代码的性能。实验结果表明,我们在编译器中实现的方法将综合基准测试的性能提高了47%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Message from the Conference Chairs - ASAP 2020 Message from the ASAP 2016 chairs An IEEE 754 double-precision floating-point multiplier for denormalized and normalized floating-point numbers Application-set driven exploration for custom processor architectures Stochastic circuit design and performance evaluation of vector quantization
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1