A Data Layout Transformation for Vectorizing Compilers

Proceedings of the 2018 4th Workshop on Programming Models for SIMD/Vector Processing. Published: 2018-02-24. DOI: 10.1145/3178433.3178440

Arsène Pérard-Gayot, Richard Membarth, P. Slusallek, Simon Moll, Roland Leißa, Sebastian Hack


Citations: 2

Abstract

Modern processors are often equipped with vector instruction sets. Such instructions operate on multiple data elements at once and greatly improve performance for specific applications. A programmer has two options for taking advantage of these instructions: writing manually vectorized code, or using an auto-vectorizing compiler. In the latter case, the programmer only has to place annotations that instruct the compiler to vectorize a particular piece of code. Thanks to auto-vectorization, the source program remains portable, and the programmer can focus on the task at hand instead of the low-level details of intrinsics programming. However, the performance of the vectorized program strongly depends on the precision of the analyses performed by the vectorizing compiler. In this paper, we improve the precision of these analyses by selectively splitting stack-allocated variables of a structure or aggregate type. Without this optimization, automatic vectorization slows execution down compared to the scalar, non-vectorized code. With this optimization enabled, we show that the vectorized code can be as fast as hand-optimized, manually vectorized implementations.
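
The general idea of splitting a stack-allocated aggregate can be illustrated with a small C++ sketch. It is not the paper's implementation, which works inside the compiler; it only contrasts, by hand, the two memory layouts a vectorized temporary could take. The names `Pair`, `kLanes`, `Lanes`, `sum_unsplit`, `sum_split`, and `layouts_agree`, as well as the vector width of 4, are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical vector width; a real compiler derives it from the target ISA.
constexpr std::size_t kLanes = 4;
using Lanes = std::array<float, kLanes>;

// Hypothetical two-field aggregate that the scalar code keeps on the stack.
struct Pair {
    float a;
    float b;
};

// Unsplit layout: one whole struct per lane (array of structs).
// Field `a` of consecutive lanes is interleaved with field `b` in memory,
// so accessing one field across all lanes is a strided access.
Lanes sum_unsplit(const Lanes& x) {
    Pair tmp[kLanes];
    for (std::size_t i = 0; i < kLanes; ++i) {
        tmp[i].a = x[i] * 2.0f;
        tmp[i].b = x[i] + 1.0f;
    }
    Lanes out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        out[i] = tmp[i].a + tmp[i].b;
    }
    return out;
}

// Split layout: each field becomes its own contiguous array
// (structure of arrays), so every field access is unit-stride
// and each array can map onto a single vector value.
Lanes sum_split(const Lanes& x) {
    float tmp_a[kLanes];
    float tmp_b[kLanes];
    for (std::size_t i = 0; i < kLanes; ++i) {
        tmp_a[i] = x[i] * 2.0f;
        tmp_b[i] = x[i] + 1.0f;
    }
    Lanes out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        out[i] = tmp_a[i] + tmp_b[i];
    }
    return out;
}

// Both layouts compute the same values; only the memory layout differs.
bool layouts_agree(const Lanes& x) {
    return sum_unsplit(x) == sum_split(x);
}
```

Both functions return the same result for any input. The sketch says nothing about how the paper decides which variables to split, and any performance difference between the two layouts depends on the compiler and target.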
