Extension VM: Interleaved Data Layout in Vector Memory

IF 1.5 | Region 3 (Computer Science) | JCR Q4: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | ACM Transactions on Architecture and Code Optimization | Pub Date: 2023-11-07 | DOI: 10.1145/3631528

Dunbo Zhang, Qingjie Lang, Ruoxi Wang, Li Shen


Citation count: 0

Abstract

Vector architecture is widely employed in processors for neural networks, signal processing, and high-performance computing, but its performance is limited by inefficient column-major memory access. This limitation originates from the unsuitable mapping of multidimensional data structures to two-dimensional vector memory spaces. In addition, the traditional data layout mapping method creates an irreconcilable conflict between row- and column-major accesses. Ideally, both row- and column-major accesses would take advantage of the bank parallelism of vector memory. To this end, we propose the Interleaved Data Layout (IDL) method in vector memory, which distributes vector elements into different banks regardless of whether they are accessed in row- or column-major order, so that any vector memory access can benefit from bank parallelism. Additionally, we propose an Extension Vector Memory (EVM) architecture to achieve IDL in vector memory. EVM can support two data layout methods and vector memory access modes simultaneously. The key idea is to continuously distribute the data that needs to be accessed from the main memory to different banks during the loading period. Thus, EVM can provide a larger spatial locality level through careful programming and the extension ISA support. The experimental results showed a 1.43-fold improvement over state-of-the-art vector processors with the proposed architecture, at an area cost of only 1.73%. Furthermore, the energy consumption was reduced by 50.1%.
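The bank-conflict problem described in the abstract can be illustrated with a small sketch. The abstract does not give the actual IDL mapping, so the code below substitutes the classic skewed storage scheme, `bank = (row + col) mod NUM_BANKS`, purely as an assumption. The names `NUM_BANKS`, `conventional_bank`, `skewed_bank`, and the helper functions are hypothetical and do not come from the paper.

```python
# Illustrative sketch only: compares how many banks one vector access touches
# under a conventional layout and under a skewed (interleaved) layout.
# The skewed mapping is a textbook scheme assumed here, not the paper's IDL.

NUM_BANKS = 8  # assumed number of vector-memory banks


def conventional_bank(row: int, col: int) -> int:
    """Row-major layout with matrix width equal to NUM_BANKS.

    The bank depends only on the column, so a whole column lands in one bank.
    """
    return col % NUM_BANKS


def skewed_bank(row: int, col: int) -> int:
    """Skewed layout: each row is rotated by its row index."""
    return (row + col) % NUM_BANKS


def row_access(r: int, n: int = NUM_BANKS):
    """Coordinates of the n elements in row r."""
    return [(r, c) for c in range(n)]


def column_access(c: int, n: int = NUM_BANKS):
    """Coordinates of the n elements in column c."""
    return [(r, c) for r in range(n)]


def banks_touched(bank_of, coords):
    """Set of distinct banks hit by one vector access."""
    return {bank_of(r, c) for r, c in coords}


if __name__ == "__main__":
    for name, fn in [("conventional", conventional_bank), ("skewed", skewed_bank)]:
        rows = len(banks_touched(fn, row_access(0)))
        cols = len(banks_touched(fn, column_access(0)))
        print(f"{name}: row access hits {rows} banks, column access hits {cols} banks")
```

Under these assumptions, the conventional mapping serializes a column access on a single bank, while the skewed mapping spreads both row and column accesses over all eight banks. This is the kind of bank parallelism the abstract attributes to IDL.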


Source journal

ACM Transactions on Architecture and Code Optimization (category: Engineering and Technology, Computer Science: Theory and Methods)

CiteScore: 3.60

Self-citation rate: 6.20%

Review time: 6-12 weeks

Journal description: ACM Transactions on Architecture and Code Optimization (TACO) focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. Articles that appear in TACO will either present new techniques and concepts or report on experiences and experiments with actual systems. Insights useful to architects, hardware or software developers, designers, builders, and users will be emphasized.