TURBULENCE: Complexity-Effective Out-of-Order Execution on GPU With Distance-Based ISA

IF 1.4 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE IEEE Computer Architecture Letters Pub Date : 2023-06-26 DOI:10.1109/LCA.2023.3289317

Reoma Matsuo;Toru Koizumi;Hidetsugu Irie;Shuichi Sakai;Ryota Shioya

引用次数: 0

Abstract

A graphics processing unit (GPU) is a processor that achieves high throughput by exploiting data parallelism. We found that many GPU workloads also contain instruction-level parallelism that can be extracted through out-of-order execution to provide additional performance improvement opportunities. We propose the TURBULENCE architecture for very low-cost out-of-order execution on GPUs. TURBULENCE consists of a novel ISA that introduces the concept of referencing operands by inter-instruction distance instead of register numbers, and a novel microarchitecture that executes the novel ISA. This distance-based operand has the property of not causing false dependencies. By exploiting this property, we achieve cost-effective out-of-order execution on GPUs without introducing expensive hardware such as a rename logic and a load-store queue. Simulation results show that TURBULENCE improves performance by 17.6% without increasing energy consumption over an existing GPU.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

TURBULENCE：利用基于距离的 ISA 在 GPU 上进行具有完备性的无序执行

图形处理器（GPU）是一种通过利用数据并行性实现高吞吐量的处理器。我们发现，许多 GPU 工作负载也包含指令级并行性，可以通过失序执行来提取指令级并行性，从而提供额外的性能提升机会。我们提出了 TURBULENCE 架构，用于在 GPU 上实现极低成本的失序执行。TURBULENCE 由一个新颖的 ISA 和一个执行新颖 ISA 的新颖微体系结构组成，前者引入了通过指令间距离而不是寄存器编号来引用操作数的概念。这种基于距离的操作数具有不会造成错误依赖的特性。利用这一特性，我们无需引入重命名逻辑和加载存储队列等昂贵的硬件，就能在 GPU 上实现经济高效的无序执行。仿真结果表明，与现有的 GPU 相比，TURBULENCE 在不增加能耗的情况下将性能提高了 17.6%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

IEEE Computer Architecture Letters COMPUTER SCIENCE, HARDWARE & ARCHITECTURE-

CiteScore

4.60

自引率

4.30%

发文量

期刊介绍： IEEE Computer Architecture Letters is a rigorously peer-reviewed forum for publishing early, high-impact results in the areas of uni- and multiprocessor computer systems, computer architecture, microarchitecture, workload characterization, performance evaluation and simulation techniques, and power-aware computing. Submissions are welcomed on any topic in computer architecture, especially but not limited to: microprocessor and multiprocessor systems, microarchitecture and ILP processors, workload characterization, performance evaluation and simulation techniques, compiler-hardware and operating system-hardware interactions, interconnect architectures, memory and cache systems, power and thermal issues at the architecture level, I/O architectures and techniques, independent validation of previously published results, analysis of unsuccessful techniques, domain-specific processor architectures (e.g., embedded, graphics, network, etc.), real-time and high-availability architectures, reconfigurable systems.