Pangaea: A tightly-coupled IA32 heterogeneous chip multiprocessor

2008 International Conference on Parallel Architectures and Compilation Techniques (PACT). Pub Date: 2008-10-25. DOI: 10.1145/1454115.1454125

Henry Wong, Anne Bracy, E. Schuchman, Tor M. Aamodt, Jamison D. Collins, P. Wang, G. Chinya, Ankur Khandelwal Groen, Hong Jiang, Hong Wang


Citations: 62

Abstract

Moore's Law and the drive towards performance efficiency have led to the on-chip integration of general-purpose cores with special-purpose accelerators. Pangaea is a heterogeneous CMP design for non-rendering workloads that integrates IA32 CPU cores with non-IA32 GPU-class multi-cores, extending the current state-of-the-art CPU-GPU integration that physically “fuses” existing CPU and GPU designs. Pangaea introduces (1) a resource repartitioning of the GPU, where the hardware budget dedicated to 3D-specific graphics processing is used to build more general-purpose GPU cores, and (2) a 3-instruction extension to the IA32 ISA that supports tighter architectural integration and fine-grain shared-memory collaborative multithreading between the IA32 CPU cores and the non-IA32 GPU cores. We implement Pangaea and the current CPU-GPU designs in fully functional synthesizable RTL based on the production-quality RTL of an IA32 CPU and an Intel GMA X4500 GPU. On a 65 nm ASIC process technology, the legacy graphics-specific fixed-function hardware has the area of 9 GPU cores and the total power consumption of 5 GPU cores. With the ISA extensions, the latency from the time an IA32 core spawns a GPU thread to the time the thread begins execution is reduced from thousands of cycles to fewer than 30 cycles. Pangaea is synthesized on an FPGA-based prototype and runs off-the-shelf IA32 OSes. A set of general-purpose non-graphics workloads demonstrates speedups of up to 8.8×.
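
To see why the reduction in thread-spawn latency matters for fine-grain offload, the following is a minimal back-of-the-envelope sketch, not the paper's evaluation method. The abstract states only "thousands of cycles" and "fewer than 30 cycles"; the concrete values 2000 and 30, the acceleration factor of 4, and the function names are illustrative assumptions.

```python
def offload_speedup(work_cycles, accel_factor, spawn_latency):
    """Speedup from offloading a task instead of running it on the CPU core.

    work_cycles   -- cycles the task would take on the CPU core
    accel_factor  -- assumed ratio by which the GPU cores run the task faster
    spawn_latency -- cycles from spawning the thread until it starts executing
    """
    return work_cycles / (spawn_latency + work_cycles / accel_factor)


def break_even_work(accel_factor, spawn_latency):
    """Smallest task size (in CPU cycles) for which offloading is not slower.

    Solves work = spawn_latency + work / accel_factor for work.
    """
    return spawn_latency * accel_factor / (accel_factor - 1)


if __name__ == "__main__":
    ACCEL = 4            # hypothetical acceleration factor, not from the paper
    SLOW_SPAWN = 2000    # stand-in for "thousands of cycles"
    FAST_SPAWN = 30      # stand-in for "fewer than 30 cycles"

    for label, latency in (("slow spawn", SLOW_SPAWN), ("fast spawn", FAST_SPAWN)):
        print(label, "break-even task size:", break_even_work(ACCEL, latency))
        print(label, "speedup at 10000 cycles:", offload_speedup(10000, ACCEL, latency))
```

Under these assumed numbers, the smallest task worth offloading shrinks roughly in proportion to the spawn latency, which is the sense in which a low-latency spawn enables fine-grain collaboration between CPU and GPU threads.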


Latest papers from this venue

Meeting points: Using thread criticality to adapt multicore hardware to parallel regions
COMIC: A coherent shared memory interface for cell BE
Pangaea: A tightly-coupled IA32 heterogeneous chip multiprocessor
Multi-mode energy management for multi-tier server clusters
MCAMP: Communication optimization on Massively Parallel Machines with hierarchical scratch-pad memory