单片大规模并行处理器上的时间关键计算

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2014-03-24 DOI:10.7873/DATE.2014.110

B. Dinechin, D. V. Amstel, Marc Poulhiès, Guillaume Lager

{"title":"单片大规模并行处理器上的时间关键计算","authors":"B. Dinechin, D. V. Amstel, Marc Poulhiès, Guillaume Lager","doi":"10.7873/DATE.2014.110","DOIUrl":null,"url":null,"abstract":"The requirement of high performance computing at low power can be met by the parallel execution of an application on a possibly large number of programmable cores. However, the lack of accurate timing properties may prevent parallel execution from being applicable to time-critical applications. We illustrate how this problem has been addressed by suitably designing the architecture, implementation, and programming model, of the Kalray MPPA®-256 single-chip many-core processor. The MPPA® -256 (Multi-Purpose Processing Array) processor integrates 256 processing engine (PE) cores and 32 resource management (RM) cores on a single 28nm CMOS chip. These VLIW cores are distributed across 16 compute clusters and 4 I/O subsystems, each with a locally shared memory. On-chip communication and synchronization are supported by an explicitly addressed dual network-on-chip (NoC), with one node per compute cluster and 4 nodes per I/O subsystem. Off-chip interfaces include DDR, PCI and Ethernet, and a direct access to the NoC for low-latency processing of data streams. The key architectural features that support time-critical applications are timing compositional cores, independent memory banks inside the compute clusters, and the data NoC whose guaranteed services are determined by network calculus. The programming model provides communicators that effectively support distributed computing primitives such as remote writes, barrier synchronizations, active messages, and communication by sampling. POSIX time functions expose synchronous clocks inside compute clusters and mesosynchronous clocks across the MPPA®-256 processor.","PeriodicalId":6550,"journal":{"name":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"14 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"174","resultStr":"{\"title\":\"Time-critical computing on a single-chip massively parallel processor\",\"authors\":\"B. Dinechin, D. V. Amstel, Marc Poulhiès, Guillaume Lager\",\"doi\":\"10.7873/DATE.2014.110\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The requirement of high performance computing at low power can be met by the parallel execution of an application on a possibly large number of programmable cores. However, the lack of accurate timing properties may prevent parallel execution from being applicable to time-critical applications. We illustrate how this problem has been addressed by suitably designing the architecture, implementation, and programming model, of the Kalray MPPA®-256 single-chip many-core processor. The MPPA® -256 (Multi-Purpose Processing Array) processor integrates 256 processing engine (PE) cores and 32 resource management (RM) cores on a single 28nm CMOS chip. These VLIW cores are distributed across 16 compute clusters and 4 I/O subsystems, each with a locally shared memory. On-chip communication and synchronization are supported by an explicitly addressed dual network-on-chip (NoC), with one node per compute cluster and 4 nodes per I/O subsystem. Off-chip interfaces include DDR, PCI and Ethernet, and a direct access to the NoC for low-latency processing of data streams. The key architectural features that support time-critical applications are timing compositional cores, independent memory banks inside the compute clusters, and the data NoC whose guaranteed services are determined by network calculus. The programming model provides communicators that effectively support distributed computing primitives such as remote writes, barrier synchronizations, active messages, and communication by sampling. POSIX time functions expose synchronous clocks inside compute clusters and mesosynchronous clocks across the MPPA®-256 processor.\",\"PeriodicalId\":6550,\"journal\":{\"name\":\"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)\",\"volume\":\"14 1\",\"pages\":\"1-6\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-03-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"174\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.7873/DATE.2014.110\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.7873/DATE.2014.110","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 174

摘要

通过在可能大量的可编程核上并行执行应用程序，可以满足低功耗下高性能计算的要求。然而，缺乏精确的计时属性可能会妨碍并行执行适用于时间要求严格的应用程序。我们通过适当地设计Kalray MPPA®-256单芯片多核处理器的体系结构、实现和编程模型来说明如何解决这个问题。MPPA®-256(多用途处理阵列)处理器在单个28nm CMOS芯片上集成了256个处理引擎(PE)内核和32个资源管理(RM)内核。这些VLIW内核分布在16个计算集群和4个I/O子系统中，每个子系统都有一个本地共享内存。片上通信和同步由显式寻址的双片上网络(NoC)支持，每个计算集群有一个节点，每个I/O子系统有4个节点。片外接口包括DDR, PCI和以太网，以及对NoC的直接访问，用于低延迟数据流处理。支持时间关键型应用程序的关键体系结构特性是计时组合核心、计算集群内的独立内存库以及由网络演算确定其保证服务的数据NoC。编程模型提供了有效支持分布式计算原语(如远程写、屏障同步、活动消息和抽样通信)的通信器。POSIX时间函数在计算集群和跨MPPA®-256处理器的中同步时钟内公开同步时钟。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Time-critical computing on a single-chip massively parallel processor

The requirement of high performance computing at low power can be met by the parallel execution of an application on a possibly large number of programmable cores. However, the lack of accurate timing properties may prevent parallel execution from being applicable to time-critical applications. We illustrate how this problem has been addressed by suitably designing the architecture, implementation, and programming model, of the Kalray MPPA®-256 single-chip many-core processor. The MPPA® -256 (Multi-Purpose Processing Array) processor integrates 256 processing engine (PE) cores and 32 resource management (RM) cores on a single 28nm CMOS chip. These VLIW cores are distributed across 16 compute clusters and 4 I/O subsystems, each with a locally shared memory. On-chip communication and synchronization are supported by an explicitly addressed dual network-on-chip (NoC), with one node per compute cluster and 4 nodes per I/O subsystem. Off-chip interfaces include DDR, PCI and Ethernet, and a direct access to the NoC for low-latency processing of data streams. The key architectural features that support time-critical applications are timing compositional cores, independent memory banks inside the compute clusters, and the data NoC whose guaranteed services are determined by network calculus. The programming model provides communicators that effectively support distributed computing primitives such as remote writes, barrier synchronizations, active messages, and communication by sampling. POSIX time functions expose synchronous clocks inside compute clusters and mesosynchronous clocks across the MPPA®-256 processor.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)

自引率

0.00%

发文量

期刊最新文献

Simple interpolants for linear arithmetic Modeling steep slope devices: From circuits to architectures Software-based Pauli tracking in fault-tolerant quantum circuits Using guided local search for adaptive resource reservation in large-scale embedded systems Emulation-based robustness assessment for automotive smart-power ICs