2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC): Latest Articles

Programmable SoC platform for deep packet inspection using enhanced Boyer-Moore algorithm

2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)

Pub Date : 2017-07-01 DOI: 10.1109/ReCoSoC.2017.8016159

Adrián Dominguez, P. P. Carballo, A. Núñez

This paper describes the design of a SoC platform for real-time, on-line pattern search in TCP packets for Deep Packet Inspection (DPI) applications. The platform is based on a Xilinx Zynq programmable SoC and includes an accelerator implementing a pattern search engine that extends the original Boyer-Moore algorithm with timing and logical rules, which yields a very complex set of rules. The platform also implements different modes of operation, including SIMD and MISD parallelism, which can be configured on-line. The platform scales with the analysis requirements, up to 8 Gbps. High-level synthesis and platform-based design methodologies have been used to reduce the time to market of the completed system.
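
For readers unfamiliar with the baseline algorithm, the following is a minimal software sketch of Boyer-Moore search using only the bad-character rule. It is not the paper's accelerator and does not model the timing or logical rule extensions; the function names are illustrative assumptions.

```python
def build_bad_char_table(pattern: bytes) -> dict[int, int]:
    """Map each byte value to the index of its last occurrence in the pattern."""
    return {byte: idx for idx, byte in enumerate(pattern)}


def boyer_moore_search(text: bytes, pattern: bytes) -> list[int]:
    """Return every offset in `text` where `pattern` starts.

    Only the bad-character rule is used (no good-suffix rule), so this is a
    simplified variant of the original algorithm.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    last = build_bad_char_table(pattern)
    matches = []
    shift = 0
    while shift <= n - m:
        # Compare the pattern against the text from right to left.
        j = m - 1
        while j >= 0 and pattern[j] == text[shift + j]:
            j -= 1
        if j < 0:
            matches.append(shift)
            shift += 1  # conservative shift so overlapping matches are found
        else:
            # Align the mismatching text byte with its last occurrence in the
            # pattern, or jump past it if it does not occur at all.
            shift += max(1, j - last.get(text[shift + j], -1))
    return matches


payload = b"GET /index.html HTTP/1.1"
print(boyer_moore_search(payload, b"HTTP"))
```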

Citations: 9

Computational self-awareness as design approach for visual sensor nodes

2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)

Pub Date : 2017-07-01 DOI: 10.1109/ReCoSoC.2017.8016147

Zakarya Guettatfi, Philipp Hübner, M. Platzner, B. Rinner

Visual sensor networks (VSNs) represent distributed embedded systems with tight constraints on sensing, processing, memory, communications and power consumption. VSNs are expected to scale up in the number of nodes, be required to offer more complex functionality, a higher degree of flexibility and increased autonomy. The engineering of such VSNs capable of (self-)adapting on the application and platform levels poses a formidable challenge. In this paper, we introduce a novel design approach for visual sensor nodes which is founded on computational self-awareness. Computational self-awareness maintains knowledge about the system's state and environment with models and then uses this knowledge to reason about and adapt behaviours. We discuss the concept of computational self-awareness and present our novel design approach that is centred on a reference architecture for individual VSN nodes, but can be naturally extended to networks. We present the VSN node implementation with its platform architecture and resource adaptivity and report on preliminary implementation results of a Zynq-based VSN node prototype.

Citations: 8

High-level test generation for processing elements in many-core systems

2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)

Pub Date : 2017-07-01 DOI: 10.1109/ReCoSoC.2017.8016156

S. Oyeniran, R. Ubar, Siavoosh Payandeh Azad, J. Raik

The advent of many-core systems-on-chip (SoCs) will involve new scalable hardware/software mechanisms that can efficiently utilize the abundance of interconnected processing elements found in these SoCs. These trends will have a great impact on the strategies for testing the systems and improving their reliability by exploiting the system's reconfigurability to achieve graceful degradation of the system's performance. We propose a Software-Based Self-Test (SBST) strategy for testing processing elements in many-core systems, with the goals of increasing fault coverage and structuring the test routines in a way that makes test-data delivery in many-core systems more efficient. A new high-level fault model is introduced, which covers a broad class of gate-level Stuck-at-Faults (SAF), conditional SAF, and bridging faults of any multiplicity in processor control paths. Two algorithms for high-level simulation-based test generation for the control path and a bit-wise pseudo-exhaustive test approach for the data path are proposed. No implementation details are needed for test data generation. A novel method for proving the redundancy of high-level functional faults is presented, which allows for precise evaluation of fault coverage.
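
As background on the terms "stuck-at fault" and "fault coverage", the sketch below simulates single gate-level stuck-at faults on a toy circuit and reports the fraction of faults a test set detects. It does not implement the paper's high-level fault model or its test generation algorithms; the circuit `y = (a AND b) OR c`, the net names, and the function names are assumptions chosen for illustration.

```python
from itertools import product

# Nets of the toy circuit y = (a AND b) OR c; n1 is the AND gate output.
NETS = ("a", "b", "c", "n1", "y")


def simulate(a, b, c, fault=None):
    """Evaluate the circuit, optionally with one net stuck at 0 or 1.

    `fault` is a (net_name, stuck_value) pair or None for the fault-free case.
    """
    def force(net, value):
        if fault is not None and fault[0] == net:
            return fault[1]
        return value

    a, b, c = force("a", a), force("b", b), force("c", c)
    n1 = force("n1", a & b)
    return force("y", n1 | c)


def fault_coverage(vectors):
    """Fraction of single stuck-at faults detected by the given test vectors."""
    faults = [(net, value) for net in NETS for value in (0, 1)]
    detected = {
        f
        for f in faults
        for vec in vectors
        if simulate(*vec, fault=f) != simulate(*vec)
    }
    return len(detected) / len(faults)


exhaustive = list(product((0, 1), repeat=3))
print(fault_coverage(exhaustive))   # all eight input combinations
print(fault_coverage([(1, 1, 0)]))  # a single test vector
```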

Citations: 5

Adaptive and reconfigurable bubble routing technique for 2D Torus interconnection networks

2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)

Pub Date : 2017-07-01 DOI: 10.1109/ReCoSoC.2017.8016155

Poona Bahrebar, D. Stroobandt

Networks with torus interconnection topology are widely used due to the symmetry in traffic distribution. In order to ensure deadlock freedom and provide adaptive routing in a torus, at least two Virtual Channels (VCs) per physical channel are required to break the cyclic channel dependencies. However, VCs increase the arbitration latency and incur large power/area overheads, which is undesirable, particularly for on-chip networks with limited power/area budgets. In this paper, we propose a novel technique for routing in wormhole-switched 2D torus networks. The proposed method relies on the Abacus Turn Model (AbTM) and Worm-Bubble Flow Control (WBFC) to support adaptive and deadlock-free routing without using VCs. Furthermore, network blocking is reduced by providing on-demand routing adaptiveness through reconfiguration. The experimental results demonstrate the efficiency of the proposed scheme in terms of performance and hardware overhead.
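
The wraparound links that make a torus symmetric are also what close each row and column into a ring, which is the source of the cyclic channel dependencies mentioned above. The sketch below only computes minimal hop counts on a 2D torus to show the effect of those links; it does not implement AbTM or WBFC, and the function names are illustrative assumptions.

```python
def ring_offset(src: int, dst: int, size: int) -> int:
    """Signed minimal offset from src to dst on a ring of `size` nodes.

    A positive result means travelling in the increasing direction, a
    negative result means using the wraparound in the other direction.
    """
    forward = (dst - src) % size
    backward = forward - size
    return forward if forward <= -backward else backward


def torus_hops(src, dst, width, height):
    """Minimal hop count between two (x, y) nodes of a width x height torus."""
    dx = ring_offset(src[0], dst[0], width)
    dy = ring_offset(src[1], dst[1], height)
    return abs(dx) + abs(dy)


def mesh_hops(src, dst):
    """Hop count on a mesh of the same size, which has no wraparound links."""
    return abs(dst[0] - src[0]) + abs(dst[1] - src[1])


# Opposite corners of a 4x4 network: far apart on a mesh, adjacent-ish on a torus.
print(mesh_hops((0, 0), (3, 3)), torus_hops((0, 0), (3, 3), 4, 4))
```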

Citations: 0

Towards trace-driven cache attacks on Systems-on-Chips — exploiting bus communication

2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)

Pub Date : 2017-07-01 DOI: 10.1109/ReCoSoC.2017.8016150

Martha Johanna Sepúlveda, Mathieu Gross, A. Zankl, G. Sigl

The growing complexity of Systems-on-Chips (SoCs) increases the risk of software attacks during runtime. So-called side-channel attacks, which are based on the processor cache and its usage during the execution of cryptographic algorithms, are a critical threat to system security. Recent publications have analyzed cache attacks on mobile devices and network-on-chip platforms. In this work, we investigate cache attacks on bus-like tile-based Multi-Processor Systems-on-Chips (MPSoCs). This work presents two contributions. First, we demonstrate a trace-driven cache attack on AES-128 based on the exploitation of bus communication. Second, we integrate two countermeasures (Shuffling and Mini-table) and evaluate their impact on the trace-based cache attack and on the performance of the system. The results illustrate that trace-driven attacks based on bus communication are a non-negligible threat in SoC environments. The results also show that the protection techniques are feasible to implement and that they are able to mitigate the attacks.

Citations: 1

On-demand instantiation of co-processors on dynamically reconfigurable FPGAs

2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)

Pub Date : 2017-07-01 DOI: 10.1109/ReCoSoC.2017.8016153

Marcel Essig, K. F. Ackermann

State-of-the-art FPGAs comprise various architectural features providing the performance and flexibility required to comply with the growing real-time demands of today's industrial applications. Nevertheless, the engineering expertise required to exploit these platform features has significantly increased during the past few years, consequently raising product costs and the time-to-market as well. In particular, dynamic partial reconfiguration, which enables time-division multiplexing of resources within the reconfigurable fabric, has barely been adopted by industry yet. This paper introduces a lightweight co-processing framework, taking advantage of an embedded processor closely coupled with the programmable logic inside the FPGA. The basic idea of this concept is to implement the sequential control flow of applications in software, while reconfigurable hardware accelerators may be utilized on-demand, in order to increase the performance on computation-intensive tasks. A hardware abstraction layer hides complex architectural processes and provides software engineers with a set of routines, enabling run-time requests and the interfacing of co-processors from within the code. Implementation details and sequences of operations are given and discussed.

Citations: 0

ElasticSimMATE: A fast and accurate gem5 trace-driven simulator for multicore systems

2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)

Pub Date : 2017-07-01 DOI: 10.1109/ReCoSoC.2017.8016146

A. Nocua, Florent Bruguier, G. Sassatelli, A. Gamatie

Multicore system analysis requires efficient solutions for architectural parameter and scalability exploration. Long simulation time is the main drawback of current simulation approaches. In order to reduce the simulation time while keeping the accuracy levels, trace-driven simulation approaches have been developed. However, existing approaches either do not allow multicore exploration or do not capture the behavior of multithreaded programs. Based on the gem5 simulator, we developed a novel synchronization mechanism for multicore analysis based on the trace collection of synchronization events, instructions and dependencies. It allows efficient architectural parameter and scalability exploration with acceptable simulation speed and accuracy.

Citations: 15

Characterization and optimization of behavioral hardware accelerators in heterogeneous MPSoCs

2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)

Pub Date : 2017-07-01 DOI: 10.1109/ReCoSoC.2017.8016158

Yidi Liu, M. Villaverde, F. Moreno, Benjamin Carrión Schäfer

This work presents a method to characterize and optimize hardware accelerators (HWaccs) given as Behavioral IPs (BIPs) mapped as loosely coupled HWaccs in heterogeneous MPSoCs. The proposed HWacc exploration flow is composed of two main stages. The first stage characterizes each BIP individually by performing a High-Level Synthesis (HLS) Design Space Exploration (DSE) on each of the BIPs to obtain a trade-off curve of Pareto-optimal designs. The second stage explores the system-level design space using these Pareto-optimal designs and finds configurations with unique area vs. performance trade-offs. Our proposed system-level explorer makes use of cycle-accurate simulation models to explore the search space quickly and accurately. Experimental results show that our proposed method works well for MPSoCs of different sizes, ranging from systems with 1 to 4 masters and with 3 to 7 HWaccs.
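
To make "a trade-off curve of Pareto-optimal designs" concrete, here is a minimal sketch that filters a set of design points down to those not dominated in both area and latency. It is a generic Pareto filter, not the paper's exploration flow; the `Design` type and the sample values are invented for illustration only.

```python
from typing import NamedTuple


class Design(NamedTuple):
    name: str
    area: float     # smaller is better (hypothetical units)
    latency: float  # smaller is better (hypothetical units)


def pareto_front(designs):
    """Return the designs that no other design beats in both area and latency."""
    front = []
    best_latency = float("inf")
    # After sorting by area, a design is Pareto-optimal only if it is strictly
    # faster than every design that is at most as large.
    for d in sorted(designs, key=lambda d: (d.area, d.latency)):
        if d.latency < best_latency:
            front.append(d)
            best_latency = d.latency
    return front


# Illustrative design points, not measured results.
candidates = [
    Design("A", 100, 50),
    Design("B", 150, 30),
    Design("C", 160, 40),
    Design("D", 200, 10),
    Design("E", 120, 50),
]
print([d.name for d in pareto_front(candidates)])
```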

Citations: 1

Fault-resilient NoC router with transparent resource allocation

2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)

Pub Date : 2017-07-01 DOI: 10.1109/ReCoSoC.2017.8016161

Tsotne Putkaradze, Siavoosh Payandeh Azad, Behrad Niazmand, J. Raik, G. Jervan

The current trend of aggressive technology scaling results in a decrease in system reliability. This motivates the investigation of fault-resilient architectures that provide graceful degradation of the system's functionality. In this paper, three novel fault-resilient Network-on-Chip (NoC) router architectures are proposed. These architectures exploit the regularity of the router and reallocate available existing and spare units to maintain the functionality of certain turns. The resource reallocation is performed transparently to the system's resource manager and is based on predefined priorities. A new metric for architecture reliability comparison based on reliability block diagrams is introduced. In contrast to the Silicon Protection Factor (SPF) metric, the proposed metric also takes into account the areas of different units. The area overhead and reliability of the proposed architectures are compared with Triple Modular Redundancy (TMR) and Unit-Duplication mechanisms. All proposed architectures showed remarkable reliability improvement compared to the original, TMR and Unit-Duplication architectures, while at the same time their area overhead is less than or equal to that of unit-duplication mechanisms.
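
As background on the TMR baseline named above, the sketch below shows a bitwise majority voter and the textbook reliability of a TMR system built from three identical, independent modules with a perfect voter, R_TMR = 3R^2 - 2R^3. It does not reproduce the paper's proposed metric or router architectures; the function names are illustrative assumptions.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three module outputs."""
    return (a & b) | (a & c) | (b & c)


def tmr_reliability(r: float) -> float:
    """Reliability of TMR with module reliability r.

    Assumes independent, identical modules and a perfect voter: the system
    works if all three modules work (r^3) or exactly two work (3 r^2 (1 - r)).
    """
    return 3 * r**2 - 2 * r**3


# One module output is corrupted, and the voter masks the error.
print(bin(majority_vote(0b1010, 0b1010, 0b0110)))

# TMR helps only when each module is more reliable than a coin flip.
for r in (0.5, 0.9, 0.99):
    print(r, tmr_reliability(r))
```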

Citations: 3

Design and scalability analysis of bandwidth-compressed stream computing with multiple FPGAs

2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)

Pub Date : 2017-07-01 DOI: 10.1109/ReCoSoC.2017.8016148

Antoniette Mondigo, Tomohiro Ueno, Daichi Tanaka, K. Sano, S. Yamamoto

Stream computing on Field Programmable Gate Arrays (FPGAs) is seen as a promising way to deliver the performance and energy efficiency required by compute-intensive applications such as numerical simulations. The inherent structure and customizability of FPGAs naturally make them a better alternative for achieving a highly scalable computing design. This paper presents a scalable custom computing approach that exploits temporal parallelism by increasing the depth of a computing pipeline in a 1D ring of cascaded FPGAs with high-speed, low-latency communication links. Spatial parallelism is also explored by replicating the computing core inside the FPGAs to further increase throughput. Because communication bandwidth is limited, a hardware-based lossless bandwidth compression scheme is used to alleviate this bottleneck and transfer more data streams. A performance model is presented for the scalability analysis and performance estimation of this approach. For evaluation and verification, an actual numerical simulation was implemented on an Intel Arria 10 FPGA with spatially parallel computing cores. Initial results show that the measured performance is close to the values predicted by the performance model. It was also demonstrated that the 1D ring topology of multiple FPGAs with bandwidth-compressed links can scale the performance when a sufficiently large data set is computed, even with a deeper pipeline and insufficient inter-FPGA bandwidth.

Citations: 10
