2016 11th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC): Latest Publications

Comparative analysis of flexible cryptographic implementations
M. Rashid, Malik Imran, A. Jafri
Flexible hardware implementations of cryptographic algorithms for real-time applications have been proposed frequently. This paper classifies state-of-the-art research practices through a Systematic Literature Review (SLR) process. The selected studies are classified into three design categories: crypto processor, crypto coprocessor, and multicore crypto processor. A comparative analysis in terms of flexibility, throughput, and area is then presented. It helps researchers and designers in the domain select an appropriate design approach for a particular algorithm and/or application.
DOI: https://doi.org/10.1109/ReCoSoC.2016.7533901 (published 2016-06-27)
Citations: 13
EXTRA: Towards the exploitation of eXascale technology for reconfigurable architectures
D. Stroobandt, A. Varbanescu, C. Ciobanu, Muhammed Al Kadi, A. Brokalakis, George Charitopoulos, T. Todman, Xinyu Niu, D. Pnevmatikatos, Amit Kulkarni, Elias Vansteenkiste, W. Luk, M. Santambrogio, D. Sciuto, M. Huebner, Tobias Becker, G. Gaydadjiev, A. Nikitakis, A. Thom
To handle the stringent performance requirements of future exascale-class applications, High Performance Computing (HPC) systems need ultra-efficient heterogeneous compute nodes. To reduce power and increase performance, such compute nodes will require hardware accelerators with a high degree of specialization. Ideally, dynamic reconfiguration will be an intrinsic feature, so that specific HPC application features can be optimally accelerated, even if they regularly change over time. In the EXTRA project, we create a new and flexible exploration platform for developing reconfigurable architectures, design tools and HPC applications with run-time reconfiguration built-in as a core fundamental feature instead of an add-on. EXTRA covers the entire stack from architecture up to the application, focusing on the fundamental building blocks for run-time reconfigurable exascale HPC systems: new chip architectures with very low reconfiguration overhead, new tools that truly take reconfiguration as a central design concept, and applications that are tuned to maximally benefit from the proposed run-time reconfiguration techniques. Ultimately, this open platform will improve Europe's competitive advantage and leadership in the field.
DOI: https://doi.org/10.1109/ReCoSoC.2016.7533896 (published 2016-06-27)
Citations: 15
ERRCA: A buffer-efficient reconfigurable optical Network-on-Chip with permanent-error recognition
Wolfgang Büter, Dominic Oehlert, A. Ortiz
Optical on-chip communication technology provides unprecedented bandwidth. It makes it possible to connect the hundreds or even thousands of processing elements expected in many-core systems using an optical Network-on-Chip. However, the buffers required to interface the electrical and optical layers are very large, since optical data flow cannot be stored. Moreover, on-chip optical technologies have high defect rates, which severely limits their usability. To address these challenges, this work presents a buffer-efficient reconfigurable optical Network-on-Chip with permanent-error recognition. Buffer efficiency is achieved by a global credit-based arbitration with optical tokens. Furthermore, the architecture autonomously detects permanent errors in the optical components and configures the communication paths to avoid them. The work provides a thorough gate-level analysis of the area overhead incurred by the electrical sub-modules of the proposed system. It shows the practicability of the approach, which is experimentally validated on an FPGA prototype. Compared with previously reported optical networks, it achieves an area reduction of up to 80% with almost identical performance.
DOI: https://doi.org/10.1109/ReCoSoC.2016.7533909 (published 2016-06-27)
Citations: 0
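The ERRCA abstract above names credit-based arbitration as the source of its buffer efficiency but gives no implementation details. As a rough illustration of the general idea only, the following sketch models textbook credit-based flow control between one sender and one receiver buffer. The class name `CreditLink` and its methods are hypothetical, and the optical-token mechanism of the paper is not modeled.

```python
from collections import deque


class CreditLink:
    """Generic credit-based flow control (illustrative, not ERRCA's design)."""

    def __init__(self, buffer_slots):
        self.capacity = buffer_slots   # size of the receiver-side buffer
        self.credits = buffer_slots    # sender-side count of free receiver slots
        self.buffer = deque()          # receiver-side buffer contents

    def try_send(self, flit):
        """Send only if a credit is available, so the buffer can never overflow."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.buffer.append(flit)
        return True

    def drain(self):
        """Receiver consumes one flit and returns one credit to the sender."""
        if not self.buffer:
            return None
        flit = self.buffer.popleft()
        self.credits += 1
        return flit
```

Because the sender never transmits without a credit, the receiver buffer needs exactly `buffer_slots` entries and no more, which is the property that makes credit schemes attractive when buffer area is the main cost.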
Efficient bandwidth regulation at memory controller for mixed criticality applications
G. Tsamis, S. Kavvadias, A. Papagrigoriou, M. Grammatikakis, Kyprianos Papademetriou
We design a bandwidth regulation module by adapting and extending the algorithm of the MemGuard Linux kernel module for hardware implementation. Our extensions differentiate among NoC sources with rate-constrained and best-effort traffic provisions, support a violation-free guaranteed operating mode for rate-constrained flows, and support dynamic adaptivity through EWMA prediction. Our strategies enhance support for mixed-criticality applications on MPSoCs. C++-based statistical simulation shows improvements over a hardware adaptation of the original MemGuard algorithm without our extensions. Using SystemC, we further evaluate MemGuard at the memory controller of a NoC-based SoC model using an MPEG4 traffic model and compare its hardware cost using synthesis from Xilinx Vivado HLS and Vivado, with ARM AMBA AXI4 and a 4×4 STNoC instance.
DOI: https://doi.org/10.1109/ReCoSoC.2016.7533902 (published 2016-06-27)
Citations: 4
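The abstract above mentions EWMA (exponentially weighted moving average) prediction without stating how it is parameterized. The following is a minimal sketch of the standard EWMA recurrence, prediction = alpha * sample + (1 - alpha) * previous prediction. The function name, the smoothing factor `alpha`, and the idea of feeding it per-period bandwidth usage are assumptions for illustration, not details taken from the paper.

```python
def ewma_predict(samples, alpha=0.5, initial=0.0):
    """Return the EWMA prediction after folding in all samples in order.

    alpha close to 1 tracks recent samples quickly; alpha close to 0
    smooths heavily and reacts slowly. Both values here are assumptions.
    """
    prediction = initial
    for sample in samples:
        prediction = alpha * sample + (1 - alpha) * prediction
    return prediction


# Hypothetical usage: bandwidth consumed by one source in recent periods.
recent_usage = [100, 100]
next_period_estimate = ewma_predict(recent_usage, alpha=0.5, initial=0.0)
```

A regulator could use such an estimate to resize a source's budget for the next period; whether the paper does exactly this is not stated in the abstract.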
An ultra-low energy PUF matching security platform using programmable delay lines
T. Xu, Hongxiang Gu, M. Potkonjak
We propose a new security platform: physical unclonable function (PUF) matching using programmable delay lines (PDLs). Our platform inherits the good security properties of standard PUFs, such as low energy, low delay, and unclonability. However, standard PUF-based security protocols demand substantial computational resources from at least one involved party. To resolve this issue, we take advantage of PDL technology to match standard PUFs in such a way that two PUFs have the same challenge-response mapping function. The matched pair of PUFs enables a majority of protocols to be executed in an ultra-low-energy, low-latency manner for all involved parties.
DOI: https://doi.org/10.1109/ReCoSoC.2016.7533899 (published 2016-06-27)
Citations: 5
Verifying worst-case completion times for reconfigurable hardware modules using proof-carrying hardware
T. Wiersema, M. Platzner
Runtime reconfiguration can be used to replace hardware modules in the field and even to continuously improve them during operation. Runtime reconfiguration poses new challenges for validation, since the required properties of newly arriving modules may be difficult to check fast enough to sustain the intended system dynamics. In this paper we present a method for just-in-time verification of the worst-case completion time of a reconfigurable hardware module. We assume so-called run-to-completion modules that exhibit start and done signals indicating the start and end of execution, respectively. We present a formal verification approach that exploits the concept of proof-carrying hardware. The approach tasks the creator of a hardware module with constructing a proof of the worst-case completion time, which can then easily be checked by the user of the module just prior to reconfiguration. After explaining the verification approach and a corresponding tool flow, we present results from two case studies, a short-term synthesis filter and a multihead weigher. The results clearly show that the cost of verifying the completion time of the module is paid by the creator instead of the user of the module.
DOI: https://doi.org/10.1109/ReCoSoC.2016.7533910 (published 2016-06-27)
Citations: 4
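The abstract above rests on an asymmetry: constructing a proof is expensive and falls to the module's creator, while checking it is cheap for the user. The sketch below illustrates that asymmetry with a deliberately simplified stand-in, not the paper's tool flow. It assumes the module is given as an explicit state graph and that the certificate is an integer ranking function bounding the remaining cycles per state; all names are hypothetical.

```python
def check_wcct_certificate(transitions, done_states, start, rank, bound):
    """Cheap consumer-side check of a completion-time certificate.

    transitions: dict mapping each non-done state to its successor states.
    rank: dict mapping each state to a claimed upper bound on the number
          of cycles remaining until a done state is reached.
    Accepts only if the rank is non-negative, strictly decreases on every
    transition out of a non-done state, and rank[start] <= bound.
    """
    if rank.get(start, bound + 1) > bound:
        return False
    if start not in done_states and start not in transitions:
        return False
    for state, successors in transitions.items():
        if state in done_states:
            continue
        # A non-done state must be ranked and must be able to make progress.
        if state not in rank or rank[state] < 0 or not successors:
            return False
        for nxt in successors:
            if nxt not in rank or rank[nxt] < 0:
                return False
            if rank[nxt] > rank[state] - 1:
                return False
            # Every reachable non-done state must itself be checked.
            if nxt not in done_states and nxt not in transitions:
                return False
    return True
```

The check is a single pass over the transitions, whereas finding a valid `rank` in the first place may require exploring the whole design. In this toy model that is the sense in which the verification cost moves from the user to the creator.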
Speed and accuracy dilemma in NoC simulation: What about memory impact?
Manuel Selva, A. Gamatie, D. Novo, G. Sassatelli
Network-on-Chip (NoC) communication infrastructures are increasingly used in modern manycore architectures. Many industrial and research NoC simulators have been proposed in recent years to facilitate the design of such communication infrastructures. Like any simulator, all of them have to trade off speed and accuracy. Simulation time directly depends on the simulation accuracy. It also directly depends on the complexity of the system to be simulated, e.g., the number of cores and their unit complexity. In this work, we show that the memory footprint of NoC simulators can be a serious factor limiting the simulation of manycore architectures with a large number of cores. We first quantitatively compare the memory footprint of a transaction-level modeling NoC simulator and its cycle-accurate counterpart to show that memory footprint is a concern. Then, we show that memory footprint is also largely impacted by the choice of the programming abstraction by comparing two cycle-accurate simulators written using different application programming interfaces, i.e., plain C++ and SystemC.
DOI: https://doi.org/10.1109/ReCoSoC.2016.7533893 (published 2016-06-27)
Citations: 1
Address interleaving for low-cost NoCs
M. Grammatikakis, Kyprianos Papademetriou, P. Petrakis, M. Coppola, Michael Soulie
New generations of NoC-based platforms incorporate address interleaving, which enables balancing transactions between the memory nodes. The memory space is distributed across different nodes of the NoC and accessed alternately by each on-chip initiator. A memory node is selected through a memory map based on the transaction request address. Interleaving can allow for efficient use of NoC bandwidth and congestion reduction, and we study whether its gains scale with system size. In this work we concentrate on an instance of a customizable point-to-point interconnect from STMicroelectronics called STNoC. We first evaluate a setup with 4 CPU initiators and 4 memories, and show that interleaving relieves the NoC from congestion and permits higher packet injection rates. We also show that this depends on the number of packets sent per transaction by an initiator prior to changing the destination memory node; this is called the interleaving step. We then enriched the setup with several DMA engines, in line with the industry roadmap. We experimented with MPSoCs having up to 32 nodes and various STNoC link widths. When the link width was 32 bytes, the aggregate throughput gain from address interleaving was 20.8%, but when we set it to 8 bytes the throughput gain reached 69.64%. This implies silicon savings in SoCs, as it is not always necessary to configure NoCs with wide links.
DOI: https://doi.org/10.1109/ReCoSoC.2016.7533892 (published 2016-06-27)
Citations: 0
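The abstract above describes a memory map that selects a memory node from the request address, with an interleaving step that sets how many packets go to one node before moving to the next. The sketch below shows one common way such a mapping can be written. The function name, the parameters `packet_bytes` and `step_packets`, and the simple modulo scheme are assumptions; the actual STNoC memory map is not given in the abstract.

```python
def memory_node(address, num_nodes, packet_bytes, step_packets):
    """Map a byte address to a memory node under round-robin interleaving.

    Consecutive blocks of (packet_bytes * step_packets) bytes are assigned
    to nodes 0, 1, ..., num_nodes - 1 and then wrap around.
    """
    stride = packet_bytes * step_packets  # bytes served by one node in a row
    return (address // stride) % num_nodes


# Hypothetical example: 4 memory nodes, 64-byte packets.
addresses = range(0, 320, 64)
fine_grained = [memory_node(a, 4, 64, 1) for a in addresses]
coarse_grained = [memory_node(a, 4, 64, 2) for a in addresses]
```

With an interleaving step of 1 the five consecutive packets above map to nodes 0, 1, 2, 3, 0, while a step of 2 keeps pairs of packets on the same node (0, 0, 1, 1, 2). This is the knob the abstract reports as affecting the congestion relief.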
Reconfiguration in FPGA-based multi-core platforms for hard real-time applications
Luca Pezzarossa, Martin Schoeberl, J. Sparsø
In general-purpose computing multi-core platforms, hardware accelerators and reconfiguration are means to improve performance, i.e., the average-case execution time of a software application. In hard real-time systems, such average-case speed-up is not in itself relevant - it is the worst-case execution time of the tasks of an application that determines the system's ability to respond in time. To support this focus, the platform must provide service guarantees for both communication and computation resources. In addition, many hard real-time applications have multiple modes of operation, and each mode has specific requirements. An interesting perspective on reconfigurable computing is to exploit run-time reconfiguration to support mode changes. In this paper we explore approaches to reconfiguration of communication and computation resources in the T-CREST hard real-time multi-core platform. The reconfiguration of communication resources is supported by extending the message-passing network-on-chip with capabilities for setting up, tearing down, and modifying the bandwidth of virtual circuits. The reconfiguration of computation resources, such as hardware accelerators, is performed using the dynamic partial reconfiguration capabilities found in modern FPGAs.
在通用计算多核平台中,硬件加速器和重构是提高性能的手段;例如,软件应用程序的平均执行时间。在硬实时系统中,这种平均情况下的加速本身无关紧要——决定系统及时响应能力的是应用程序任务的最坏情况执行时间。为了支持这一重点,平台必须为通信和计算资源提供服务保障。此外,许多硬实时应用具有多种操作模式,每种模式都有特定的要求。关于可重构计算的一个有趣的观点是利用运行时重构来支持模式更改。本文探讨了T-CREST硬实时多核平台中通信和计算资源重构的方法。通过扩展具有设置、拆除和修改虚拟电路带宽功能的消息传递片上网络来支持通信资源的重新配置。计算资源的重新配置,如硬件加速器,是使用现代fpga中发现的动态部分重新配置能力来执行的。
{"title":"Reconfiguration in FPGA-based multi-core platforms for hard real-time applications","authors":"Luca Pezzarossa, Martin Schoeberl, J. Sparsø","doi":"10.1109/ReCoSoC.2016.7533895","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2016.7533895","url":null,"abstract":"In general-purpose computing multi-core platforms, hardware accelerators and reconfiguration are means to improve performance; i.e., the average-case execution time of a software application. In hard real-time systems, such average-case speed-up is not in itself relevant - it is the worst-case execution-time of tasks of an application that determines the systems ability to respond in time. To support this focus, the platform must provide service guarantees for both communication and computation resources. In addition, many hard real-time applications have multiple modes of operation, and each mode has specific requirements. An interesting perspective on reconfigurable computing is to exploit run-time reconfiguration to support mode changes. In this paper we explore approaches to reconfiguration of communication and computation resources in the T-CREST hard real-time multi-core platform. The reconfiguration of communication resources is supported by extending the message-passing network-on-chip with capabilities for setting up, tearing down, and modifying the bandwidth of virtual circuits. 
The reconfiguration of computation resources, such as hardware accelerators, is performed using the dynamic partial reconfiguration capabilities found in modern FPGAs.","PeriodicalId":248789,"journal":{"name":"2016 11th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133627177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Postponing wearout failures in chip multiprocessors using thermal management and thread migration
E. Kashefi, H. Zarandi, A. Gordon-Ross
This paper presents an improved method to postpone wearout failures and improve the lifetime of functional units and of the entire system by considering two important wearout factors: temperature and functional unit usage. Our method is more fine-grained than prior methods because it considers the usage of individual functional units. Using this information, system behavior can be predicted and appropriate thread scheduling and migration decisions can be made. Our method incorporates temperature predictions based on recent historical temperatures and functional unit usage to rank threads and cores in a chip multiprocessor (CMP). Using these rankings, our method migrates threads among cores to reduce thermal hotspots. Simulation results on the ESESC simulator show that our method can lower the average system temperature by approximately 4.33°C and extend lifetime by approximately 21.65% in a tri-core CMP, and by 6.4°C and 32%, respectively, in a quad-core CMP.
{"title":"Postponing wearout failures in chip multiprocessors using thermal management and thread migration","authors":"E. Kashefi, H. Zarandi, A. Gordon-Ross","doi":"10.1109/ReCoSoC.2016.7533906","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2016.7533906","url":null,"abstract":"This paper presents an improved method to postpone wearout failures and improve functional unit and entire system lifetime by considering two important wearout factors: temperature and functional unit usage. Our method provides a more fine-grained approach as compared to prior methods by considering individual functional unit usage. Using this information, system behavior can be predicted and appropriate thread scheduling and migration decisions can be made. Our method incorporates temperature predictions based on recent historical temperatures and functional unit usages to rank threads and cores in a chip multiprocessor (CMP). Using these rankings, our method migrates threads among cores to reduce thermal hotspots. Simulation results on the ESESC simulator show that our method can improve the average system temperature and lifetime by approximately 4.33°C and 21.65%, respectively, in a tri-core CMP, and 6.4°C and 32% in a quad-core CMP.","PeriodicalId":248789,"journal":{"name":"2016 11th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123798746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5