Proceedings of the 9th International Symposium on Networks-on-Chip: Latest Papers

Wear-Aware Adaptive Routing for Networks-on-Chips
Pub Date : 2015-09-28 DOI: 10.1145/2786572.2786573
A. Vitkovski, V. Soteriou, Paul V. Gratz
Chip-multiprocessors are facing worsening reliability due to prolonged operational stresses, with their tile-interconnecting Network-on-Chip (NoC) being especially vulnerable to wearout-induced failure. To tackle this ominous threat we present a novel wear-aware routing algorithm that continuously considers the stresses the NoC experiences at runtime, along with temperature and fabrication process variation metrics, steering traffic away from locations that are most prone to Electromigration (EM)- and Hot-Carrier Injection (HCI)-induced wear. Under realistic applications our wear-aware algorithm yields 66% and 8% average increases in mean-time-to-failure for EM and HCI, respectively.
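As a rough illustration of steering traffic away from wear-prone locations, the sketch below picks, among the output ports an adaptive routing function allows, the one with the lowest combined wear estimate. The linear score, its weights, and all names (`PortState`, `wear_score`, `pick_port`) are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class PortState:
    name: str           # output port label, e.g. "east"
    utilization: float  # recent link utilization in [0, 1], a stress proxy
    temperature: float  # normalized temperature in [0, 1]
    variation: float    # normalized process-variation penalty in [0, 1]


def wear_score(p: PortState, w_util=0.5, w_temp=0.3, w_var=0.2) -> float:
    """Hypothetical linear wear estimate; higher means more wear-prone."""
    return w_util * p.utilization + w_temp * p.temperature + w_var * p.variation


def pick_port(admissible: list[PortState]) -> PortState:
    """Choose the admissible output port with the lowest wear estimate."""
    if not admissible:
        raise ValueError("no admissible output port")
    return min(admissible, key=wear_score)


ports = [
    PortState("east", utilization=0.9, temperature=0.7, variation=0.2),
    PortState("north", utilization=0.4, temperature=0.5, variation=0.3),
]
print(pick_port(ports).name)  # the less stressed port, "north"
```

A real router would also have to preserve deadlock freedom and would derive its wear estimates from EM and HCI models rather than from fixed weights.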
Citations: 1
On-Chip Millimeter Wave Antennas and Transceivers
Pub Date : 2015-09-28 DOI: 10.1145/2786572.2789983
Ofer Markish, O. Katz, B. Sheinman, D. Corcos, D. Elad
The main mechanisms responsible for performance degradation of millimeter wave (mmWave) and terahertz (THz) on-chip antennas are reviewed. Several techniques to improve the performance of the antennas and several high efficiency antenna types are presented. In order to illustrate the effects of the chip topology on the antenna, simulations and measurements of mmWave and THz on-chip antennas are shown. Finally, different transceiver architectures are explored with emphasis on the challenges faced in a wireless multi-core environment.
Citations: 23
Multi-Layer Test and Diagnosis for Dependable NoCs
Pub Date : 2015-09-28 DOI: 10.1145/2786572.2788708
H. Wunderlich, M. Radetzki
Networks-on-chip are inherently fault tolerant, or at least gracefully degradable, as both connectivity and the amount of resources provide some useful redundancy. These properties can only be exploited extensively if test and diagnosis techniques support fault detection and error containment in an optimized way. On the one hand, all faulty components have to be isolated, and on the other hand, the remaining fault-free functionality has to be kept operational. In this contribution, behavioral end-to-end error detection is considered together with functional test methods for switches and gate-level diagnosis to locate and isolate faults in the network efficiently and with low time overhead.
Citations: 3
A Framework for Combining Concurrent Checking and On-Line Embedded Test for Low-Latency Fault Detection in NoC Routers
Pub Date : 2015-09-28 DOI: 10.1145/2786572.2788713
Pietro Saltarelli, Behrad Niazmand, J. Raik, V. Govind, T. Hollstein, G. Jervan, R. Hariharan
The focus of the paper is the detection of faults in NoC routers by combining concurrent checkers with embedded on-line test to enable cost-effective trade-offs between area overhead and test coverage. First, we propose a framework of tools for formally evaluating the quality of the checkers and for optimizing the area overhead under given fault coverage constraints. Particular emphasis is placed on minimizing the error detection latency, which is crucial for eliminating (or limiting) error propagation. Second, the concurrent checkers are complemented by embedded on-line test packets, which are applied as a periodic routine during idle periods in router operation. The framework, together with the corresponding methodology, has been successfully applied to a realistic case study of a fault-tolerant NoC router design. The case study shows that combining concurrent checkers with embedded test reduces the area overhead of the checkers from 31-35% down to 1.5-10% without sacrificing fault coverage.
Citations: 13
Modeling and Design of High-Radix On-Chip Crossbar Switches
Pub Date : 2015-09-28 DOI: 10.1145/2786572.2786579
Cagla Cakir, R. Ho, J. Lexau, K. Mai
The crossbar is a popular topology for on-chip networks that offers non-blocking connectivity and uniform latency. However, as the number of nodes increases, crossbars typically scale poorly in area, power, and latency/throughput. To better understand the design space, we have developed an on-chip crossbar modeling tool based on analytical models calibrated using circuit-level simulation results in 40nm CMOS. We present a design space exploration showing how crossbar area, power, and performance vary across input/output node number, data width, wire parameters, and circuit implementation. Using the modeling results, we identify a design point that demonstrates 2X higher throughput, 1.4X lower power, and 1.2X lower area compared to previously published designs.
Citations: 7
Networking Challenges and Prospective Impact of Broadcast-Oriented Wireless Networks-on-Chip
Pub Date : 2015-09-28 DOI: 10.1145/2786572.2788710
S. Abadal, M. Nemirovsky, E. Alarcón, A. Cabellos-Aparicio
The cost of broadcast has been constraining the design of manycore processors and of the algorithms that run upon them. However, as on-chip RF technologies allow the design of small-footprint and high-bandwidth antennas and transceivers, native low-latency (a few clock cycles) and low-power (a few pJ/bit) broadcast support through wireless communication can be envisaged. In this paper, we analyze the main networking design aspects and challenges of Broadcast-oriented Wireless Network-on-Chip (BoWNoC), which essentially reduce to the development of Medium Access Control (MAC) protocols able to handle hundreds of cores. We evaluate the broadcast performance and scalability of different MAC designs, and then discuss the impact that the proposed paradigm could have on the performance, scalability, and programmability of future manycore architectures, programming models, and parallel algorithms.
Citations: 18
Improving DVFS in NoCs with Coherence Prediction
Pub Date : 2015-09-28 DOI: 10.1145/2786572.2786595
R. Hesse, Natalie D. Enright Jerger
As Networks-on-Chip (NoCs) continue to consume a large fraction of the total chip power budget, dynamic voltage and frequency scaling (DVFS) has evolved into an integral part of NoC designs. Efficient DVFS relies on accurate predictions of future network state. Most previous approaches are reactive and based on network-centric metrics, such as buffer occupancy and channel utilization. However, we find that there is little correlation between those metrics and subsequent NoC traffic, which leads to suboptimal DVFS decisions. In this work, we propose to utilize highly predictable properties of cache-coherence communication to derive more specific and reliable NoC traffic predictions. A DVFS mechanism based on our traffic predictions reduces power by 41% compared to a baseline without DVFS and by 21% on average compared to a state-of-the-art DVFS implementation, while degrading performance by only 3%.
Citations: 39
User Cooperation Network Coding Approach for NoC Performance Improvement
Pub Date : 2015-09-28 DOI: 10.1145/2786572.2786575
Yuankun Xue, P. Bogdan
The astonishing rate of growth in sensing modalities and data generation has a tremendous impact on computing platforms that must provide real-time mining and prediction capabilities. We are capable of monitoring thousands of genes and their interactions, but we lack efficient computing platforms for large-scale (exa-scale) data processing. Towards this end, we propose a novel hierarchical Network-on-Chip (NoC) architecture that exploits user-cooperated network coding (NC) concepts to improve system throughput. Our proposed architecture relies on a lightweight subnet of cooperation unit routers (CURs) for multicast traffic. A coding network interface (CNI) performs encoding/decoding of NC symbols and shares the data flows among cooperation units (CUs). We endow our proposed NC-based NoC architecture with: (i) a corridor routing algorithm (CRA) for maximizing network throughput and (ii) an adaptive flit dropping (AFD) scheme to mitigate congestion, branch-blocking, and deadlock at run-time. The experimental results demonstrate that our proposed platform offers up to 127X multicast throughput improvement over multiple-unicast and XY tree-based multicast under a synthetic collective traffic scenario. We have evaluated the proposed platform with different real-world benchmarks under network sizes of 4x4 to 32x32. Simulation results show 21% to 91% latency improvement and up to 25X runtime reduction over a conventional mesh NoC when performing genetic-algorithm-based protein folding analysis. FPGA implementation results show minimal overhead.
Citations: 50
Highway in TDM NoCs
Pub Date : 2015-09-28 DOI: 10.1145/2786572.2786577
Shaoteng Liu, Zhonghai Lu, A. Jantsch
TDM (Time Division Multiplexing) is a well-known technique to provide QoS guarantees in NoCs. However, unused time slots commonly exist in TDM NoCs. In this paper, we propose a TDM highway technique that can enhance the slot utilization of TDM NoCs. A TDM highway is an express TDM connection composed of special buffer queues, called highway channels (HWCs). It can enhance the throughput and reduce the data transfer delay of the connection, while keeping the quality of service (QoS) guarantee on minimum bandwidth and in-order packet delivery. We have developed a dynamic and repetitive highway setup policy that has no dependency on particular TDM NoC techniques and imposes no overhead on traffic flows. As a result, highways can be efficiently established and utilized in various TDM NoCs. According to our experiments, compared to a traditional TDM NoC, adding one HWC with two buffers to every input port of the routers in an 8×8 mesh can reduce data delay by up to 80% and increase the maximum throughput by up to 310%. Further improvements can be achieved by adding more HWCs per input per router, or more buffers per HWC. We also use a set of MPSoC application benchmarks to evaluate our highway technique. The experimental results suggest that with highways, we can reduce application run time by up to 51%.
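To make the notion of unused time slots concrete, the sketch below models one link's TDM slot table as a list in which each entry either names the connection that owns the slot or is `None`. The table contents and the names `slot_utilization` and `free_slots` are hypothetical, and the highway setup policy itself is not reproduced here.

```python
from typing import Optional


def slot_utilization(table: list[Optional[str]]) -> float:
    """Fraction of slots in one TDM period that are reserved."""
    if not table:
        raise ValueError("empty slot table")
    return sum(slot is not None for slot in table) / len(table)


def free_slots(table: list[Optional[str]]) -> list[int]:
    """Indices of slots that no connection has reserved."""
    return [i for i, slot in enumerate(table) if slot is None]


# Hypothetical 8-slot period: connection "A" owns two slots, "B" owns one.
table = ["A", None, "B", None, None, "A", None, None]
print(slot_utilization(table))  # 0.375
print(free_slots(table))        # [1, 3, 4, 6, 7]
```

In this toy table five of eight slots go unused every period, which is the kind of idle capacity that a mechanism for improving slot utilization would aim to reclaim.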
Citations: 9
Dark Silicon: From Computation to Communication
Pub Date : 2015-09-28 DOI: 10.1145/2786572.2788707
J. Henkel, H. Bokhari, S. Garg, M. U. Khan, Heba Khdr, F. Kriebel, Ümit Y. Ogras, S. Parameswaran, M. Shafique
In the emerging Dark Silicon era, not all parts of an on-chip system (i.e., cores, Network-on-Chip, and memory resources) can be simultaneously powered on at full speed. This paper aims at exposing dark silicon challenges to the NOCS community with an overview of some of the early research efforts that are attempting to shape the design and run-time management of future-generation heterogeneous dark silicon processors. The goal is to cover both the computation and communication perspectives. In particular, we exploit computation and communication heterogeneity at multiple levels of system abstraction to design and manage dark silicon processors. The available dark silicon is leveraged to improve power/energy, performance, and reliability efficiency.
Citations: 16