2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip: Latest Articles

Fine-Grained Bandwidth Adaptivity in Networks-on-Chip Using Bidirectional Channels

2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip

Pub Date : 2012-05-09 DOI: 10.1109/NOCS.2012.23

R. Hesse, J. Nicholls, Natalie D. Enright Jerger

Networks-on-Chip (NoCs) serve as efficient and scalable communication substrates for many-core architectures. Currently, the bandwidth provided in NoCs is overprovisioned for their typical use case. In real-world multi-core applications, less than 5% of channels are utilized on average. Large bandwidth resources serve to keep network latency low during periods of peak communication demand. Increasing the average channel utilization through narrower channels could improve the efficiency of NoCs in terms of area and power; however, in current NoC architectures this degrades overall system performance. Based on a thorough analysis of the dynamic behaviour of real workloads, we design a novel NoC architecture that adapts to changing application demands. Our architecture uses fine-grained bandwidth-adaptive bidirectional channels to improve channel utilization without negatively affecting network latency. Running PARSEC benchmarks on a cycle-accurate full-system simulator, we show that fine-grained bandwidth adaptivity can save up to 75% of channel resources while achieving 92% of overall system performance compared to the baseline network. No performance is sacrificed when our network design is configured with 50% of the channel resources used in the baseline.
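To make the idea of a bandwidth-adaptive bidirectional channel concrete, the following is a minimal sketch, not the authors' mechanism. It assumes a link made of `total_lanes` lanes, each of which can be driven in either direction, and a hypothetical policy that reserves `min_per_dir` lanes per direction and splits the remainder in proportion to the pending demand on each side. The function name, the parameters, and the proportional policy are all illustrative assumptions.

```python
def allocate_lanes(total_lanes, demand_ab, demand_ba, min_per_dir=1):
    """Split the lanes of one bidirectional link between directions A->B and B->A.

    Hypothetical policy: every direction keeps `min_per_dir` lanes so that it is
    never starved; the spare lanes follow the demand ratio (rounded to nearest).
    """
    if total_lanes < 2 * min_per_dir:
        raise ValueError("not enough lanes to guarantee both directions")
    spare = total_lanes - 2 * min_per_dir
    total_demand = demand_ab + demand_ba
    if total_demand == 0:
        extra_ab = spare // 2  # idle link: split the spare lanes evenly
    else:
        # Integer round-to-nearest of spare * demand_ab / total_demand.
        extra_ab = (spare * demand_ab + total_demand // 2) // total_demand
    lanes_ab = min_per_dir + extra_ab
    return lanes_ab, total_lanes - lanes_ab
```

For example, `allocate_lanes(8, 6, 2)` returns `(6, 2)`, while an idle link, `allocate_lanes(8, 0, 0)`, falls back to the symmetric split `(4, 4)`.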

Pages: 132-141

Citations: 53

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect

2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip

Pub Date : 2012-05-09 DOI: 10.1109/NOCS.2012.8

Chris Fallin, Greg Nazario, Xiangyao Yu, K. Chang, Rachata Ausavarungnirun, O. Mutlu

A conventional Network-on-Chip (NoC) router uses input buffers to store in-flight packets. These buffers improve performance but consume significant power. It is possible to bypass these buffers when they are empty, reducing dynamic power, but static buffer power, and dynamic power when buffers are utilized, remain. To improve energy efficiency, bufferless deflection routing removes input buffers and instead uses deflection (misrouting) to resolve contention. However, at high network load, deflections cause unnecessary network hops, wasting power and reducing performance. In this work, we propose a new NoC router design called the minimally-buffered deflection (MinBD) router. This router combines deflection routing with a small "side buffer," which is much smaller than conventional input buffers. A MinBD router places some network traffic that would otherwise have been deflected in this side buffer, reducing deflections significantly. The router buffers only a fraction of traffic, thus making more efficient use of buffer space than a router that holds every flit in its input buffers. We evaluate MinBD against input-buffered routers of various sizes that implement buffer bypassing, a bufferless router, and a hybrid design, and show that MinBD is more energy efficient than all prior designs, with performance that approaches that of the conventional input-buffered router and area and power close to those of the bufferless router.
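The interplay between deflection and the side buffer can be illustrated with a small behavioural sketch. This is not the MinBD router's actual pipeline: the oldest-first arbitration, the four-port layout, the `route_cycle` name, and the omission of re-injection from the side buffer are all simplifying assumptions made for illustration.

```python
from collections import deque

PORTS = ("N", "E", "S", "W")  # assumed output ports of a mesh router

def route_cycle(flits, side_buffer, capacity):
    """Route up to four flits for one cycle.

    flits: list of (age, desired_port) tuples.
    side_buffer: deque that absorbs flits which would otherwise be deflected.
    Returns (assignments, deflected): the port -> flit map and the list of
    flits that were sent to a port they did not request.
    """
    assignments, losers = {}, []
    # Assumed arbitration: the oldest flit wins a contested port.
    for flit in sorted(flits, key=lambda f: -f[0]):
        _, want = flit
        if want not in assignments:
            assignments[want] = flit
        else:
            losers.append(flit)
    deflected = []
    for flit in losers:
        if len(side_buffer) < capacity:
            side_buffer.append(flit)  # buffered instead of deflected
        else:
            free = next(p for p in PORTS if p not in assignments)
            assignments[free] = flit  # misrouted: costs extra hops later
            deflected.append(flit)
    return assignments, deflected
```

With three flits contending for port `N` and a one-entry side buffer, one loser is buffered and one is deflected; with `capacity=0` the sketch degenerates to purely bufferless deflection and both losers are misrouted.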

Pages: 1-10

Citations: 128

A Hybrid Buffer Design with STT-MRAM for On-Chip Interconnects

2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip

Pub Date : 2012-05-09 DOI: 10.1109/NOCS.2012.30

Hyunjun Jang, Baik Song An, Nikhil Kulkarni, K. H. Yum, Eun Jung Kim

As chip multiprocessor (CMP) design moves toward many-core architectures, communication delay in the Network-on-Chip (NoC) has become a major bottleneck in CMP systems. Using high-density memories in input buffers helps to reduce the bottleneck by increasing throughput. Spin-Torque Transfer Magnetic RAM (STT-MRAM) can be a suitable solution due to its high density and near-zero leakage power, but its long latency and high power consumption in write operations still need to be addressed. We explore the design issues in using STT-MRAM for NoC input buffers. Motivated by short intra-router latency, we use a previously proposed write latency reduction technique that sacrifices retention time. We then propose a hybrid design of input buffers using both SRAM and STT-MRAM to hide the long write latency efficiently. Considering that simple data migration in the hybrid buffer consumes more dynamic power than SRAM, we provide a lazy migration scheme that reduces the dynamic power consumption of the hybrid buffer. Simulation results show that the proposed scheme enhances throughput by 21% on average.

Pages: 193-200

Citations: 35

Transient and Permanent Error Control for High-End Multiprocessor Systems-on-Chip

2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip

Pub Date : 2012-05-09 DOI: 10.1109/NOCS.2012.27

Qiaoyan Yu, José Cano, J. Flich, P. Ampadu

High-end MPSoC systems with built-in high-radix topologies achieve good performance because of the improved connectivity and the reduced network diameter. In high-end MPSoC systems, fault tolerance support is becoming a compulsory feature. In this work, we propose a combined method to address permanent and transient link and router failures in those systems. The LBDRhr mechanism is proposed to tolerate permanent link failures in some popular high-radix topologies. The increased router complexity may lead to more transient router errors than in routers using a simple XY routing algorithm. We exploit the inherent information redundancy (IIR) in LBDRhr logic to manage transient errors in the network routers. Thorough analyses are provided to discover the appropriate internal nodes and the forbidden signal patterns for transient error detection. Simulation results show that LBDRhr logic can tolerate all of the permanent failure combinations of long-range links and 80% of link failures on short-range links. Case studies show that the error detection method based on the new IIR extraction method reduces the power consumption and the residual error rate by 33% and up to two orders of magnitude, respectively, compared to triple modular redundancy. The impact of network topologies on the efficiency of the detection mechanism is also examined in this work.

Pages: 169-176

Citations: 19

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip

Pub Date : 2012-05-09 DOI: 10.1109/NOCS.2012.15

Stavros Volos, Ciprian Seiculescu, Boris Grot, Naser Khosro Pour, B. Falsafi, G. Micheli

Many-core chips are emerging as the architecture of choice to provide power efficiency and improve performance, while riding Moore's Law. In these architectures, on-chip interconnects play a pivotal role in ensuring power and performance scalability. As supply voltages begin to level off in future technologies, chip designs in general and interconnects in particular will require specialization to meet power and performance objectives. In this work, we make the observation that cache-coherent many-core server chips exhibit a duality in on-chip network traffic. Request traffic largely consists of simple control messages, while response traffic often carries cache-block-sized payloads. We present Cache-Coherence Network-on-Chip (CCNoC), a design that specializes the NoC to fit the demands of server workloads via a pair of asymmetric networks tuned to the type of traffic traversing them. The networks differ in their datapath width, router microarchitecture, flow control strategy, and delay. The resulting heterogeneous CCNoC architecture enables significant gains in power efficiency over conventional NoC designs at similar performance levels. Our evaluation reveals that a 4×4 mesh-based chip multiprocessor with the proposed CCNoC organization running commercial server workloads is 15-28% more energy efficient than various state-of-the-art single- and dual-network organizations.

Pages: 67-74

Citations: 62

An Optimal Control Approach to Power Management for Multi-Voltage and Frequency Islands Multiprocessor Platforms under Highly Variable Workloads

2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip

Pub Date : 2012-05-09 DOI: 10.1109/NOCS.2012.32

P. Bogdan, R. Marculescu, Siddhartha Jain, Rafael Tornero Gavilá

Reducing energy consumption in multi-processor systems-on-chip (MPSoCs) where communication happens via the network-on-chip (NoC) approach calls for multiple voltage/frequency island (VFI)-based designs. In turn, such multi-VFI architectures need efficient, robust, and accurate run-time control mechanisms that can exploit the workload characteristics in order to save power. Despite being tractable, the linear control models for power management cannot capture some important workload characteristics (e.g., fractality, non-stationarity) observed in heterogeneous NoCs; if ignored, such characteristics lead to inefficient communication and resource allocation, as well as high power dissipation in MPSoCs. To mitigate such limitations, we propose a paradigm shift from power optimization based on linear models to control approaches based on fractal-state equations. As such, our approach is the first to propose a controller for fractal workloads with precise constraints on state and control variables and specific time bounds. Our results show that significant power savings (about 70%) can be achieved at run-time while running a variety of benchmark applications.

Pages: 35-42

Citations: 72

A Novel Flit Serialization Strategy to Utilize Partially Faulty Links in Networks-on-Chip

2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip

Pub Date : 2012-05-09 DOI: 10.1109/NOCS.2012.22

Changlin Chen, Ye Lu, S. Cotofana

Aggressive MOS transistor size scaling substantially increases the probability of faults in NoC links due to manufacturing defects, process variations, and chip wear-out effects. Strategies have been proposed to tolerate faulty wires by replacing them with spare ones or by partially using the defective links. However, these strategies either suffer from high area and power overheads or significantly increase the average network latency. In this paper, we propose a novel flit serialization method, which divides the links and flits into several sections and serializes sections of adjacent flits so that they are transmitted on all available fault-free link sections, avoiding the complete waste of a defective link's bandwidth. Experimental results indicate that our method reduces the latency overhead significantly and enables graceful performance degradation when compared with related partially-faulty-link usage proposals, and saves area and power overheads by up to 29% and 43.1%, respectively, when compared with spare wire replacement methods.
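The serialization idea can be sketched behaviourally as follows. This is an illustrative model under stated assumptions, not the paper's hardware design: flits are represented as lists of equally sized sections, `healthy_sections` lists the indices of fault-free link sections, and the function names and the tagging of each section with `(flit_id, section_id)` are hypothetical conveniences for reassembly.

```python
from collections import deque

def serialize(flits, healthy_sections):
    """Pack flit sections, in order, onto the fault-free link sections.

    Returns one dict per cycle mapping link section -> (flit_id, section_id, payload).
    Sections of adjacent flits may share a cycle, so no healthy section idles
    while data is pending.
    """
    if not healthy_sections:
        raise ValueError("link has no fault-free section")
    pending = deque(
        (f, s, payload)
        for f, flit in enumerate(flits)
        for s, payload in enumerate(flit)
    )
    cycles = []
    while pending:
        cycle = {}
        for link_section in healthy_sections:
            if not pending:
                break
            cycle[link_section] = pending.popleft()
        cycles.append(cycle)
    return cycles

def deserialize(cycles, sections_per_flit):
    """Reassemble the original flits from the per-cycle link traffic."""
    flits = {}
    for cycle in cycles:
        for f, s, payload in cycle.values():
            flits.setdefault(f, [None] * sections_per_flit)[s] = payload
    return [flits[f] for f in sorted(flits)]
```

In this model, three four-section flits sent over a link with one of its four sections faulty take four cycles instead of three, rather than the link being abandoned entirely, which is the graceful degradation the abstract describes.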

Citations: 21

Hierarchical Network-on-Chip and Traffic Compression for Spiking Neural Network Implementations

2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip

Pub Date : 2012-05-09 DOI: 10.1109/NOCS.2012.17

Snaider Carrillo, J. Harkin, L. McDaid, S. Pande, Seamus Cawley, Brian McGinley, F. Morgan

The complexity of inter-neuron connectivity prohibits scalable hardware implementations of spiking neural networks (SNNs). Traditional neuron interconnect using a shared bus topology is not scalable due to the non-linear growth of neuron connections with the neural network size. This paper presents a novel hierarchical NoC (H-NoC) architecture for SNN hardware, which addresses the scalability issue by creating a 3-dimensional array of clusters of neurons with a hierarchical structure of low- and high-level routers. The H-NoC architecture also incorporates a spike traffic compression technique to exploit SNN traffic patterns, thus reducing traffic overhead and improving throughput on the network. In addition, adaptive routing capabilities between clusters balance local and global traffic loads to sustain throughput under bursting activity. Simulation results show a high throughput per cluster (3.33×10⁹ spikes/second), and synthesis results using 65-nm CMOS technology demonstrate low area cost (0.587 mm²) and power consumption (13.16 mW at 100 MHz) for a single cluster of 400 neurons, which outperforms existing SNN hardware strategies.

Pages: 83-90

Citations: 21

Analytical Performance Modeling of Hierarchical Interconnect Fabrics

2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip

Pub Date : 2012-05-09 DOI: 10.1109/NOCS.2012.20

N. Nikitin, Javier de San Pedro, J. Carmona, J. Cortadella

The continuous scaling of nanoelectronics is increasing the complexity of chip multiprocessors (CMPs) and exacerbating the memory wall problem. As CMPs become more complex, the memory subsystem is organized into more hierarchical structures to better exploit locality. During the exploration and design of CMP architectures, it is essential to efficiently analyze their performance. However, performance is highly determined by the latency of the memory subsystem, which in turn has a cyclic dependency with the memory traffic generated by the cores. This paper proposes a scalable analytical method to estimate the performance of highly parallel CMPs (hundreds of cores) with hierarchical interconnect fabrics. The method can use customizable probabilistic models and solves the cyclic dependencies by using a fixed-point strategy. The technique is shown to be a very accurate and efficient strategy when compared to the results obtained by simulation.
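The cyclic dependency between memory latency and core-generated traffic, and its resolution by a fixed-point strategy, can be illustrated with a deliberately simple model. The sketch below is an assumption-laden toy, not the paper's method: each core is assumed to issue one request every `think_cycles + latency` cycles, the memory subsystem is modelled as a single M/M/1-style server with `service_cycles` per request, and the damping factor and function name are hypothetical.

```python
def solve_latency(n_cores, think_cycles, service_cycles,
                  damping=0.5, tol=1e-9, max_iter=10_000):
    """Find the latency L satisfying L = service / (1 - rho(L)) by damped iteration.

    rho(L) = n_cores * service / (think + L) is the load the cores generate,
    which itself depends on the latency they observe (the cyclic dependency).
    Returns (latency, per_core_request_rate).
    """
    latency = service_cycles  # initial guess: unloaded latency
    for _ in range(max_iter):
        rate = 1.0 / (think_cycles + latency)  # traffic given latency
        rho = min(n_cores * rate * service_cycles, 0.999999)  # clamp utilization
        new_latency = service_cycles / (1.0 - rho)  # latency given traffic
        updated = damping * latency + (1.0 - damping) * new_latency
        if abs(updated - latency) < tol:
            return updated, 1.0 / (think_cycles + updated)
        latency = updated
    raise RuntimeError("fixed-point iteration did not converge")
```

For 64 cores with 100 think cycles and a one-cycle service time, the iteration settles at the positive root of L² + 35L − 100 = 0, roughly 2.66 cycles. Damping is used here only to keep the toy iteration stable; a real analytical model would substitute its own probabilistic latency and traffic functions.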


Citations: 13

Generic Monitoring and Management Infrastructure for 3D NoC-Bus Hybrid Architectures

2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip

Pub Date : 2012-05-09 DOI: 10.1109/NOCS.2012.28

A. Rahmani, Kameswar Rao Vaddina, Khalid Latif, P. Liljeberg, J. Plosila, H. Tenhunen

Three-dimensional integrated circuits (3D ICs) achieve enhanced system integration and improved performance at lower cost and with a reduced area footprint. To exploit the intrinsic capability of 3D ICs to reduce wire length, the 3D NoC-Bus Hybrid mesh architecture was proposed, which provides performance, power consumption, and area benefits. Besides its various advantages, this architecture offers a unique and hitherto unexplored way to implement an efficient system-wide monitoring network. In this paper, an integrated low-cost monitoring platform for 3D stacked mesh architectures is proposed, which can be used efficiently for various system management purposes such as traffic monitoring, thermal management, and fault tolerance. The proposed generic monitoring and management infrastructure, called ARB-NET, uses bus arbiters to exchange monitoring information directly with each other without using the data network. As a test case, based on the proposed monitoring and management platform, a fully congestion-aware and inter-layer fault-tolerant routing algorithm named AdaptiveXYZ is presented, which takes advantage of viable information generated by the bus arbiter network. In addition, we propose a thermal monitoring and management strategy on top of our ARB-NET infrastructure. Compared to recently proposed stacked mesh 3D NoCs, our extensive simulations with synthetic and real benchmarks reveal that our architecture using AdaptiveXYZ routing can help achieve significant power and performance improvements while preserving system reliability with negligible hardware overhead.


Pages : 177-184

Citations: 17