NxtSPR：基于三核的多核架构的中继专用无死锁最短路径路由算法

IF 2 4区计算机科学 Q2 COMPUTER SCIENCE, THEORY & METHODS Parallel Computing Pub Date : 2024-07-24 DOI:10.1016/j.parco.2024.103094

Chunfeng Li, Karim Soliman, Fei Yin, Jin Wei, Feng Shi

{"title":"NxtSPR：基于三核的多核架构的中继专用无死锁最短路径路由算法","authors":"Chunfeng Li, Karim Soliman, Fei Yin, Jin Wei, Feng Shi","doi":"10.1016/j.parco.2024.103094","DOIUrl":null,"url":null,"abstract":"<div><p>Deadlock-free routing is a significant challenge in Network-on-Chip (NoC) design as it affects the network’s latency, power consumption, and load balance, impacting the performance of multi-processor systems-on-chip. However, achieving deadlock-free routing will routinely result in expensive overhead as previous solutions either sacrifice performance or power efficiency to proactively avoid deadlocks or impose high hardware complexity to resolve deadlocks when they occur reactively. Utilizing the various characteristics of NoC to implement deadlock-free routing can be significantly more cost-effective with less impact on performance. This paper proposes a relay routing algorithm (NxtSPR) with a shortest path property and a deadlock prevention mechanism based on a synchronized Hamiltonian ring. The proposal is based on an in-depth study of the characteristics of a Triplet-Based many-core Architecture (TriBA) NoC. We establish various important topology-related theories and perform a formal verification (proof-based) for them. By utilizing the critical subgraph and apex of TriBA, NxtSPR can pre-calculate downstream nodes forwarding ports for packets by using a concise judgment strategy. This significantly reduces the computational overhead required for data transmission while optimizing the pipeline design of routers to decrease packet transmission latency and power consumption compared to other TriBA routing algorithms. We group the data transmissions according to the levels of maximum Hamiltonian edges a packet will traverse during its transmission life cycle. Independent data transmissions between groups can avoid mutual interference and resource competition, eliminating potential deadlocks. Gem5 simulation results show that, under the synthetic traffic patterns, compared to the representative (Table) and up-to-date (SPR4T) routing algorithms, NxtSPR achieves a 20.19%, 14.76%, and 5.54%, 4.66% reduction in average packet latency and per-packet power consumption, respectively. Moreover, it has an average of 18.50% and 4.34% improvement in throughput, as compared to them. PARSEC benchmark results show that NxtSPR reduces application runtime by up to a maximum of 22.30% and 12.82% compared to Table and SPR4T, and running the same applications with TriBA results in a maximum runtime reduction of 10.77% compared to 2D-Mesh.</p></div>","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"121 ","pages":"Article 103094"},"PeriodicalIF":2.0000,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"NxtSPR: A deadlock-free shortest path routing dedicated to relaying for Triplet-Based many-core Architecture\",\"authors\":\"Chunfeng Li, Karim Soliman, Fei Yin, Jin Wei, Feng Shi\",\"doi\":\"10.1016/j.parco.2024.103094\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Deadlock-free routing is a significant challenge in Network-on-Chip (NoC) design as it affects the network’s latency, power consumption, and load balance, impacting the performance of multi-processor systems-on-chip. However, achieving deadlock-free routing will routinely result in expensive overhead as previous solutions either sacrifice performance or power efficiency to proactively avoid deadlocks or impose high hardware complexity to resolve deadlocks when they occur reactively. Utilizing the various characteristics of NoC to implement deadlock-free routing can be significantly more cost-effective with less impact on performance. This paper proposes a relay routing algorithm (NxtSPR) with a shortest path property and a deadlock prevention mechanism based on a synchronized Hamiltonian ring. The proposal is based on an in-depth study of the characteristics of a Triplet-Based many-core Architecture (TriBA) NoC. We establish various important topology-related theories and perform a formal verification (proof-based) for them. By utilizing the critical subgraph and apex of TriBA, NxtSPR can pre-calculate downstream nodes forwarding ports for packets by using a concise judgment strategy. This significantly reduces the computational overhead required for data transmission while optimizing the pipeline design of routers to decrease packet transmission latency and power consumption compared to other TriBA routing algorithms. We group the data transmissions according to the levels of maximum Hamiltonian edges a packet will traverse during its transmission life cycle. Independent data transmissions between groups can avoid mutual interference and resource competition, eliminating potential deadlocks. Gem5 simulation results show that, under the synthetic traffic patterns, compared to the representative (Table) and up-to-date (SPR4T) routing algorithms, NxtSPR achieves a 20.19%, 14.76%, and 5.54%, 4.66% reduction in average packet latency and per-packet power consumption, respectively. Moreover, it has an average of 18.50% and 4.34% improvement in throughput, as compared to them. PARSEC benchmark results show that NxtSPR reduces application runtime by up to a maximum of 22.30% and 12.82% compared to Table and SPR4T, and running the same applications with TriBA results in a maximum runtime reduction of 10.77% compared to 2D-Mesh.</p></div>\",\"PeriodicalId\":54642,\"journal\":{\"name\":\"Parallel Computing\",\"volume\":\"121 \",\"pages\":\"Article 103094\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2024-07-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Parallel Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167819124000322\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Parallel Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167819124000322","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}

引用次数: 0

摘要

无死锁路由是片上网络（NoC）设计中的一个重大挑战，因为它会影响网络的延迟、功耗和负载平衡，从而影响多处理器片上系统的性能。然而，实现无死锁路由通常会产生昂贵的开销，因为以往的解决方案要么牺牲性能或能效来主动避免死锁，要么在死锁发生时强加高硬件复杂性来被动解决死锁。利用 NoC 的各种特性来实现无死锁路由，可以大大提高成本效益，同时减少对性能的影响。本文提出了一种具有最短路径特性的中继路由算法（NxtSPR），以及一种基于同步哈密顿环的死锁预防机制。该提案基于对基于三重多核架构（TriBA）NoC 特性的深入研究。我们建立了各种重要的拓扑相关理论，并对它们进行了形式验证（基于证明）。通过利用 TriBA 的临界子图和顶点，NxtSPR 可以使用简明的判断策略预先计算下游节点的数据包转发端口。与其他 TriBA 路由算法相比，这大大减少了数据传输所需的计算开销，同时优化了路由器的流水线设计，降低了数据包传输延迟和功耗。我们根据数据包在其传输生命周期内将穿越的最大汉密尔顿边的级别对数据传输进行分组。分组间独立的数据传输可以避免相互干扰和资源竞争，消除潜在的死锁。Gem5 仿真结果表明，在合成流量条件下，与代表性路由算法（表）和最新路由算法（SPR4T）相比，NxtSPR 在数据包平均延迟和每数据包功耗方面分别降低了 20.19% 和 14.76%，以及 5.54% 和 4.66%。此外，与它们相比，它的吞吐量平均提高了 18.50% 和 4.34%。PARSEC 基准测试结果表明，与 Table 和 SPR4T 相比，NxtSPR 最多缩短了 22.30% 和 12.82% 的应用运行时间；与 2D-Mesh 相比，使用 TriBA 运行相同的应用最多缩短了 10.77% 的运行时间。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

NxtSPR: A deadlock-free shortest path routing dedicated to relaying for Triplet-Based many-core Architecture

Deadlock-free routing is a significant challenge in Network-on-Chip (NoC) design as it affects the network’s latency, power consumption, and load balance, impacting the performance of multi-processor systems-on-chip. However, achieving deadlock-free routing will routinely result in expensive overhead as previous solutions either sacrifice performance or power efficiency to proactively avoid deadlocks or impose high hardware complexity to resolve deadlocks when they occur reactively. Utilizing the various characteristics of NoC to implement deadlock-free routing can be significantly more cost-effective with less impact on performance. This paper proposes a relay routing algorithm (NxtSPR) with a shortest path property and a deadlock prevention mechanism based on a synchronized Hamiltonian ring. The proposal is based on an in-depth study of the characteristics of a Triplet-Based many-core Architecture (TriBA) NoC. We establish various important topology-related theories and perform a formal verification (proof-based) for them. By utilizing the critical subgraph and apex of TriBA, NxtSPR can pre-calculate downstream nodes forwarding ports for packets by using a concise judgment strategy. This significantly reduces the computational overhead required for data transmission while optimizing the pipeline design of routers to decrease packet transmission latency and power consumption compared to other TriBA routing algorithms. We group the data transmissions according to the levels of maximum Hamiltonian edges a packet will traverse during its transmission life cycle. Independent data transmissions between groups can avoid mutual interference and resource competition, eliminating potential deadlocks. Gem5 simulation results show that, under the synthetic traffic patterns, compared to the representative (Table) and up-to-date (SPR4T) routing algorithms, NxtSPR achieves a 20.19%, 14.76%, and 5.54%, 4.66% reduction in average packet latency and per-packet power consumption, respectively. Moreover, it has an average of 18.50% and 4.34% improvement in throughput, as compared to them. PARSEC benchmark results show that NxtSPR reduces application runtime by up to a maximum of 22.30% and 12.82% compared to Table and SPR4T, and running the same applications with TriBA results in a maximum runtime reduction of 10.77% compared to 2D-Mesh.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Parallel Computing 工程技术-计算机：理论方法

CiteScore

3.50

自引率

7.10%

发文量

审稿时长

4.5 months

期刊介绍： Parallel Computing is an international journal presenting the practical use of parallel computer systems, including high performance architecture, system software, programming systems and tools, and applications. Within this context the journal covers all aspects of high-end parallel computing from single homogeneous or heterogenous computing nodes to large-scale multi-node systems. Parallel Computing features original research work and review articles as well as novel or illustrative accounts of application experience with (and techniques for) the use of parallel computers. We also welcome studies reproducing prior publications that either confirm or disprove prior published results. Particular technical areas of interest include, but are not limited to: -System software for parallel computer systems including programming languages (new languages as well as compilation techniques), operating systems (including middleware), and resource management (scheduling and load-balancing). -Enabling software including debuggers, performance tools, and system and numeric libraries. -General hardware (architecture) concepts, new technologies enabling the realization of such new concepts, and details of commercially available systems -Software engineering and productivity as it relates to parallel computing -Applications (including scientific computing, deep learning, machine learning) or tool case studies demonstrating novel ways to achieve parallelism -Performance measurement results on state-of-the-art systems -Approaches to effectively utilize large-scale parallel computing including new algorithms or algorithm analysis with demonstrated relevance to real applications using existing or next generation parallel computer architectures. -Parallel I/O systems both hardware and software -Networking technology for support of high-speed computing demonstrating the impact of high-speed computation on parallel applications

期刊最新文献

Towards resilient and energy efficient scalable Krylov solvers Seesaw: A 4096-bit vector processor for accelerating Kyber based on RISC-V ISA extensions Editorial Board FastPTM: Fast weights loading of pre-trained models for parallel inference service provisioning Distributed consensus-based estimation of the leading eigenvalue of a non-negative irreducible matrix