In system-level interconnects for high-performance computing (HPC), low-diameter hierarchical topologies like dragonfly are gaining in popularity. These topologies require adaptive routing schemes for high performance, but using non-minimal paths can stress the long-distance inter-group links that are the most expensive and scarce network resource. We introduce "Flexfly", a network design incorporating optical switches that steers bandwidth onto minimal paths instead of diverting packets, alleviating contention. Performance results and the simulation methodology using the Structural Simulation Toolkit (SST) are presented.
{"title":"Bringing minimal routing back to HPC through silicon photonics: a study of \"flexfly\" architectures with the structural simulation toolkit (SST)","authors":"Jeremiah J. Wilke","doi":"10.1145/3073763.3073775","DOIUrl":"https://doi.org/10.1145/3073763.3073775","url":null,"PeriodicalId":20560,"journal":{"name":"Proceedings of the 2nd International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90616162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As we enter the billion-transistor era, the network-on-chip (NoC), once deemed the solution, is falling short because of the high power consumption of its components. Several techniques have been proposed over the years to improve NoC performance at the expense of power efficiency. However, a low-power design is one of the essential requirements of future NoC-based SoC applications. Power dissipation can be reduced through efficient routers, power-saving architectures, and efficient communication links. This paper surveys recent contributions and power-saving techniques at the router, NoC architecture, and communication link levels.
{"title":"A survey of low power NoC design techniques","authors":"Emmanuel Ofori-Attah, Michael Opoku Agyeman","doi":"10.1145/3073763.3073767","DOIUrl":"https://doi.org/10.1145/3073763.3073767","url":null,"PeriodicalId":20560,"journal":{"name":"Proceedings of the 2nd International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86784152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Kalray MPPA-256 Bostan manycore processor implements a clustered architecture, where clusters of cores share a local memory and a DMA-capable network-on-chip (NoC) connects the clusters. The NoC implements wormhole switching without virtual channels, uses source routing, and can be configured for maximum flow rate and burstiness at ingress. We describe and illustrate the techniques used to configure the MPPA NoC for guaranteed services. Our approach is based on three steps: global selection of routes between end-points and computation of flow rates, by solving the max-min fairness with unsplittable path problem; configuration of the flow burstiness parameters at ingress, by solving an acyclic set of linear inequalities; and computation of end-to-end latency upper bounds, based on the principles of separated flow analysis (SFA). In this paper, we develop the last two steps, taking advantage of the effects of NoC link shaping on the leaky-bucket arrival curves of flows.
{"title":"Network-on-chip service guarantees on the kalray MPPA-256 bostan processor","authors":"B. Dinechin, Amaury Graillat","doi":"10.1145/3073763.3073770","DOIUrl":"https://doi.org/10.1145/3073763.3073770","url":null,"PeriodicalId":20560,"journal":{"name":"Proceedings of the 2nd International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems","volume":"57 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89055892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
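The Kalray MPPA abstract above refers to leaky-bucket arrival curves, link shaping, and latency upper bounds. The following is a minimal sketch of the textbook network-calculus building blocks behind those terms, not the authors' SFA procedure. The class and function names, the units, and the rate-latency service model are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeakyBucket:
    """Leaky-bucket arrival curve: alpha(t) = burst + rate * t for t > 0."""
    burst: float  # sigma, e.g. in flits (unit is an assumption)
    rate: float   # rho, e.g. in flits per cycle (unit is an assumption)

    def arrival(self, t: float) -> float:
        """Upper bound on the traffic the flow may emit in any window of length t."""
        return self.burst + self.rate * t if t > 0 else 0.0


def shaped_arrival(flow: LeakyBucket, link_capacity: float, t: float) -> float:
    """Arrival curve after a link of capacity C: min(C * t, burst + rate * t).

    A link cannot carry more than C * t in a window of length t, which
    clips the burst of the flow downstream of that link.
    """
    if t <= 0:
        return 0.0
    return min(link_capacity * t, flow.arrival(t))


def delay_bound(flow: LeakyBucket, service_rate: float, service_latency: float) -> float:
    """Worst-case delay through a rate-latency server beta(t) = R * max(0, t - T).

    For a leaky-bucket flow with rate <= R, the classical bound is T + burst / R.
    """
    if flow.rate > service_rate:
        raise ValueError("flow rate exceeds service rate; the delay is unbounded")
    return service_latency + flow.burst / service_rate
```

With hypothetical numbers, a flow with a burst of 8 flits and a rate of 0.25 flits per cycle, served at 1 flit per cycle after a latency of 4 cycles, has a delay bound of `delay_bound(LeakyBucket(8.0, 0.25), 1.0, 4.0)`, which evaluates to 12 cycles.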
YongTing Hu, Daniel Mueller-Gritschneder, Ulf Schlichtmann
With increasing circuit density, more cores are integrated on a chip. Networks-on-chip (NoCs) have emerged as a solution for the interconnect. Many router architectures, NoC topologies, and routing algorithms have been developed to improve NoC design, which creates a large design space to explore. This exploration requires various models and tools to evaluate NoCs. This paper therefore proposes a model-based framework that integrates different evaluations. Each NoC design is represented as one model using the Eclipse Modeling Framework (EMF). The models are used in code generation to produce different evaluation models, including ORION, SystemC, and LISNoC Verilog descriptions. An execution flow is further developed to compile, execute, and synthesize the models. The framework is evaluated with both a real multimedia application and random traffic tests. Various evaluation metrics are reported, including latency, throughput, buffer utilization, area, and power.
{"title":"Model-based framework for networks-on-chip design space exploration","authors":"YongTing Hu, Daniel Mueller-Gritschneder, Ulf Schlichtmann","doi":"10.1145/3073763.3073769","DOIUrl":"https://doi.org/10.1145/3073763.3073769","url":null,"PeriodicalId":20560,"journal":{"name":"Proceedings of the 2nd International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90263117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As the number of functional IP blocks connected on a die increases, SoC development becomes constrained by the capabilities of the on-chip interconnect that connects these IP blocks together. And as the use of commercial IP increases to encompass 80% or more of a commercial SoC's functionality, innovation and differentiation between competing designs can only be expressed in how the IP is connected, as implemented by the on-chip interconnect. To keep up with the demands of the SoC, interconnects have also become fairly complex and sophisticated. The desire to satisfy the needs of next-generation SoCs while optimizing area, processing efficiency, and power consumption is driving innovation in switch designs, routing algorithms, transport mechanisms, Quality of Service, and coherency schemes. The problem space is big and perhaps more complex in certain ways than that of data networks. Changing application requirements are also changing how we look at Service Level Agreements (SLAs) within the SoC. The SLAs for next-generation interconnects have to go beyond delay and bandwidth considerations to also include resiliency, fault tolerance, and security. In this talk, I will discuss the challenges in building next-generation interconnects, the innovation taking place to address these challenges, and how SoC interconnects differ from the interconnects in data networks.
{"title":"Interconnects for next generation SoC designs","authors":"S. Shah","doi":"10.1145/3073763.3073771","DOIUrl":"https://doi.org/10.1145/3073763.3073771","url":null,"PeriodicalId":20560,"journal":{"name":"Proceedings of the 2nd International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems","volume":"30 5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77511850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
N. Pleros, N. Terzenidis, T. Alexoudi, K. Vyrsokinos, G. Kanellos, D. Syrivelis
The vast amount of new data being generated is outpacing the development of infrastructures and continues to grow at much higher rates than Moore's law, a problem commonly referred to as the "data deluge problem". This leaves current computing machines struggling to exceed exascale processing power by 2020, and this is where the energy boundary sets off the second, bottom-side alarm: a reasonable power envelope for future supercomputers has been projected to be 20 MW, while the world's current No. 1 supercomputer, Sunway TaihuLight, provides 93 Pflops and already requires 15.37 MW. This simply means that we have so far reached less than 10% of the exascale target but already consume more than 75% of the targeted energy limit! The way out currently follows the paradigm of disaggregating and disintegrating resources while massively introducing optical technologies for interconnect purposes. Disaggregating computing from memory and storage modules allows for flexible and modular settings in which hardware requirements can be tailored to meet the energy and performance metrics targeted per application. At the same time, optical interconnect and photonic integration technologies are rapidly replacing electrical interconnects and continuously penetrating deeper hierarchy levels: silicon photonics has brought optical technology into the computing environment, starting from rack-to-rack and gradually shifting towards board-level communications. In this article, we present our recent work towards implementing on-board single-mode optical interconnects that can support Software Defined Networking (SDN), allowing for programmable and flexible computational settings that can quickly adapt to application requirements. We present a programmable 4×4 silicon photonic switch that supports SDN through the use of Bloom filter (BF) labeled router ports. Our scheme significantly simplifies packet forwarding, as it removes the need for large forwarding tables while supporting changes in network size and topology through simple modifications of the assigned BF labels. We demonstrate 1×4 switch operation with the Si-Pho switch controlled by a Stratix V FPGA board that is responsible for processing the packet ID and correlating its destination with the appropriate BF-labeled switch output port. Moving towards high-capacity board-level settings, we discuss the architecture and technology currently being promoted by the recently started H2020 project ICT-STREAMS, where single-mode optical PCBs hosting Si-based routing modules and mid-board transceiver optics are expected to enable a massive any-to-any, buffer-less, collision-less, and extremely low-latency routing platform with 25.6 Tb/s aggregate throughput. This architecture and technology are also extended to support resource disaggregation in data centers, as currently being pursued in the H2020 project dREDBox, where an any-to-any collisionless routing scheme is proposed for interconnecting disaggregated compute and memory blocks in order to minimize remote memory access latency and energy consumption.
{"title":"Software-defined board- and chip-level optical interconnects for multi-socket communication and disaggregated computing","authors":"N. Pleros, N. Terzenidis, T. Alexoudi, K. Vyrsokinos, G. Kanellos, D. Syrivelis","doi":"10.1145/3073763.3073776","DOIUrl":"https://doi.org/10.1145/3073763.3073776","url":null,"PeriodicalId":20560,"journal":{"name":"Proceedings of the 2nd International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83309548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
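The Pleros et al. abstract above describes forwarding with Bloom filter (BF) labeled switch ports instead of large forwarding tables. The sketch below shows the general idea in software, under stated assumptions: each output port carries a small Bloom filter of the destination IDs reachable through it, and a packet is sent to the ports whose filter matches its destination. The filter size, hash construction, and all names are hypothetical and do not describe the authors' FPGA implementation.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter stored as an integer bit mask (sizes are assumptions)."""

    def __init__(self, num_bits: int = 64, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        # No false negatives; false positives are possible by design.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_port_labels(reachability: dict) -> dict:
    """Build one BF label per output port from {port: [destination IDs]}."""
    labels = {}
    for port, destinations in reachability.items():
        label = BloomFilter()
        for destination in destinations:
            label.add(destination)
        labels[port] = label
    return labels


def candidate_ports(labels: dict, destination: str) -> list:
    """Return the ports whose BF label matches the packet's destination ID."""
    return [port for port, label in labels.items() if destination in label]
```

In this sketch a topology change only requires updating a label, for example `labels[2].add("node-e")`, rather than rewriting a forwarding table. Because Bloom filters admit false positives, `candidate_ports` may occasionally return an extra port; how a real switch resolves that is outside the scope of this example.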
This paper presents a hybrid optimization scheme that combines Tabu search, Force Directed Swapping, and Discrete Particle Swarm Optimization for the Network-on-Chip (NoC) mapping problem. The main goal of the optimization is to map an application core graph such that the overall communication latency and energy consumption of the NoC are minimal. Discrete Particle Swarm Optimization is used as the main optimization scheme, where each particle move is influenced by a force derived from the network traffic matrix. We also employ a Tabu list to discourage swarm particles from revisiting the explored search space. This is done through particle reflection, which proposes an alternative route towards the intended move direction. The methodology is tested on several multimedia application core graphs as well as on randomly generated large networks of synthetic cores. It was found that, on average, this hybrid algorithm required fewer iterations to reach an optimal solution than other existing algorithms, without losing NoC mapping quality.
{"title":"Optimal application mapping to 2D-mesh NoCs by using a tabu-based particle swarm methodology","authors":"Muhammad Obaidullah, G. Khan","doi":"10.1145/3073763.3073766","DOIUrl":"https://doi.org/10.1145/3073763.3073766","url":null,"PeriodicalId":20560,"journal":{"name":"Proceedings of the 2nd International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82165447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
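The Obaidullah and Khan abstract above concerns mapping an application core graph onto a 2D-mesh NoC using a Tabu list. As a rough illustration of two of the ingredients only, the sketch below defines a common hop-weighted communication cost and a plain tabu swap search over it. It deliberately omits the particle swarm and force-directed components, so it is not the authors' hybrid algorithm. The cost model, the function names, and all parameters are assumptions.

```python
import itertools
import random


def manhattan(tile_a: int, tile_b: int, width: int) -> int:
    """Hop distance between two tiles of a 2D mesh with row-major tile indices."""
    ax, ay = tile_a % width, tile_a // width
    bx, by = tile_b % width, tile_b // width
    return abs(ax - bx) + abs(ay - by)


def communication_cost(mapping, traffic, width: int) -> int:
    """Sum of traffic volume times hop distance; mapping[core] is the tile index."""
    return sum(volume * manhattan(mapping[src], mapping[dst], width)
               for (src, dst), volume in traffic.items())


def tabu_swap_search(traffic, num_cores, width, iterations=200, tabu_len=5, seed=0):
    """Plain tabu search: apply the best non-tabu core swap at every iteration."""
    rng = random.Random(seed)
    mapping = list(range(num_cores))
    rng.shuffle(mapping)
    best, best_cost = mapping[:], communication_cost(mapping, traffic, width)
    tabu = []  # recently applied swaps, which may not be repeated immediately
    for _ in range(iterations):
        candidates = []
        for i, j in itertools.combinations(range(num_cores), 2):
            if (i, j) in tabu:
                continue
            # Evaluate the swap, then undo it.
            mapping[i], mapping[j] = mapping[j], mapping[i]
            candidates.append((communication_cost(mapping, traffic, width), i, j))
            mapping[i], mapping[j] = mapping[j], mapping[i]
        if not candidates:
            break
        cost, i, j = min(candidates)
        mapping[i], mapping[j] = mapping[j], mapping[i]
        tabu.append((i, j))
        if len(tabu) > tabu_len:
            tabu.pop(0)
        if cost < best_cost:
            best, best_cost = mapping[:], cost
    return best, best_cost
```

For example, `tabu_swap_search({(0, 1): 10, (1, 2): 10, (2, 3): 10}, num_cores=4, width=2)` places a four-core chain on a 2×2 mesh. Accepting the best non-tabu swap even when it worsens the cost is what lets the search leave local minima, and the tabu list keeps it from immediately undoing that move.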
In some application domains (e.g., mission-critical systems), proactive detection of reliability threats or prompt fault containment are mandatory in order to avoid or limit the malfunctioning of electronic systems as an effect of the onset of permanent faults at runtime. As an essential milestone for the design of these systems, this paper presents a distributed and lightweight control framework for the built-in self-testing of networks-on-chip (NoCs) in the background while applications are running. The main idea of this concurrent online testing framework consists of modularizing the NoC into communication channels, of selectively taking such channels offline for non-concurrent testing, and of reconfiguring the NoC routing function to route packets around the temporary blockages to preserve network availability.
{"title":"Transparent lifetime built-in self-testing of networks-on-chip through the selective non-concurrent testing of their communication channels","authors":"Marco Balboni, D. Bertozzi","doi":"10.1145/3073763.3073765","DOIUrl":"https://doi.org/10.1145/3073763.3073765","url":null,"PeriodicalId":20560,"journal":{"name":"Proceedings of the 2nd International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems","volume":"386 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77683618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wireless on-chip communication is an emerging technology that is currently being adopted to reduce the latency and energy consumption of network transactions in many-core systems. The reason is that the multi-hop nature of conventional electrical networks-on-chip has led to a point of diminishing returns, which worsens as the number of hops increases to meet the ever-increasing core count in many-core systems. A Wireless NoC (WNoC) can broadcast network messages more efficiently, so current research is exploring hybrid NoC designs composed of an electrical NoC and a WNoC to reach the desired performance improvement. Nonetheless, so far, nobody has addressed the problem of network attacks when using a WNoC. In this work, we propose a security mechanism for a 64-core system with a hybrid NoC implementing ECONO cache coherence. Our experimental evaluation using multi-threaded applications from state-of-the-art benchmark suites reveals that even the most lightweight technology designed to secure broadcast messages through hash-based functions can lead to more than 30% performance degradation. In addition, based on our study, we propose tolerable latencies that must be achieved in future designs to guarantee truly lightweight secure WNoCs.
{"title":"Secure communications in wireless network-on-chips","authors":"F. Pereñíguez-Garcia, José L. Abellán","doi":"10.1145/3073763.3073768","DOIUrl":"https://doi.org/10.1145/3073763.3073768","url":null,"PeriodicalId":20560,"journal":{"name":"Proceedings of the 2nd International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79710805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BXI, the Bull eXascale Interconnect, is the new interconnection network developed by Bull (now an Atos company) for high-performance computing. We first present an overview of the BXI network, which is designed and optimized for HPC workloads at very large scale. The BXI network is based on the Portals 4 protocol and permits a complete offload of communication primitives to hardware, thus enabling independent progress of computation and communication. We then describe the two BXI ASIC components, the network interface and the switch, and the BXI software environment. The fabric management integrates features for monitoring, performance analysis, quick traffic re-routing, and job isolation for performance and security. Finally, we explain how the Bull eXascale platform integrates BXI to build a large-scale parallel system, and we present some results obtained on the first BXI systems.
{"title":"BXI: designing a network for eXascale","authors":"Jean-Pierre Panziera","doi":"10.1145/3073763.3073774","DOIUrl":"https://doi.org/10.1145/3073763.3073774","url":null,"PeriodicalId":20560,"journal":{"name":"Proceedings of the 2nd International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80080935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}