Orchid: enhancing HPC interconnection networks through infrequent topology reconfiguration

Journal of Optical Communications and Networking · IF 4.0 · Zone 2 (Computer Science) · JCR Q1 (Computer Science, Hardware & Architecture) · Pub Date: 2024-03-22 · DOI: 10.1364/JOCN.516031
Liang Qin;Huaxi Gu;Xiaoshan Yu;Zheyi Cai;Junchen Liu
Source: https://ieeexplore.ieee.org/document/10536144/
Citations: 0

Abstract

Interconnection networks are key components of high-performance computing (HPC) systems. As HPC evolves towards the exascale era, providing sufficient bisection bandwidth between computing node pairs through oversubscription in traditional networks becomes prohibitively expensive and impractical. Over the past decade, several architectures leveraging optical circuit switches (OCSs) for dynamic link bandwidth allocation have gained traction. These architectures require frequent network topology reconfiguration to adapt to changing traffic demands. However, practical implementation remains hampered by the long reconfiguration delays inherent in OCS technology. We propose Orchid, an architecture that leverages OCSs to achieve infrequent topology reconfigurations, effectively addressing the problem of long reconfiguration delays. A key innovation of Orchid is its ability to extract stable traffic matrices from historical data. This functionality guides the reconfiguration of the topology without the need for adjustments with each traffic matrix, thereby enabling the sharing of OCS overhead over an extended timeframe. Furthermore, Orchid addresses potential congestion arising from unexpected traffic through the joint design of OCS configuration and routing, ensuring an even distribution of traffic across global links. Extensive experiments using real HPC application traces and synthetic traffic demonstrate that Orchid achieves significant performance improvements compared to existing HPC interconnection networks. Specifically, Orchid reduces packet delay by at least $3\times$ and enhances throughput by up to 60%.
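The abstract does not say how Orchid extracts a stable traffic matrix, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that "stable" demand can be approximated by the element-wise median over historical matrices, keeping only node pairs that carry traffic in most samples. The function name `stable_traffic_matrix` and the `min_presence` threshold are hypothetical.

```python
from statistics import median


def stable_traffic_matrix(history, min_presence=0.8):
    """Approximate a stable traffic matrix from historical samples.

    history: list of n x n matrices (lists of lists) of traffic volumes.
    min_presence: hypothetical threshold; the fraction of samples in which
        a node pair must carry traffic to be treated as stable demand.
    """
    if not history:
        raise ValueError("history must contain at least one matrix")
    n = len(history[0])
    stable = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            samples = [m[i][j] for m in history]
            # Fraction of samples in which this pair was active at all.
            presence = sum(1 for s in samples if s > 0) / len(samples)
            if presence >= min_presence:
                # The median damps one-off bursts in otherwise steady demand.
                stable[i][j] = float(median(samples))
    return stable


# Three made-up samples for a 3-node system (not data from the paper).
history = [
    [[0, 10, 0], [10, 0, 0], [0, 0, 0]],
    [[0, 12, 5], [11, 0, 0], [0, 0, 0]],
    [[0, 9, 0], [10, 0, 0], [0, 0, 0]],
]
stable = stable_traffic_matrix(history)
```

In this toy input the persistent demand between nodes 0 and 1 survives, while the single burst from node 0 to node 2 is filtered out. A topology derived from such a matrix would not need to change with every new sample, which is the sense in which the OCS reconfiguration cost can be shared over a longer period.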
Source journal
CiteScore: 9.40
Self-citation rate: 16.00%
Articles published: 104
Review time: 4 months
About the journal: The scope of the Journal includes advances in the state-of-the-art of optical networking science, technology, and engineering. Both theoretical contributions (including new techniques, concepts, analyses, and economic studies) and practical contributions (including optical networking experiments, prototypes, and new applications) are encouraged. Subareas of interest include the architecture and design of optical networks, optical network survivability and security, software-defined optical networking, elastic optical networks, data and control plane advances, network management related innovation, and optical access networks. Enabling technologies and their applications are suitable topics only if the results are shown to directly impact optical networking beyond simple point-to-point networks.
Latest articles from this journal
Low-complexity end-to-end deep learning framework for 100G-PON
Optical networking that exploits massive wavelength/spectrum and spatial parallelisms
Zero-cost upgrade to a multi-fiber network with partial lane-change capabilities
Benchmarking framework for resource allocation algorithms in multicore fiber elastic optical networks
SkipNet: an adaptive neural network equalization algorithm for future passive optical networking