Optical switching for data centers and advanced computing systems [Invited]
Authors: Giannis Patronas; Nikos Terzenidis; Prethvi Kashinkunti; Eitan Zahavi; Dimitris Syrivelis; Louis Capps; Zsolt-Alon Wertheimer; Nikos Argyris; Athanasios Fevgas; Craig Thompson; Avraham Ganor; Julie Bernauer; Elad Mentovich; Paraskevas Bakopoulos
Journal of Optical Communications and Networking, vol. 17, no. 1, pp. A87-A95. Published 2024-12-11.
DOI: 10.1364/JOCN.534317
URL: https://ieeexplore.ieee.org/document/10788427/
Journal impact factor: 4.0; JCR: Q1 (Computer Science, Hardware & Architecture)
Citation count: 0
Abstract
We explore optical switching to extend network programmability to the physical layer and discuss applications of a Layer-1 software-defined network (SDN) in AI/HPC clusters. In this context we identify two applications for optical circuit switches (OCSs): failure resilience and reconfigurable topologies for deep learning workloads. We present experimental results from a DGX-based testbed towards improving failure resilience and a simulation analysis for efficient deep learning training in AI clusters.
About the journal:
The scope of the Journal includes advances in the state-of-the-art of optical networking science, technology, and engineering. Both theoretical contributions (including new techniques, concepts, analyses, and economic studies) and practical contributions (including optical networking experiments, prototypes, and new applications) are encouraged. Subareas of interest include the architecture and design of optical networks, optical network survivability and security, software-defined optical networking, elastic optical networks, data and control plane advances, network management related innovation, and optical access networks. Enabling technologies and their applications are suitable topics only if the results are shown to directly impact optical networking beyond simple point-to-point networks.