Environment-aware geometric shaping for digital FSO fronthaul networks
Qiming Sun; Yejun Liu; Song Song; Yue Zhu; Xinkai Ni; Zhiwei Jiao; Lei Guo
Journal of Optical Communications and Networking 17(11), E37–E49 (2025); published 2025-08-18. DOI: 10.1364/JOCN.562110

As one of the last-mile access network solutions, free-space optical (FSO) communication can satisfy the rapidly growing traffic demand of 6G fronthaul networks. However, outdoor FSO transmission must contend with atmospheric conditions, among which fog and turbulence deserve particular attention. To resist the impact of fog and turbulence on the distribution of optical signal amplitudes, we propose an environment-aware geometric shaping (GS) scheme for the signal amplitudes of four-level pulse amplitude modulation (PAM-4) in FSO fronthaul networks. The network becomes aware of the channel states caused by fog and turbulence through visibility and temperature sensors, avoiding the need for a feedback link. Based on this environment-aware channel state information, the proposed GS algorithm adaptively determines the optimal electrical signal amplitudes of PAM-4, aiming to minimize the average bit error rate (BER) under varying channel conditions. The effects of visibility and turbulence on PAM-4 signal amplitudes are theoretically modeled and experimentally evaluated using an environmental simulation chamber. For the first time, to the best of our knowledge, we experimentally demonstrate the effectiveness of environment-aware GS in combating the effects of visibility and turbulence on FSO transmission performance. Experimental results show that the GS algorithm can reduce the average BER by one-third compared to traditional PAM-4 with a uniform amplitude distribution.
DemoQuanDT: a carrier-grade QKD network
P. Horoschenkoff; J. Henrich; R. Bohn; I. Khan; J. Rodiger; M. Gunkel; M. Bauch; J. Benda; P. Blacker; E. Eichhammer; U. Eismann; G. Frenck; H. Griesser; W. Jontofsohn; N. Kopshoff; S. Rohrich; F. Seidl; N. Schark; E. Sollner; D. von Blanckenburg; A. Heinemann; M. Stiemerling; M. Gartner
Journal of Optical Communications and Networking 17(9), 743–756 (2025); published 2025-08-05. DOI: 10.1364/JOCN.563470

Quantum key distribution networks (QKDNs) enable secure communication even in the age of powerful quantum computers. In the hands of a network operator, which can offer its service to many users, the economic viability of a QKDN increases significantly. The highly challenging operator–user relationship in a large-scale network setting imposes additional requirements to ensure carrier-grade operation. Addressing this challenge, this work presents a carrier-grade QKDN architecture that combines the functional QKDN architecture with the operational perspective of a network operator, ultimately enhancing the economic viability of QKDNs. The focus is on the network and key management aspects of a QKDN, assuming state-of-the-art commercial QKD modules. The presented architecture was rolled out in an in-field demonstrator connecting the cities of Berlin and Bonn across Germany over a link distance of 923 km. We showed that the proposed network architecture is feasible, integrable, and scalable, making it suitable for deployment in real-world networks. Overall, the presented carrier-grade QKDN architecture promises to serve as a blueprint for network operators providing QKD-based services to their customers.
Generalized few-shot transfer learning architecture for modeling the EDFA gain spectrum
Agastya Raj; Zehao Wang; Tingjun Chen; Daniel C. Kilper; Marco Ruffini
Journal of Optical Communications and Networking 17(9), D106–D117 (2025); published 2025-07-28. DOI: 10.1364/JOCN.560987

Accurate modeling of the gain spectrum in erbium-doped fiber amplifiers (EDFAs) is essential for optimizing optical network performance, particularly as networks evolve toward multi-vendor solutions. In this work, we propose a generalized few-shot transfer learning architecture based on a semi-supervised self-normalizing neural network (SS-NN) that leverages internal EDFA features, such as variable optical attenuator (VOA) input/output power and attenuation, to improve gain spectrum prediction. Our SS-NN model employs a two-phase training strategy comprising unsupervised pre-training with noise-augmented measurements and supervised fine-tuning with a custom-weighted mean squared error (MSE) loss. Furthermore, we extend the framework with transfer learning (TL) techniques that enable both homogeneous (same feature space) and heterogeneous (different feature sets) model adaptation across booster, pre-amplifier, and in-line amplifier (ILA) EDFAs. To address feature mismatches in heterogeneous TL, we incorporate a covariance matching loss to align second-order feature statistics between the source and target domains. Extensive experiments conducted across 26 EDFAs in the COSMOS and Open Ireland testbeds demonstrate that the proposed approach significantly reduces the number of measurements required on the system while achieving lower mean absolute errors and improved error distributions compared to benchmark methods.
Decentralized key distribution versus on-demand relaying for QKD networks
Maria Alvarez Roa; Catalina Stan; Sebastian Verschoor; Idelfonso Tafur Monroy; Simon Rommel
Journal of Optical Communications and Networking 17(8), 732–742 (2025); published 2025-07-25. DOI: 10.1364/JOCN.547793

Quantum key distribution (QKD) allows the distribution of secret keys for quantum-secure communication between two distant parties, which is vital in the quantum computing era to protect against quantum-enabled attackers. However, overcoming the rate–distance limits of QKD and establishing QKD networks necessitate key relaying over trusted nodes. This process can be resource-intensive, consuming a substantial share of the scarce QKD key material to establish end-to-end secret keys. Hence, an efficient scheme for key relaying and the establishment of end-to-end key pools is essential for practical, extended quantum-secured networking. In this paper, we propose and compare two protocols for managing, storing, and distributing secret key material in QKD networks, addressing challenges such as the success rate of key requests, key consumption, and the overhead resulting from relaying. We present an innovative, fully decentralized key distribution strategy as an alternative to traditional hop-by-hop relaying via trusted nodes, and we consider three experiments to evaluate performance metrics under varying key demand. Our results show that the decentralized pre-flooding approach achieves higher success rates as application demands increase. This analysis highlights the strengths of each approach in enhancing QKD network performance, offering valuable insights for developing robust key distribution strategies in different scenarios.
Deep reinforcement learning-aided multi-step job scheduling in optical data center networks
Che-Yu Liu; Xiaoliang Chen; Roberto Proietti; Zuqing Zhu; S. J. Ben Yoo
Journal of Optical Communications and Networking 17(9), D96–D105 (2025); published 2025-07-24. DOI: 10.1364/JOCN.562531

Orchestrating job scheduling and topology reconfiguration in optical data center networks (ODCNs) is essential for meeting the intensive communication demand of novel applications, such as distributed machine learning (ML) workloads. However, this task involves joint optimization of multi-dimensional resources that can hardly be addressed effectively by simple rule-based policies. In this paper, we leverage the powerful state representation and self-learning capabilities of deep reinforcement learning (DRL) and propose a multi-step job scheduling algorithm for ODCNs. Our design decomposes a job request into an ordered sequence of virtual machines (VMs) and the related bandwidth demands between them, and then makes a DRL agent learn how to place the VMs sequentially. To do so, at each step we feed the agent the global bandwidth and IT resource utilization state, embedded with the previous VM allocation decisions, and reward the agent with both team and individual incentives. The team reward encourages the agent to jointly optimize the VM placement across multiple steps to pursue successful provisioning of the job request, while the individual reward favors advantageous local placement decisions, i.e., it prevents effective policies from being overwhelmed by a few subpar decisions. We also introduce a penalty on reconfiguration to balance performance gains against reconfiguration overheads. Simulation results under various ODCN configurations and job loads show that our proposal outperforms existing heuristic solutions, reducing the job-blocking probability and reconfiguration frequency by at least 7.35× and 4.59×, respectively.
Edge coloring bipartite multigraphs for dynamically configuring optical switches
Jan De Neve; Ziyue Zhang; Wouter Tavernier; Didier Colle; Mario Pickavet
Journal of Optical Communications and Networking 17(8), 720–731 (2025); published 2025-07-23. DOI: 10.1364/JOCN.559454

Multi-chip graphics processing units (GPUs) interconnected by a photonic network-on-wafer are a promising technology for further increasing GPU performance. The network control algorithm managing dynamic bandwidth allocation (DBA) in such a network must execute very frequently so that resources can be used optimally. This algorithm relies on edge coloring bipartite multigraphs to translate inter-chip bandwidth demands into updated routing tables for the GPU chips and optical switches in the network. In this work, we design fast edge coloring algorithms, both approximate and exact, for bipartite multigraphs. These algorithms are tailored to the high edge multiplicities of the multigraphs arising in this setting. Their runtimes are optimized by using efficient data structures and introducing pre- and post-processing. The new algorithms are up to 20× faster than the state-of-the-art baseline algorithm. New simulations show that, with such short reconfiguration periods, DBA has the potential to double the performance of a high-traffic GPU workload compared to a static network with the same bandwidth.
Explainable AI-assisted low-latency haptic feedback prediction for human-to-machine applications over passive optical networks
Yuxiao Wang; Sourav Mondal; Ye Pu; Elaine Wong
Journal of Optical Communications and Networking 17(9), D83–D95 (2025); published 2025-07-22. DOI: 10.1364/JOCN.560757

Human-to-machine applications, such as robotic teleoperation, require ultra-low latency for real-time interactions. In passive optical networks (PONs), edge AI servers at the optical line terminal can predict haptic feedback in advance based on control signals, thereby enhancing the immersive experience. To further reduce latency while preserving predictive performance, this paper proposes an explainable AI (XAI)-assisted low-latency haptic feedback prediction framework that uses XAI-based feature selection to reduce inference time. In a 50G-PON, the framework achieves the lowest round-trip delay and packet delay variation among the evaluated approaches. Extensive simulations show a 64.9% reduction in inference time, a 15.5% reduction in round-trip delay, and a 15.1% reduction in delay variation under a typical traffic load of 0.5, demonstrating the framework's effectiveness for next-generation AI-assisted optical networks.
Saturn: a chiplet-based optical network architecture for breaking the memory wall
Lijing Zhu; Huaxi Gu; Kun Wang; Guangming Zhang
Journal of Optical Communications and Networking 17(8), 713–719 (2025); published 2025-07-21. DOI: 10.1364/JOCN.559347

Given the increasingly computing-intensive and data-intensive workloads of high-performance computing applications, the need for more cores and larger storage capacity keeps growing. While computational power is increasing rapidly, data movement capability among cores and memory modules has not advanced correspondingly; the low energy efficiency and limited parallelism of data movement have become a bottleneck. Optical interconnects, with their superior bandwidth and power characteristics, are a promising remedy, and chiplet technology significantly amplifies their benefits. However, existing optical networks take into account neither the modularity and flexible assembly of chiplets nor the advantages of new fabrication and packaging technologies. In this paper, we propose Saturn, an optical interconnection network architecture comprising two parts: a core-to-memory network (CTMN) and a core-to-core network. In the CTMN, the integration of optical broadband micro-ring technology with co-designed wavelength assignment enables memory accesses to complete in a single hop, providing highly parallel bandwidth. The serpentine layout employed in the CTMN eliminates waveguide crossings, which substantially reduces insertion loss and energy consumption. Analytical simulations validate the effectiveness and efficiency of Saturn, showing that it improves memory access throughput while reducing energy consumption compared with a traditional network.
Interpretable optical network fault detection and localization with multi-task graph prototype learning
Xiaokang Chen; Xiaoliang Chen; Zuqing Zhu
Journal of Optical Communications and Networking 17(9), D73–D82 (2025); published 2025-07-18. DOI: 10.1364/JOCN.562633

Recent advances in machine learning (ML) have promoted data-driven automated fault management in optical networks. However, existing ML-aided fault management approaches rely mainly on black-box models that lack the intrinsic interpretability needed to secure their trustworthiness in mission-critical operation scenarios. In this paper, we propose an interpretable optical network fault detection and localization design leveraging multi-task graph prototype learning (MT-GPL). MT-GPL models an optical network and the optical performance monitoring data collected in it as graph-structured data and makes use of graph neural networks to learn graph embeddings that capture both topological correlations (for fault localization) and fault-discriminative patterns (for root cause analysis). MT-GPL interprets its reasoning by (i) introducing a prototype layer that learns physics-aligned prototypes indicative of each fault class using the Monte Carlo tree search method and (ii) performing predictions based on the similarities between the embedding of an input graph and the learned prototypes. To enhance the scalability and interpretability of MT-GPL, we develop a multi-task architecture that performs concurrent fault localization and reasoning with node-level and device-level prototype learning and fault predictions. Performance evaluations show that our proposal achieves >6.5% higher prediction accuracy than a multi-layer perceptron model, while visualizations of its reasoning processes verify the validity of its interpretability.
Post-disaster cloud-service restoration through datacenter-carrier cooperation
Subhadeep Sahoo; Sugang Xu; Sifat Ferdousi; Yusuke Hirota; Massimo Tornatore; Yoshinari Awaji; Biswanath Mukherjee
Journal of Optical Communications and Networking 17(8), 700–712 (2025); published 2025-07-17. DOI: 10.1364/JOCN.561579

In network-cloud ecosystems, large-scale failures affecting network carrier and datacenter (DC) infrastructures can severely disrupt cloud services. Post-disaster cloud service restoration requires cooperation among carriers and DC providers (DCPs) to minimize downtime. Such cooperation is challenging due to proprietary and regulatory policies, which limit access to confidential information (detailed topology, resource availability, etc.). Accordingly, we introduce a third-party entity, a provider-neutral exchange, which enables cooperation by sharing abstracted information. We formulate an optimization problem for DCP–carrier cooperation that maximizes service restoration while minimizing restoration time and cost. We propose a scalable heuristic and demonstrate significant improvements in restoration efficiency across different topologies and failure scenarios.