Multi-objective optimization for submarine optical cable route planning based on cross reinforcement learning
Authors: Zanshan Zhao; Guanjun Gao; Weiming Gan; Jialiang Zhang; Zengfu Wang; Haoyu Wang; Yonggang Guo
Journal: Journal of Optical Communications and Networking, vol. 16, no. 10, pp. 1018-1033 (published 2024-09-25)
DOI: 10.1364/JOCN.529175
URL: https://ieeexplore.ieee.org/document/10694705/
Submarine cables are crucial infrastructure for international communications, and their cost and survivability are two key factors that must be considered in the design phase. In this paper, we propose a machine-learning-assisted submarine cable route planning algorithm that minimizes the accumulated cost and risk of a route. The cost and risk distributions, together with the direction between the route’s starting point and endpoint, are used as prior data to initialize the state-action values of reinforcement learning (RL). We also propose a multi-agent cross reinforcement learning (MA-XRL) framework, composed of Q-learning and SARSA, to improve the global optimization capability of RL in multi-objective optimization. The results show that, compared to ant colony optimization (ACO), MA-XRL can reduce the accumulated cost by 26.87% at the same accumulated risk. The maximum accumulated cost of the Pareto solutions obtained by MA-XRL is lower than the minimum accumulated cost of those obtained by ACO. Meanwhile, the running time of MA-XRL is only 1.3‰ of that of ACO. Without cost and risk prior data for initialization, the accumulated cost and risk of the best route obtained by MA-XRL are 1.84 times and 7.08 times those obtained with cost and risk distribution initialization, respectively. Direction initialization helps the agent find the endpoint of the route faster and doubles the search stability of MA-XRL. Compared to using Q-learning or SARSA alone, MA-XRL can reduce the accumulated risk by 71.81% and 39.51%, respectively, at the same accumulated cost, and can reduce the accumulated cost by 16.65% and 11.99%, respectively, at the same accumulated risk.
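For readers unfamiliar with the two learners named in the abstract, the following is a minimal sketch of the textbook Q-learning and SARSA updates, plus a simple Pareto filter over (accumulated cost, accumulated risk) pairs. It is not the authors' MA-XRL implementation: the abstract does not describe how the two agents are crossed, so that step is not shown. The function names, the list-of-lists Q-table, the learning rate `alpha`, and the discount `gamma` are all assumptions made for illustration.

```python
from typing import List, Tuple

QTable = List[List[float]]  # q[state][action]; an illustrative layout, not from the paper


def q_learning_update(q: QTable, s: int, a: int, r: float, s_next: int,
                      alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Off-policy TD update: bootstrap from the best action value in s_next."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def sarsa_update(q: QTable, s: int, a: int, r: float, s_next: int, a_next: int,
                 alpha: float = 0.1, gamma: float = 0.9) -> None:
    """On-policy TD update: bootstrap from the action actually taken in s_next."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the (cost, risk) pairs that no other pair dominates (lower is better)."""
    front = []
    for i, (cost, risk) in enumerate(points):
        dominated = any(
            c2 <= cost and r2 <= risk and (c2 < cost or r2 < risk)
            for j, (c2, r2) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((cost, risk))
    return front


if __name__ == "__main__":
    # Made-up numbers for demonstration only; none come from the paper.
    q_a = [[0.0, 0.0], [1.0, 2.0]]
    q_b = [[0.0, 0.0], [1.0, 2.0]]
    q_learning_update(q_a, s=0, a=0, r=-1.0, s_next=1)
    sarsa_update(q_b, s=0, a=0, r=-1.0, s_next=1, a_next=0)
    print(q_a[0][0], q_b[0][0])
    print(pareto_front([(10.0, 5.0), (8.0, 7.0), (12.0, 4.0), (9.0, 9.0)]))
```

The only difference between the two updates is the bootstrap term: Q-learning uses the maximum action value in the next state, while SARSA uses the value of the action the agent actually takes next. The Pareto filter shows what "Pareto solutions" means for two minimized objectives: a route is kept only if no other route is at least as good on both cost and risk and strictly better on one.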
About the journal:
The scope of the Journal includes advances in the state of the art of optical networking science, technology, and engineering. Both theoretical contributions (including new techniques, concepts, analyses, and economic studies) and practical contributions (including optical networking experiments, prototypes, and new applications) are encouraged. Subareas of interest include the architecture and design of optical networks, optical network survivability and security, software-defined optical networking, elastic optical networks, data and control plane advances, network-management-related innovation, and optical access networks. Enabling technologies and their applications are suitable topics only if the results are shown to directly impact optical networking beyond simple point-to-point networks.