Reinforcement learning for the traveling salesman problem: Performance comparison of three algorithms

Jiaying Wang, Chenglong Xiao, Shanshan Wang, Yaqi Ruan
The Journal of Engineering · DOI: 10.1049/tje2.12303 · Published: 1 September 2023 · Cited by: 0

Abstract

The travelling salesman problem (TSP) is one of the most famous problems in graph theory, as well as a typical nondeterministic polynomial time (NP)-hard problem in combinatorial optimization. Reinforcement learning (RL) is widely regarded as an effective tool for solving combinatorial optimization problems. This paper tackles the TSP with reinforcement learning and evaluates the performance of three RL algorithms (Q-Learning, SARSA, and Double Q-Learning) under different reward functions, ε-greedy decay strategies, and running times. The experiments provide a comprehensive analysis and comparison of the three algorithms. First, the results indicate that Double Q-Learning performs best: among the eight TSP instances, it outperforms the other two algorithms on five, and across all instances it runs faster than SARSA and in roughly the same time as Q-Learning. Second, analysis of the results shows that the choice of reward strategy contributes to obtaining the best results for all algorithms. Among the 24 combinations of 3 algorithms and 8 instances, 17 achieved their best results when the reward strategy was set to .
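To make the experimental setup concrete, the following is a minimal sketch of tabular Q-Learning applied to a toy TSP instance. The 5-city distance matrix, the negative-edge-length reward, and the multiplicative ε-decay schedule are all illustrative assumptions for this sketch, not the paper's exact configuration (the paper compares several reward functions and decay strategies):

```python
import random

# Hypothetical symmetric TSP instance: distances between 5 cities.
DIST = [
    [0, 2, 9, 10, 7],
    [2, 0, 6, 4, 3],
    [9, 6, 0, 8, 5],
    [10, 4, 8, 0, 6],
    [7, 3, 5, 6, 0],
]
N = len(DIST)

def q_learning_tsp(episodes=2000, alpha=0.1, gamma=0.95,
                   eps_start=1.0, eps_end=0.01, decay=0.995, seed=0):
    """Tabular Q-Learning for the TSP.

    State = current city, actions = unvisited cities, reward = negative
    edge length (one common choice). Epsilon is multiplied by `decay`
    after every episode, i.e. an exponential epsilon-greedy decay."""
    rng = random.Random(seed)
    Q = [[0.0] * N for _ in range(N)]
    eps = eps_start
    best_tour, best_len = None, float("inf")

    for _ in range(episodes):
        tour, visited = [0], {0}          # every episode starts at city 0
        while len(tour) < N:
            s = tour[-1]
            candidates = [c for c in range(N) if c not in visited]
            if rng.random() < eps:                      # explore
                a = rng.choice(candidates)
            else:                                       # exploit
                a = max(candidates, key=lambda c: Q[s][c])
            reward = -DIST[s][a]
            remaining = [c for c in range(N) if c not in visited and c != a]
            # Q-Learning target: bootstrap on the best feasible follow-up
            # action; at the last step, the value is the closing edge home.
            future = max((Q[a][c] for c in remaining), default=-DIST[a][0])
            Q[s][a] += alpha * (reward + gamma * future - Q[s][a])
            tour.append(a)
            visited.add(a)
        length = sum(DIST[tour[i]][tour[(i + 1) % N]] for i in range(N))
        if length < best_len:
            best_tour, best_len = tour[:], length
        eps = max(eps_end, eps * decay)                 # epsilon decay

    return best_tour, best_len
```

SARSA would differ only in the target (bootstrapping on the action actually taken next rather than the greedy maximum), and Double Q-Learning would maintain two Q-tables, using one to select the greedy action and the other to evaluate it.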