Reinforcement learning for the traveling salesman problem: Performance comparison of three algorithms

Jiaying Wang, Chenglong Xiao, Shanshan Wang, Yaqi Ruan
The Journal of Engineering · DOI: 10.1049/tje2.12303 · Published: 1 September 2023 · Cited by: 0

Abstract

The travelling salesman problem (TSP) is one of the most famous problems in graph theory, as well as a typical nondeterministic polynomial time (NP)-hard problem in combinatorial optimization. Reinforcement learning (RL) is widely regarded as an effective tool for solving combinatorial optimization problems. This paper tackles the TSP with reinforcement learning and evaluates the performance of three RL algorithms (Q-Learning, SARSA, and Double Q-Learning) under different reward functions, ε-greedy decay strategies, and running times. The experiments provide a comprehensive analysis and comparison of the three algorithms. First, the results indicate that Double Q-Learning performs best: among the eight TSP instances, it outperforms the other two algorithms on five, and across all instances it runs faster than SARSA and in roughly the same time as Q-Learning. Second, analysis of the results shows that the choice of reward strategy contributes to obtaining the best results for all algorithms. Among the 24 combinations of 3 algorithms and 8 instances, 17 achieved their best results when the reward strategy was set to .
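To make the experimental setup concrete, the following is a minimal sketch of tabular Q-Learning applied to a toy TSP instance. The 5-city distance matrix, the negative-edge-length reward, and the multiplicative ε-decay schedule are all illustrative assumptions for this sketch, not the paper's exact configuration (the paper compares several reward functions and decay strategies):

```python
import random

# Hypothetical symmetric TSP instance: distances between 5 cities.
DIST = [
    [0, 2, 9, 10, 7],
    [2, 0, 6, 4, 3],
    [9, 6, 0, 8, 5],
    [10, 4, 8, 0, 6],
    [7, 3, 5, 6, 0],
]
N = len(DIST)

def q_learning_tsp(episodes=2000, alpha=0.1, gamma=0.95,
                   eps_start=1.0, eps_end=0.01, decay=0.995, seed=0):
    """Tabular Q-Learning for the TSP.

    State = current city, actions = unvisited cities, reward = negative
    edge length (one common choice). Epsilon is multiplied by `decay`
    after every episode, i.e. an exponential epsilon-greedy decay."""
    rng = random.Random(seed)
    Q = [[0.0] * N for _ in range(N)]
    eps = eps_start
    best_tour, best_len = None, float("inf")

    for _ in range(episodes):
        tour, visited = [0], {0}          # every episode starts at city 0
        while len(tour) < N:
            s = tour[-1]
            candidates = [c for c in range(N) if c not in visited]
            if rng.random() < eps:                      # explore
                a = rng.choice(candidates)
            else:                                       # exploit
                a = max(candidates, key=lambda c: Q[s][c])
            reward = -DIST[s][a]
            remaining = [c for c in range(N) if c not in visited and c != a]
            # Q-Learning target: bootstrap on the best feasible follow-up
            # action; at the last step, the value is the closing edge home.
            future = max((Q[a][c] for c in remaining), default=-DIST[a][0])
            Q[s][a] += alpha * (reward + gamma * future - Q[s][a])
            tour.append(a)
            visited.add(a)
        length = sum(DIST[tour[i]][tour[(i + 1) % N]] for i in range(N))
        if length < best_len:
            best_tour, best_len = tour[:], length
        eps = max(eps_end, eps * decay)                 # epsilon decay

    return best_tour, best_len
```

SARSA would differ only in the target (bootstrapping on the action actually taken next rather than the greedy maximum), and Double Q-Learning would maintain two Q-tables, using one to select the greedy action and the other to evaluate it.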