Evaluation of Q-learning for search and inspect missions using underwater vehicles

2014 Oceans - St. John's. Pub Date: 2014-09-01. DOI: 10.1109/OCEANS.2014.7003088

G. Frost, D. Lane


Citations: 5

Abstract

We propose an application of offline Reinforcement Learning in the underwater domain. We present and evaluate the integration of the Q-learning algorithm into an Autonomous Underwater Vehicle (AUV) for learning the action-value function in simulation. Three separate experiments are presented. The first compares two search policies, ε-least visited and random action, with respect to convergence time. The second examines the effect of the discount factor, gamma, on the convergence time of the ε-least visited search policy. The final experiment validates the use of a policy learnt offline on a real AUV. The learning phase occurs offline within the continuous simulation environment, which has been discretized into a grid-world learning problem. The results show that the system converges to a globally optimal solution whilst following either of the two sub-optimal search policies during simulation. After discussing the results, we introduce future work needed to enable the system to be used in a real-world application. The results presented therefore form the basis for future comparative analysis of the necessary improvements, such as function approximation of the state space.
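The abstract does not give the algorithmic details, so the following is only a minimal sketch of tabular Q-learning on a grid world under stated assumptions, not the authors' implementation. The 5x5 grid, the four-move action set, the reward of 1 on reaching the goal, the random start cells, and all names (`step`, `q_learning`, `greedy_path_length`) are hypothetical. In particular, "ε-least visited" is read here as "with probability ε take a random action, otherwise take the action tried least often in the current state", which is one plausible interpretation rather than the paper's definition.

```python
import random

# Hypothetical action set: north, south, east, west.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def step(state, action, size, goal):
    """Deterministic grid move; bumping into a wall leaves the state unchanged."""
    x = min(max(state[0] + ACTIONS[action][0], 0), size - 1)
    y = min(max(state[1] + ACTIONS[action][1], 0), size - 1)
    nxt = (x, y)
    return nxt, (1.0 if nxt == goal else 0.0), nxt == goal


def q_learning(policy, size=5, goal=(4, 4), gamma=0.9, alpha=1.0,
               epsilon=0.1, episodes=500, max_steps=200, seed=0):
    """Learn Q while following an exploratory (sub-optimal) search policy.

    alpha=1.0 is adequate only because this toy world is deterministic.
    """
    rng = random.Random(seed)
    states = [(x, y) for x in range(size) for y in range(size)]
    q = {(s, a): 0.0 for s in states for a in range(4)}
    visits = {(s, a): 0 for s in states for a in range(4)}
    starts = [s for s in states if s != goal]

    for _episode in range(episodes):
        state = rng.choice(starts)
        for _step in range(max_steps):
            if policy == "random" or rng.random() < epsilon:
                action = rng.randrange(4)
            else:
                # Assumed reading of "least visited": the action used least in this state.
                action = min(range(4), key=lambda a: visits[(state, a)])
            visits[(state, action)] += 1
            nxt, reward, done = step(state, action, size, goal)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in range(4))
            # Off-policy update: the target uses the greedy value of the next state,
            # regardless of which search policy chose the action.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def greedy_path_length(q, start=(0, 0), size=5, goal=(4, 4), limit=100):
    """Follow the greedy policy derived from q and count the steps to the goal."""
    state, steps = start, 0
    while state != goal and steps < limit:
        action = max(range(4), key=lambda a: q[(state, a)])
        state, _, _ = step(state, action, size, goal)
        steps += 1
    return steps
```

Calling `q_learning("least_visited")` or `q_learning("random")` and passing the result to `greedy_path_length` gives a simple way to check whether the learnt greedy policy is shortest-path in this toy setting. The `gamma` argument can be varied to explore the discount factor in the same way.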

