Evaluation of Q-learning for search and inspect missions using underwater vehicles

G. Frost, D. Lane
Published in: 2014 Oceans - St. John's, 2014-09-01
DOI: 10.1109/OCEANS.2014.7003088 (https://doi.org/10.1109/OCEANS.2014.7003088)
Citations: 5

Abstract

We propose an application of offline Reinforcement Learning in the underwater domain. We present and evaluate the integration of the Q-learning algorithm into an Autonomous Underwater Vehicle (AUV) to learn the action-value function in simulation. Three experiments are presented. The first compares two search policies, ε-least visited and random action, with respect to convergence time. The second examines the effect of the discount factor, gamma, on the convergence time of the ε-least visited search policy. The third validates the use of a policy learnt offline on a real AUV. Learning occurs offline within a continuous simulation environment that has been discretized into a grid-world learning problem. The results show that the system converges to a globally optimal solution while following either of the two sub-optimal search policies during simulation. After discussing the results, we outline the future work needed to use the system in a real-world application. The results therefore form a basis for future comparative analysis of the necessary improvements, such as function approximation of the state space.
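The setup described in the abstract can be illustrated with a minimal tabular Q-learning sketch on a small grid world. Everything specific here is an assumption made for illustration and is not taken from the paper: the 4x4 grid, the single goal cell, the reward of 1 on reaching it, the learning rate, and in particular the reading of "ε-least visited" as "with probability ε take a random action, otherwise take the action tried least often in the current state". Because Q-learning is off-policy, the learned greedy policy can be optimal even though both behaviour policies are not.

```python
import random

# Hypothetical 4x4 grid world; the paper's actual discretization is not given here.
SIZE = 4
START, GOAL = (0, 0), (3, 3)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # (row, column) offsets

def step(state, action):
    """Move one cell, clamped to the grid; reaching GOAL pays 1 and ends the episode."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (row, col)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done

def choose_action(state, visits, epsilon, rng, policy):
    """Behaviour policy: 'random', or an assumed reading of 'epsilon-least visited'."""
    if policy == "random" or rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    counts = [visits[(state, a)] for a in range(len(ACTIONS))]
    return counts.index(min(counts))  # least-tried action in this state

def q_learning(policy, gamma=0.9, alpha=0.5, epsilon=0.1, episodes=500, seed=0):
    """Learn a tabular action-value function while following the given search policy."""
    rng = random.Random(seed)
    states = [(r, c) for r in range(SIZE) for c in range(SIZE)]
    q = {(s, a): 0.0 for s in states for a in range(len(ACTIONS))}
    visits = {key: 0 for key in q}  # visit counts persist across episodes
    for _ in range(episodes):
        state = START
        for _ in range(200):  # cap on steps per episode
            a = choose_action(state, visits, epsilon, rng, policy)
            visits[(state, a)] += 1
            nxt, reward, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(q[(nxt, b)] for b in range(len(ACTIONS)))
            # Q-learning update: move Q(s, a) toward r + gamma * max_b Q(s', b).
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = nxt
            if done:
                break
    return q

def greedy_path_length(q, limit=50):
    """Steps the greedy policy derived from q needs to go from START to GOAL."""
    state, steps = START, 0
    while state != GOAL and steps < limit:
        a = max(range(len(ACTIONS)), key=lambda b: q[(state, b)])
        state, _, _ = step(state, ACTIONS[a])
        steps += 1
    return steps
```

Calling `greedy_path_length(q_learning("random"))` and `greedy_path_length(q_learning("least_visited"))` compares the policy learned under each search strategy, and varying the `gamma` argument gives a rough analogue of the discount-factor experiment. Convergence time itself is not measured in this sketch.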