Towards Human-Level Safe Reinforcement Learning in Atari Library

Afriyadi Afriyadi, Wiranto Herry Utomo
Jurnal Sisfokom. Journal article, published 2023-11-06. DOI: 10.32736/sisfokom.v12i3.1739 (https://doi.org/10.32736/sisfokom.v12i3.1739)
Citations: 0

Abstract

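Reinforcement learning (RL) is a powerful tool for training agents to perform complex tasks. However, RL agents sometimes learn to behave in unsafe or unintended ways, especially during the exploration phase, when the agent is still learning about its environment. This research borrows safe exploration methods from robotics and evaluates them against algorithms that are commonly used in complex video game environments without safe exploration. We also propose a method for hand-crafting catastrophic states, that is, states known to be unsafe for the agent to visit. Our results show that our method, together with the hand-crafted safety constraints, outperforms state-of-the-art algorithms at certain iterations. This indicates that the method can learn to behave safely while still achieving good performance. These results have implications for future work on human-level safe learning that combines model-based RL with complex video game environments. Safe exploration methods can help make RL agents usable in a variety of real-world applications, such as self-driving cars and robotics.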
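The abstract does not say how the hand-crafted catastrophic states are enforced during exploration. The following is a minimal sketch, assuming one simple option: a human-written predicate flags unsafe states, and a one-step lookahead with a known deterministic model masks any action that would enter such a state, for both random exploration and greedy choices. The toy `CliffGrid` environment, the helpers `hand_crafted_catastrophe` and `allowed_actions`, and all hyperparameters are hypothetical illustrations; this is not the authors' method, and it uses a tabular gridworld rather than Atari.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[int, int]
ACTIONS: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
}


class CliffGrid:
    """Hypothetical toy grid: start bottom-left, goal bottom-right, cliff in between."""

    def __init__(self, rows: int = 4, cols: int = 6) -> None:
        self.rows, self.cols = rows, cols
        self.start: State = (rows - 1, 0)
        self.goal: State = (rows - 1, cols - 1)
        self.cliff = {(rows - 1, c) for c in range(1, cols - 1)}

    def peek(self, s: State, a: str) -> State:
        """Deterministic one-step model: the next cell, clipped to the grid."""
        dr, dc = ACTIONS[a]
        return (min(max(s[0] + dr, 0), self.rows - 1),
                min(max(s[1] + dc, 0), self.cols - 1))

    def step(self, s: State, a: str) -> Tuple[State, float, bool, bool]:
        """Returns (next_state, reward, done, catastrophe)."""
        nxt = self.peek(s, a)
        if nxt in self.cliff:
            return self.start, -100.0, True, True  # catastrophic state entered
        if nxt == self.goal:
            return nxt, 0.0, True, False
        return nxt, -1.0, False, False


def hand_crafted_catastrophe(env: CliffGrid) -> Callable[[State], bool]:
    """Hypothetical human-written rule naming the states the agent must never enter."""
    return lambda s: s in env.cliff


def allowed_actions(env: CliffGrid, s: State,
                    is_catastrophic: Optional[Callable[[State], bool]]) -> List[str]:
    """Mask actions whose predicted next state is catastrophic (no mask if None)."""
    if is_catastrophic is None:
        return list(ACTIONS)
    safe = [a for a in ACTIONS if not is_catastrophic(env.peek(s, a))]
    return safe or list(ACTIONS)  # fall back if every action is flagged


def q_learning(env: CliffGrid, is_catastrophic=None, episodes: int = 300,
               alpha: float = 0.5, gamma: float = 0.99, eps: float = 0.1,
               max_steps: int = 200, seed: int = 0):
    """Tabular epsilon-greedy Q-learning; returns (Q-table, catastrophe count)."""
    rng = random.Random(seed)
    q: Dict[Tuple[State, str], float] = {}
    catastrophes = 0
    for _ in range(episodes):
        s, done, steps = env.start, False, 0
        while not done and steps < max_steps:
            acts = allowed_actions(env, s, is_catastrophic)
            if rng.random() < eps:
                a = rng.choice(acts)  # exploration stays inside the mask
            else:
                a = max(acts, key=lambda b: q.get((s, b), 0.0))
            nxt, r, done, catastrophe = env.step(s, a)
            catastrophes += int(catastrophe)
            # Bootstrap only over the actions the agent may take in the next state.
            future = 0.0 if done else max(
                q.get((nxt, b), 0.0) for b in allowed_actions(env, nxt, is_catastrophic))
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r + gamma * future - old)
            s, steps = nxt, steps + 1
    return q, catastrophes


if __name__ == "__main__":
    env = CliffGrid()
    _, unsafe = q_learning(env)
    _, safe = q_learning(env, hand_crafted_catastrophe(env))
    print(f"catastrophes without mask: {unsafe}, with mask: {safe}")
```

Because the mask is checked against the model before every action, the masked agent never enters a flagged state during training, while the unmasked agent discovers the cliff by falling into it. This guarantee holds only if the lookahead model is accurate; in a stochastic, pixel-based environment such as Atari, the next state would have to be predicted by a learned model instead.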