
2017 International Conference on Progress in Informatics and Computing (PIC): Latest Papers

Playing games with reinforcement learning via perceiving orientation and exploring diversity
Pub Date : 2017-12-01 DOI: 10.1109/PIC.2017.8359509
Dong Zhang, Le Yang, Haobin Shi, Fangqing Mou, Mengkai Hu
Reinforcement learning can guide agents to act optimally in a variety of complex environments. Although reinforcement learning has brought breakthroughs in many domains, it is constrained by two bottlenecks: extremely delayed reward signals and the trade-off between diversity and speed. In this paper, we propose a novel framework to alleviate both bottlenecks. For the delayed reward, we introduce a new term, named the orientation perception term, to calculate the reward for each state. For a series of actions that successfully leads to the target state, this term assigns a different value to each state and distributes reward to all states along the path, rather than rewarding only the target state. This mechanism allows the learning algorithm to perceive orientation information by distinguishing different states. For the trade-off between diversity and speed, we integrate curriculum learning into the exploration process and propose a diversity exploration scheme. At the beginning, this scheme tends to explore actions that have not yet been executed, so as to discover the optimal action sequence. As learning proceeds, the scheme gradually relies more on acquired knowledge and reduces the probability of random actions. This random-to-deterministic diversity exploration scheme guides learning toward a proper balance between strategy diversity and convergence speed. We name the complete framework OpDe Reinforcement Learning and prove the convergence of the algorithm. Experiments on a standard platform demonstrate the effectiveness of the complete framework.
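The abstract does not give the formula for the orientation perception term. As a rough illustration only, the sketch below assumes one simple way to spread a goal reward over every state on a successful path: geometric decay with distance from the target, so that each state on the path receives a distinct value and states nearer the goal are rewarded more. The function name `orientation_rewards` and its `goal_reward` and `decay` parameters are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from collections.abc import Hashable, Sequence


def orientation_rewards(
    trajectory: Sequence[Hashable],
    goal_reward: float = 1.0,
    decay: float = 0.9,
) -> dict[Hashable, float]:
    """Spread a terminal reward over every state on a successful path.

    Hypothetical illustration (not the paper's formula): the state reached
    at step t (0-based) of a path whose last index is T receives
    goal_reward * decay ** (T - t), so states closer to the target get
    larger rewards. A state visited more than once keeps its largest value.
    """
    rewards: dict[Hashable, float] = {}
    last = len(trajectory) - 1
    for t, state in enumerate(trajectory):
        r = goal_reward * decay ** (last - t)
        # Keep the reward from the visit closest to the goal.
        rewards[state] = max(rewards.get(state, float("-inf")), r)
    return rewards


if __name__ == "__main__":
    path = ["s0", "s1", "s2", "goal"]
    print(orientation_rewards(path, decay=0.5))
```

For the path `s0 → s1 → s2 → goal` with `decay=0.5`, the goal gets 1.0 and the earlier states get 0.5, 0.25 and 0.125, so every state on the pathway is rewarded rather than only the target.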
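The diversity exploration scheme can likewise be approximated, under stated assumptions, by a random-to-deterministic action selector. This hypothetical sketch assumes a linear decay of the exploration probability (a stand-in for the paper's curriculum, whose exact form the abstract does not specify) and, when exploring, picks uniformly among the least-tried actions in the current state, so unexecuted actions are tried first. The class `DiversityExplorer` and all of its parameters are illustrative assumptions, not the authors' implementation.

```python
from __future__ import annotations

import random
from collections import defaultdict
from collections.abc import Hashable, Sequence


class DiversityExplorer:
    """Hypothetical random-to-deterministic action selector (not the paper's method)."""

    def __init__(
        self,
        n_actions: int,
        eps_start: float = 1.0,
        eps_end: float = 0.05,
        decay_steps: int = 1000,
        seed: int | None = None,
    ) -> None:
        self.n_actions = n_actions
        self.eps_start = eps_start
        self.eps_end = eps_end
        self.decay_steps = decay_steps
        self.step = 0  # total selections made, across all states
        # Per-state count of how often each action has been executed.
        self.counts: defaultdict[Hashable, list[int]] = defaultdict(lambda: [0] * n_actions)
        self.rng = random.Random(seed)

    def epsilon(self) -> float:
        """Exploration probability, decayed linearly from eps_start to eps_end."""
        if self.decay_steps <= 0:
            return self.eps_end
        frac = min(self.step / self.decay_steps, 1.0)
        return self.eps_start + frac * (self.eps_end - self.eps_start)

    def select(self, state: Hashable, q_values: Sequence[float]) -> int:
        counts = self.counts[state]
        if self.rng.random() < self.epsilon():
            # Explore: prefer actions executed least often (untried ones first).
            fewest = min(counts)
            candidates = [a for a, c in enumerate(counts) if c == fewest]
            action = self.rng.choice(candidates)
        else:
            # Exploit: greedy on the supplied Q-values (ties go to the lowest index).
            values = list(q_values)
            action = values.index(max(values))
        counts[action] += 1
        self.step += 1
        return action
```

Early on, nearly every call explores and untried actions come first; after `decay_steps` selections the selector explores only with probability `eps_end` and otherwise acts greedily on the Q-values it is given.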
Citations: 0