Building Action Sets in a Deep Reinforcement Learner

2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 484-489. Pub Date: 2021-12-01. DOI: 10.1109/ICMLA52953.2021.00081

Yongzhao Wang, Arunesh Sinha, Sky CH-Wang, Michael P. Wellman


Citations: 1

Abstract

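In many policy-learning applications, the agent may execute a set of actions at each decision stage. Choosing among an exponential number of alternatives poses a computational challenge, and even representing actions naturally expressed as sets can be a tricky design problem. Building upon prior approaches that employ deep neural networks and iterative construction of action sets, we introduce a reward-shaping approach to apportion reward to each atomic action based on its marginal contribution within an action set, thereby providing useful feedback for learning to build these sets. We demonstrate our method in two environments where action spaces are combinatorial. Experiments reveal that our method significantly accelerates and stabilizes policy learning with combinatorial actions.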
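The abstract describes the method only at a high level. Below is a minimal Python sketch of the two ideas it names, iterative construction of an action set and per-atom reward apportioned by marginal contribution, under assumptions that are not taken from the paper: a set-level reward `set_reward` that can be evaluated on any partial set (here a hypothetical toy coverage objective, `toy_set_reward`), marginal contribution measured as the change in that reward when an atom is added in construction order, and a scoring function `score` standing in for the deep network that picks the next atom (here a hypothetical oracle, `marginal_gain`). A non-positive best score plays the role of a STOP action. None of these names come from the paper, and its actual shaping rule may differ (for example, a leave-one-out definition).

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Sequence, Set, Tuple

Atom = Hashable
SetReward = Callable[[FrozenSet[Atom]], float]
Scorer = Callable[[FrozenSet[Atom], Atom], float]


def build_action_set(candidates: Sequence[Atom], score: Scorer, max_size: int) -> List[Atom]:
    """Greedily add one atom at a time (hypothetical stand-in for a learned builder).

    Stops when no candidates remain, max_size is reached, or the best score is
    non-positive (treated as choosing STOP).
    """
    chosen: List[Atom] = []
    remaining = list(candidates)
    current: FrozenSet[Atom] = frozenset()
    while remaining and len(chosen) < max_size:
        scores = {a: score(current, a) for a in remaining}
        best = max(remaining, key=scores.__getitem__)  # first maximum wins ties
        if scores[best] <= 0.0:
            break  # STOP: no atom is expected to help
        chosen.append(best)
        current = current | {best}
        remaining.remove(best)
    return chosen


def shaped_rewards(atoms_in_order: Sequence[Atom], set_reward: SetReward) -> List[Tuple[Atom, float]]:
    """Give atom i the reward set_reward(S_i) - set_reward(S_{i-1}).

    S_i is the set after the first i atoms are added. The shaped rewards
    telescope to set_reward(final set) - set_reward(empty set).
    """
    shaped: List[Tuple[Atom, float]] = []
    current: FrozenSet[Atom] = frozenset()
    prev_value = set_reward(current)
    for atom in atoms_in_order:
        current = current | {atom}
        value = set_reward(current)
        shaped.append((atom, value - prev_value))
        prev_value = value
    return shaped


# Hypothetical toy environment: each atom covers some targets; each atom costs 0.5.
COVERS: Dict[str, Set[int]] = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1}}


def toy_set_reward(s: FrozenSet[Atom]) -> float:
    covered = set().union(*(COVERS[x] for x in s))
    return len(covered) - 0.5 * len(s)


def marginal_gain(current: FrozenSet[Atom], atom: Atom) -> float:
    # Oracle score; a learned system would estimate this with a network.
    return toy_set_reward(current | {atom}) - toy_set_reward(current)


if __name__ == "__main__":
    chosen = build_action_set(["a", "b", "c", "d"], marginal_gain, max_size=4)
    print(chosen)  # ['a', 'b', 'c']  ("d" is redundant, so STOP is chosen)
    print(shaped_rewards(chosen, toy_set_reward))  # [('a', 1.5), ('b', 0.5), ('c', 0.5)]
```

Because the per-atom rewards telescope, they always sum to the set-level reward of the final set minus that of the empty set, so this kind of shaping redistributes credit across construction steps without changing the total. In a learned setting, the oracle score would be replaced by a network estimate, and the shaped rewards could serve as per-step feedback for training it.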



