{"title":"需要更多:需要系统作为非线性多目标强化学习","authors":"Matthias Rolf","doi":"10.1109/ICDL-EpiRob48136.2020.9278062","DOIUrl":null,"url":null,"abstract":"Both biological and artificial agents need to coordinate their behavior to suit various needs at the same time. Reconciling conflicts of different needs and contradictory interests such as self-preservation and curiosity is the central difficulty arising in the design and modelling of need and value systems. Current models of multi-objective reinforcement learning do either not provide satisfactory power to describe such conflicts, or lack the power to actually resolve them. This paper aims to promote a clear understanding of these limitations, and to overcome them with a theory-driven approach rather than ad hoc solutions. The first contribution of this paper is the development of an example that demonstrates previous approaches' limitations concisely. The second contribution is a new, non-linear objective function design, MORE, that addresses these and leads to a practical algorithm. Experiments show that standard RL methods fail to grasp the nature of the problem and ad-hoc solutions struggle to describe consistent preferences. MORE consistently learns a highly satisfactory solution that balances contradictory needs based on a consistent notion of optimality.","PeriodicalId":114948,"journal":{"name":"2020 Joint IEEE 10th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"The Need for MORE: Need Systems as Non-Linear Multi-Objective Reinforcement Learning\",\"authors\":\"Matthias Rolf\",\"doi\":\"10.1109/ICDL-EpiRob48136.2020.9278062\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Both biological and artificial agents need to coordinate their behavior to suit various needs at the same time. Reconciling conflicts of different needs and contradictory interests such as self-preservation and curiosity is the central difficulty arising in the design and modelling of need and value systems. Current models of multi-objective reinforcement learning do either not provide satisfactory power to describe such conflicts, or lack the power to actually resolve them. This paper aims to promote a clear understanding of these limitations, and to overcome them with a theory-driven approach rather than ad hoc solutions. The first contribution of this paper is the development of an example that demonstrates previous approaches' limitations concisely. The second contribution is a new, non-linear objective function design, MORE, that addresses these and leads to a practical algorithm. Experiments show that standard RL methods fail to grasp the nature of the problem and ad-hoc solutions struggle to describe consistent preferences. 
MORE consistently learns a highly satisfactory solution that balances contradictory needs based on a consistent notion of optimality.\",\"PeriodicalId\":114948,\"journal\":{\"name\":\"2020 Joint IEEE 10th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)\",\"volume\":\"69 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 Joint IEEE 10th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDL-EpiRob48136.2020.9278062\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 Joint IEEE 10th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDL-EpiRob48136.2020.9278062","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The Need for MORE: Need Systems as Non-Linear Multi-Objective Reinforcement Learning
Both biological and artificial agents need to coordinate their behavior to suit various needs at the same time. Reconciling conflicts between different needs and contradictory interests, such as self-preservation and curiosity, is the central difficulty in the design and modelling of need and value systems. Current models of multi-objective reinforcement learning either do not provide satisfactory power to describe such conflicts, or lack the power to actually resolve them. This paper aims to promote a clear understanding of these limitations, and to overcome them with a theory-driven approach rather than ad hoc solutions. The first contribution of this paper is an example that concisely demonstrates the limitations of previous approaches. The second contribution is a new, non-linear objective function design, MORE, that addresses these limitations and leads to a practical algorithm. Experiments show that standard RL methods fail to grasp the nature of the problem and that ad hoc solutions struggle to describe consistent preferences. MORE consistently learns a highly satisfactory solution that balances contradictory needs based on a consistent notion of optimality.
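To make the descriptive gap concrete, here is a minimal sketch of the well-known weakness of linear scalarization that motivates a non-linear objective: a weighted sum can only select solutions on the convex hull of the Pareto front, so a balanced behavior lying in a concave region of the front is never optimal for any choice of weights. The min-based aggregation below is an illustrative hypothetical stand-in for a non-linear objective, not the paper's actual MORE formulation.

```python
import numpy as np

# Per-need rewards for three candidate behaviors (toy numbers).
# A and C each fully serve one need and starve the other; B keeps
# both needs moderately satisfied, but lies inside the concave
# region of the Pareto front (below the A-C line).
rewards = np.array([
    [1.0, 0.0],   # A: all of need 1, none of need 2
    [0.0, 1.0],   # C: all of need 2, none of need 1
    [0.4, 0.4],   # B: balances both needs
])

def linear_scalarization(r, w):
    # Weighted sum used by standard multi-objective RL scalarization.
    return r @ w

def min_aggregation(r):
    # Illustrative non-linear objective: the worst-served need dominates,
    # so starving either need can never be optimal. Hypothetical stand-in,
    # NOT the MORE objective from the paper.
    return r.min(axis=-1)

# For any weights (w1, 1 - w1), max(w1, 1 - w1) >= 0.5 > 0.4, so the
# balanced behavior B can never be the argmax of the weighted sum:
for w1 in np.linspace(0.0, 1.0, 101):
    w = np.array([w1, 1.0 - w1])
    assert linear_scalarization(rewards, w).argmax() != 2

# The non-linear objective prefers the balanced behavior:
assert min_aggregation(rewards).argmax() == 2
print("linear scalarization never picks B; min-aggregation does")
```

Any behavior strictly below the line connecting A and C is invisible to every weighted sum, which is why a conflict of the self-preservation-versus-curiosity kind cannot even be expressed, let alone resolved, by tuning weights alone.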