Social satisficing: Multi-agent reinforcement learning with satisficing agents
Daisuke Uragami, Noriaki Sonota, Tatsuji Takahashi
Biosystems, Volume 243, Article 105276 (published 2024-07-19)
DOI: 10.1016/j.biosystems.2024.105276
URL: https://www.sciencedirect.com/science/article/pii/S0303264724001618
Citations: 0
Abstract
For a reinforcement learning agent to finish trial-and-error within a realistic amount of time, it is necessary to limit the scope of exploration during learning. However, limiting the exploration scope also limits optimality: the agent may fall into a suboptimal solution. This is the nature of the local, bottom-up way of learning. An alternative is to set a goal to be achieved, which is a more global, top-down approach. The risk-sensitive satisficing (RS) value function, as a method of the latter kind, incorporates the satisficing principle into reinforcement learning and enables agents to converge quickly to exploiting the optimal solution without falling into a suboptimal one, when an appropriate goal (aspiration level) is given. However, how best to determine the aspiration level remains an open problem. This study proposes social satisficing, a framework for multi-agent reinforcement learning that determines the aspiration level through information sharing among multiple agents. To verify the effectiveness of this novel method, we conducted simulations in a learning environment with many suboptimal goals (SuboptimaWorld). The results show that the proposed method, which converts the episode-level aspiration level into local (state-wise) aspiration levels, learns more efficiently than any of the compared methods, and that it can autonomously adjust its exploration scope while keeping the shared information minimal. This study provides a glimpse into an aspect of human and biological sociality that has received little attention in the context of artificial intelligence and machine learning.
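The core idea behind satisficing action selection can be sketched as follows. This is a minimal, hedged illustration, not the paper's exact algorithm: it assumes the commonly described form of the RS value, in which the gap between the estimated action value Q and the aspiration level (often written ℵ, here `aleph`) is weighted by how often each action has been tried, so that satisfying actions (Q ≥ ℵ) are exploited in proportion to experience while, among unsatisfying actions, the least-tried one is preferred. The class name, the fixed scalar aspiration level, and the tabular bandit setting are all assumptions for illustration; the paper's social mechanism for deriving state-wise aspiration levels from shared episode-level information is not reproduced here.

```python
import random

class RSAgent:
    """Illustrative satisficing bandit agent (hypothetical sketch, not the paper's code)."""

    def __init__(self, n_actions, aleph, alpha=0.1):
        self.n_actions = n_actions
        self.aleph = aleph          # aspiration level (assumed fixed scalar here)
        self.alpha = alpha          # learning rate for the value estimates
        self.q = [0.0] * n_actions  # estimated action values
        self.n = [1] * n_actions    # trial counts (start at 1 to avoid zero weights)

    def select_action(self):
        # RS(a) = n(a) * (Q(a) - aleph): a satisfying action (Q >= aleph)
        # gains RS with experience and gets exploited; if all actions are
        # unsatisfying, the least-tried one has the least-negative RS,
        # which drives exploration without an explicit epsilon parameter.
        rs = [self.n[a] * (self.q[a] - self.aleph) for a in range(self.n_actions)]
        best = max(rs)
        candidates = [a for a in range(self.n_actions) if rs[a] == best]
        return random.choice(candidates)

    def update(self, action, reward):
        # Standard incremental value update plus a trial-count increment.
        self.n[action] += 1
        self.q[action] += self.alpha * (reward - self.q[action])
```

For example, with `aleph = 0.5`, an action whose estimated value is 0.9 is exploited ever more strongly as its count grows, whereas if every action's value sits below 0.5 the selection shifts to whichever action has been tried least, automatically widening the exploration scope until the aspiration is met.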
Journal description:
BioSystems encourages experimental, computational, and theoretical articles that link biology, evolutionary thinking, and the information processing sciences. The link areas form a circle that encompasses the fundamental nature of biological information processing, computational modeling of complex biological systems, evolutionary models of computation, the application of biological principles to the design of novel computing systems, and the use of biomolecular materials to synthesize artificial systems that capture essential principles of natural biological information processing.