Social satisficing: Multi-agent reinforcement learning with satisficing agents
Daisuke Uragami, Noriaki Sonota, Tatsuji Takahashi
DOI: 10.1016/j.biosystems.2024.105276
Published: 2024-07-19 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S0303264724001618
Citation count: 0
Abstract
For a reinforcement learning agent to finish trial-and-error learning in a realistic amount of time, the scope of exploration must be limited during learning. However, limiting the exploration scope also limits optimality: the agent may settle on a suboptimal solution. This is inherent to local, bottom-up learning. An alternative is to set a goal to be achieved, which is a more global, top-down approach. The risk-sensitive satisficing (RS) value function takes the latter approach: it incorporates the satisficing principle into reinforcement learning and, when an appropriate goal (aspiration level) is given, enables agents to converge quickly to exploiting the optimal solution without falling into a suboptimal one. However, how best to determine the aspiration level remains an open problem. This study proposes social satisficing, a framework for multi-agent reinforcement learning that determines the aspiration level through information sharing among multiple agents. To verify the effectiveness of this method, we conducted simulations in a learning environment with many suboptimal goals (SuboptimaWorld). The results show that the proposed method, which converts the episode-level aspiration level into local (state-wise) aspiration levels, achieves higher learning efficiency than any of the compared methods, and that it can autonomously adjust its exploration scope while keeping the shared information minimal. This study offers a glimpse into an aspect of human and biological sociality that has received little attention in artificial intelligence and machine learning.
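The abstract gives no formulas, so the following is only a minimal sketch of how an aspiration level can steer action selection. It assumes the bandit-style form of RS described in earlier work on the RS value function, RS(a) = (n_a / N) * (Q_a - aspiration), where n_a is the number of times action a was tried, N is the total number of trials, and Q_a is the estimated value of a. The function names and the sharing rule in `shared_aspiration` (take the best return any agent reports) are hypothetical illustrations, not the paper's method.

```python
def rs_values(counts, means, aspiration):
    """Assumed bandit-style RS value: (n_a / N) * (Q_a - aspiration)."""
    total = sum(counts)
    if total == 0:
        # No trials yet: every action is equally (un)attractive.
        return [0.0 for _ in counts]
    return [(n / total) * (q - aspiration) for n, q in zip(counts, means)]


def select_action(counts, means, aspiration):
    """Greedy choice with respect to the RS values."""
    rs = rs_values(counts, means, aspiration)
    return max(range(len(rs)), key=rs.__getitem__)


def shared_aspiration(best_returns):
    """Hypothetical sharing rule: each agent reports a single number
    (its best return so far) and the group adopts the maximum."""
    return max(best_returns)


if __name__ == "__main__":
    counts = [8, 2]      # action 0 was tried far more often
    means = [0.5, 0.6]   # estimated values of the two actions
    # Aspiration above every estimate: the less-tried action is preferred.
    print(select_action(counts, means, aspiration=0.7))
    # Aspiration below every estimate: the more-tried action is preferred.
    print(select_action(counts, means, aspiration=0.4))
```

Under this assumed form, an agent whose estimates all fall short of the aspiration level leans toward less-tried actions (exploration), while an agent that is already satisfied leans toward well-tried ones (exploitation). This is one way to read the abstract's claim that a suitable aspiration level adjusts the exploration scope.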