Social satisficing: Multi-agent reinforcement learning with satisficing agents
Daisuke Uragami, Noriaki Sonota, Tatsuji Takahashi
Biosystems, Volume 243, Article 105276 (published 2024-07-19)
DOI: 10.1016/j.biosystems.2024.105276
URL: https://www.sciencedirect.com/science/article/pii/S0303264724001618
Citations: 0
Abstract
For a reinforcement learning agent to finish trial-and-error within a realistic amount of time, it is necessary to limit the scope of exploration during learning. However, limiting the exploration scope also limits optimality: the agent may fall into a suboptimal solution. This is the nature of the local, bottom-up way of learning. An alternative is to set a goal to be achieved, which is a more global, top-down approach. The risk-sensitive satisficing (RS) value function, as a method of the latter kind, incorporates the satisficing principle into reinforcement learning and enables agents to converge quickly to exploiting the optimal solution without falling into a suboptimal one, when an appropriate goal (aspiration level) is given. However, how best to determine the aspiration level remains an open problem. This study proposes social satisficing, a framework for multi-agent reinforcement learning that determines the aspiration level through information sharing among multiple agents. To verify the effectiveness of this novel method, we conducted simulations in a learning environment with many suboptimal goals (SuboptimaWorld). The results show that the proposed method, which converts the episode-level aspiration level into local (state-wise) aspiration levels, learns more efficiently than any of the compared methods, and that it can autonomously adjust its exploration scope while keeping the shared information minimal. This study provides a glimpse into an aspect of human and biological sociality that has received little attention in the context of artificial intelligence and machine learning.
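The core idea behind satisficing action selection can be sketched as follows. This is a minimal, hedged illustration, not the paper's exact algorithm: it assumes the commonly described form of the RS value, in which the gap between the estimated action value Q and the aspiration level (often written ℵ, here `aleph`) is weighted by how often each action has been tried, so that satisfying actions (Q ≥ ℵ) are exploited in proportion to experience while, among unsatisfying actions, the least-tried one is preferred. The class name, the fixed scalar aspiration level, and the tabular bandit setting are all assumptions for illustration; the paper's social mechanism for deriving state-wise aspiration levels from shared episode-level information is not reproduced here.

```python
import random

class RSAgent:
    """Illustrative satisficing bandit agent (hypothetical sketch, not the paper's code)."""

    def __init__(self, n_actions, aleph, alpha=0.1):
        self.n_actions = n_actions
        self.aleph = aleph          # aspiration level (assumed fixed scalar here)
        self.alpha = alpha          # learning rate for the value estimates
        self.q = [0.0] * n_actions  # estimated action values
        self.n = [1] * n_actions    # trial counts (start at 1 to avoid zero weights)

    def select_action(self):
        # RS(a) = n(a) * (Q(a) - aleph): a satisfying action (Q >= aleph)
        # gains RS with experience and gets exploited; if all actions are
        # unsatisfying, the least-tried one has the least-negative RS,
        # which drives exploration without an explicit epsilon parameter.
        rs = [self.n[a] * (self.q[a] - self.aleph) for a in range(self.n_actions)]
        best = max(rs)
        candidates = [a for a in range(self.n_actions) if rs[a] == best]
        return random.choice(candidates)

    def update(self, action, reward):
        # Standard incremental value update plus a trial-count increment.
        self.n[action] += 1
        self.q[action] += self.alpha * (reward - self.q[action])
```

For example, with `aleph = 0.5`, an action whose estimated value is 0.9 is exploited ever more strongly as its count grows, whereas if every action's value sits below 0.5 the selection shifts to whichever action has been tried least, automatically widening the exploration scope until the aspiration is met.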
Journal description:
BioSystems encourages experimental, computational, and theoretical articles that link biology, evolutionary thinking, and the information processing sciences. The link areas form a circle that encompasses the fundamental nature of biological information processing, computational modeling of complex biological systems, evolutionary models of computation, the application of biological principles to the design of novel computing systems, and the use of biomolecular materials to synthesize artificial systems that capture essential principles of natural biological information processing.