Self organizing optimization and phase transition in reinforcement learning minority game system

IF 6.5; Zone 2 (Physics and Astrophysics); Q1 (PHYSICS, MULTIDISCIPLINARY); Frontiers of Physics; Pub Date: 2024-01-24; DOI: 10.1007/s11467-023-1378-z
Si-Ping Zhang, Jia-Qi Dong, Hui-Yu Zhang, Yi-Xuan Lü, Jue Wang, Zi-Gang Huang
Frontiers of Physics, volume 19, issue 4. Journal Article. https://link.springer.com/article/10.1007/s11467-023-1378-z
Citations: 0

Abstract

Whether a complex game system composed of a large number of artificial intelligence (AI) agents empowered with reinforcement learning can produce highly favorable collective behavior through agent self-exploration alone is a question of practical importance. In this paper, we address this question by combining the minority game, a typical theoretical model of resource allocation systems, with reinforcement learning. Each individual participating in the game is given a certain degree of intelligence based on a reinforcement learning algorithm. In particular, we demonstrate that as AI agents gradually become familiar with the unknown environment and try to choose optimal actions to maximize payoff, the whole system keeps approaching the optimal state under certain parameter combinations: herding is effectively suppressed by an oscillating collective behavior, a self-organizing pattern that arises without any external interference. Interestingly, numerical results reveal a first-order phase transition in our multi-agent system with reinforcement learning. To further understand the dynamics of agent learning, we define and analyze the conversion path of belief modes, and find that self-organizing condensation of belief modes appears for given trial-and-error rates in the AI system. Finally, we provide a method based on the Kullback–Leibler divergence for detecting the emergence of the period-two oscillation collective pattern, and give the parameter position where the period-two pattern appears.
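The setup described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' implementation: the state definition (an agent's previous choice), the reward (1 for the minority side, 0 otherwise), the parameter names `alpha`, `gamma`, `epsilon`, and the even-versus-odd-step comparison used as a period-two indicator are all assumptions made for illustration. It only shows how Q-learning agents, a minority rule, and a Kullback–Leibler divergence could fit together.

```python
import math
import random


def kl_divergence(p, q, eps=1e-12):
    """D(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def histogram(values, n_bins, upper):
    """Normalized histogram of integers in [0, upper] using n_bins equal bins."""
    counts = [0] * n_bins
    for v in values:
        counts[min(v * n_bins // (upper + 1), n_bins - 1)] += 1
    return [c / len(values) for c in counts]


def simulate(n_agents=101, steps=2000, alpha=0.1, gamma=0.9,
             epsilon=0.02, seed=0):
    """Toy minority game; returns the number of agents on resource 1 per step.

    Assumes n_agents is odd so that a strict minority always exists.
    """
    rng = random.Random(seed)
    # q[i][s][a]: agent i's value for action a given previous choice s
    # (hypothetical state definition).
    q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(n_agents)]
    state = [rng.randrange(2) for _ in range(n_agents)]
    attendance = []
    for _ in range(steps):
        actions = []
        for i in range(n_agents):
            q0, q1 = q[i][state[i]]
            if rng.random() < epsilon or q0 == q1:
                actions.append(rng.randrange(2))   # explore, or break a tie
            else:
                actions.append(0 if q0 > q1 else 1)
        n1 = sum(actions)
        minority = 1 if n1 < n_agents - n1 else 0
        for i, a in enumerate(actions):
            reward = 1.0 if a == minority else 0.0
            target = reward + gamma * max(q[i][a])  # next state is the action
            q[i][state[i]][a] += alpha * (target - q[i][state[i]][a])
            state[i] = a
        attendance.append(n1)
    return attendance


def period_two_score(series, n_bins, upper):
    """KL divergence between attendance on even and on odd time steps.

    Illustrative indicator only: near zero when the two halves look alike,
    large when the series alternates between two distinct levels.
    """
    even = histogram(series[0::2], n_bins, upper)
    odd = histogram(series[1::2], n_bins, upper)
    return kl_divergence(even, odd)
```

For example, `period_two_score(simulate(), n_bins=10, upper=101)` gives one number per parameter setting, and scanning it over `epsilon` or `alpha` would be one way to look for where an alternating pattern sets in. Whether this toy model reproduces the behavior reported in the paper is not something the sketch establishes.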

Source journal
Frontiers of Physics (PHYSICS, MULTIDISCIPLINARY)
CiteScore: 9.20
Self-citation rate: 9.30%
Articles published: 898
Review time: 6-12 weeks
Journal description: Frontiers of Physics is an international peer-reviewed journal dedicated to showcasing the latest advancements and significant progress in various research areas within the field of physics. The journal's scope is broad, covering topics that include quantum computation and quantum information; atomic, molecular, and optical physics; condensed matter physics, material sciences, and interdisciplinary research; and particle, nuclear physics, astrophysics, and cosmology. The journal's mission is to highlight frontier achievements, hot topics, and cross-disciplinary points in physics, facilitating communication and idea exchange among physicists both in China and internationally. It serves as a platform for researchers to share their findings and insights, fostering collaboration and innovation across different areas of physics.