Reinforcement learning as a rehearsal for swarm foraging
Authors: Trung Nguyen, Bikramjit Banerjee
Journal: Swarm Intelligence
Published: 2021-09-29
DOI: 10.1007/s11721-021-00203-8 (https://doi.org/10.1007/s11721-021-00203-8)
Foraging in a swarm of robots has been investigated by many researchers, and the prevalent techniques have been hand-designed algorithms whose parameters are often tuned via machine learning. Our departure point is one such algorithm, in which we replace a hand-coded decision procedure with reinforcement learning (RL), resulting in significantly superior performance. We situate our approach within the reinforcement learning as a rehearsal (RLaR) framework that we recently introduced. We instantiate RLaR for the foraging problem and show experimentally that a key component of RLaR, a conditional probability distribution function, can be modeled as a unimodal distribution (with a lower memory footprint) despite evidence that it is multimodal. Our experiments also show that the learned behavior scales to some degree with variations in the swarm size or the environment.
About the journal:
Swarm Intelligence is the principal peer-reviewed publication dedicated to reporting on research
and developments in the multidisciplinary field of swarm intelligence. The journal publishes
original research articles and occasional review articles on theoretical, experimental and/or
practical aspects of swarm intelligence. All articles are published both in print and in electronic
form. There are no page charges for publication. Swarm Intelligence is published quarterly.
The field of swarm intelligence deals with systems composed of many individuals that coordinate
using decentralized control and self-organization. In particular, it focuses on the collective
behaviors that result from the local interactions of the individuals with each other and with their
environment. It is a fast-growing field that encompasses the efforts of researchers in multiple
disciplines, ranging from ethology and social science to operations research and computer
engineering.
Swarm Intelligence will report on advances in the understanding and utilization of swarm
intelligence systems, that is, systems that are based on the principles of swarm intelligence. The
following subjects are of particular interest to the journal:
• modeling and analysis of collective biological systems such as social insect colonies, flocking
vertebrates, and human crowds as well as any other swarm intelligence systems;
• application of biological swarm intelligence models to real-world problems such as distributed
computing, data clustering, graph partitioning, optimization and decision making;
• theoretical and empirical research in ant colony optimization, particle swarm optimization,
swarm robotics, and other swarm intelligence algorithms.