Genetic Multi-Armed Bandits: A Reinforcement Learning Inspired Approach for Simulation Optimization
Authors: Deniz Preil; Michael Krapp
Journal: IEEE Transactions on Evolutionary Computation, vol. 29, no. 2, pp. 360-374 (impact factor 11.7; JCR Q1, Computer Science, Artificial Intelligence)
DOI: 10.1109/TEVC.2024.3524505
Publication date: 2024-12-31
Publication type: Journal Article
Open access: no
URL: https://ieeexplore.ieee.org/document/10818791/
Citations: 0
Abstract
Many real-world problems are inherently stochastic, complicating or even precluding the use of analytical methods. These problems are often characterized by high dimensionality, large solution spaces, and numerous local optima, which make finding optimal solutions challenging. Therefore, simulation optimization is frequently employed. This article focuses on the discrete case, also known as discrete optimization via simulation. Despite their adaptations to stochastic problems, previous evolutionary algorithms face a major limitation here: they discard all information about solutions that are not part of the most recent population. This is wasteful, as each simulation observation gathered over the course of the iterations provides valuable information that should guide the selection of subsequent solutions. Inspired by reinforcement learning (RL), we propose a novel memory concept for evolutionary algorithms that ensures global convergence and significantly improves their finite-time performance. Unlike previous evolutionary algorithms, our approach permanently preserves simulation observations, so that the sample mean of a solution becomes progressively more accurate each time the solution is revisited in later iterations. Moreover, the selection of new solutions is based on the entire memory rather than just the last population. The numerical experiments demonstrate that this novel approach, which combines a genetic algorithm (GA) with such a memory, consistently outperforms popular convergent state-of-the-art benchmark algorithms on a large variety of established test problems while requiring considerably less computational effort. This marks the so-called genetic multi-armed bandit (MAB) as one of the most powerful algorithms currently available for solving stochastic problems.
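The memory idea described in the abstract can be illustrated with a small Python sketch. This is not the authors' algorithm: the abstract does not give the selection rule, the operators, or the convergence mechanism, so everything below is an assumption made for illustration. That includes the names `ObservationMemory`, `noisy_objective`, and `run`, the toy objective, the uniform crossover, the single-gene mutation, and the greedy "lowest sample mean" parent selection. It shows only the two properties the abstract states: observations are never discarded, and parents are drawn from the whole memory rather than from the last population.

```python
import random


class ObservationMemory:
    """Hypothetical helper: keeps a running sum and count per solution."""

    def __init__(self):
        self._sum = {}
        self._count = {}

    def record(self, solution, observation):
        # Observations accumulate permanently; nothing is ever dropped.
        self._sum[solution] = self._sum.get(solution, 0.0) + observation
        self._count[solution] = self._count.get(solution, 0) + 1

    def count(self, solution):
        return self._count.get(solution, 0)

    def mean(self, solution):
        # Sample mean over all observations recorded for this solution.
        return self._sum[solution] / self._count[solution]

    def solutions(self):
        return list(self._count)

    def best(self, k):
        # The k stored solutions with the lowest sample mean (minimization).
        return sorted(self._count, key=self.mean)[:k]


def noisy_objective(x, rng):
    # Toy stochastic simulation (assumption): sum of squares plus Gaussian noise.
    return sum(v * v for v in x) + rng.gauss(0.0, 1.0)


def run(generations=50, pop_size=10, dim=3, low=-5, high=5, seed=0):
    rng = random.Random(seed)
    memory = ObservationMemory()
    population = [
        tuple(rng.randint(low, high) for _ in range(dim)) for _ in range(pop_size)
    ]
    for _ in range(generations):
        # One simulation run per individual; revisited solutions refine their mean.
        for x in population:
            memory.record(x, noisy_objective(x, rng))
        # Parents come from the entire memory, not only the last population.
        parents = memory.best(pop_size)
        population = []
        for _ in range(pop_size):
            a, b = rng.choice(parents), rng.choice(parents)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(dim)
            step = rng.choice((-1, 0, 1))  # a step of 0 allows revisits
            child[i] = min(high, max(low, child[i] + step))
            population.append(tuple(child))
    return memory


if __name__ == "__main__":
    mem = run()
    top = mem.best(1)[0]
    print(top, mem.count(top), round(mem.mean(top), 3))
```

Because solutions are stored as tuples, a solution generated again in a later generation maps to the same memory entry, so its sample mean is averaged over more observations. A purely greedy selection like the one above carries no convergence guarantee; it stands in for whatever bandit-style rule the paper actually uses.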
About the journal:
The IEEE Transactions on Evolutionary Computation is published by the IEEE Computational Intelligence Society on behalf of 13 societies: Circuits and Systems; Computer; Control Systems; Engineering in Medicine and Biology; Industrial Electronics; Industry Applications; Lasers and Electro-Optics; Oceanic Engineering; Power Engineering; Robotics and Automation; Signal Processing; Social Implications of Technology; and Systems, Man, and Cybernetics. The journal publishes original papers in evolutionary computation and related areas such as nature-inspired algorithms, population-based methods, optimization, and hybrid systems. It welcomes both purely theoretical papers and application papers that provide general insights into these areas of computation.