Genetic Multi-Armed Bandits: A Reinforcement Learning Inspired Approach for Simulation Optimization

IF 11.7 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | IEEE Transactions on Evolutionary Computation | Pub Date: 2024-12-31 | DOI: 10.1109/TEVC.2024.3524505

Deniz Preil; Michael Krapp

Volume 29, issue 2, pages 360-374. Journal article, not open access. Full text: https://ieeexplore.ieee.org/document/10818791/

Citations: 0

Abstract

Many real-world problems are inherently stochastic, complicating or even precluding the use of analytical methods. These problems are often characterized by high dimensionality, large solution spaces, and numerous local optima, which make finding optimal solutions challenging. Therefore, simulation optimization is frequently employed. This article specifically focuses on the discrete case, also known as discrete optimization via simulation. Despite their adaptations for stochastic problems, previous evolutionary algorithms face a major limitation on these problems: they discard all information about solutions that are not part of the most recent population. This is inefficient, as each simulation observation gathered over the course of the iterations provides valuable information that should guide the selection of subsequent solutions. Inspired by the domain of reinforcement learning (RL), we propose a novel memory concept for evolutionary algorithms that ensures global convergence and significantly improves their finite-time performance. Unlike previous evolutionary algorithms, our approach permanently preserves simulation observations to progressively improve the accuracy of sample means when revisiting solutions in later iterations. Moreover, the selection of new solutions is based on the entire memory rather than just the last population. The numerical experiments demonstrate that this novel approach, which combines a genetic algorithm (GA) with such memory, consistently outperforms popular convergent state-of-the-art benchmark algorithms on a wide variety of established test problems while requiring considerably less computational effort. This marks the so-called genetic multi-armed bandit (MAB) as one of the most powerful algorithms currently available for solving stochastic problems.
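The memory concept described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the class `ObservationMemory`, the functions `noisy_cost` and `run`, the bit-string test problem, and all parameter values are hypothetical and chosen only for illustration. The sketch shows the two ideas the abstract names: every simulation observation is kept permanently, so sample means become more accurate whenever a solution is revisited, and parents are selected from the entire memory rather than from the last population only.

```python
import random


class ObservationMemory:
    """Keeps every simulation observation for every solution ever visited."""

    def __init__(self):
        self._sum = {}    # solution -> sum of observed costs
        self._count = {}  # solution -> number of observations

    def record(self, solution, observation):
        self._sum[solution] = self._sum.get(solution, 0.0) + observation
        self._count[solution] = self._count.get(solution, 0) + 1

    def count(self, solution):
        return self._count.get(solution, 0)

    def mean(self, solution):
        # Sample mean over all observations gathered so far.
        return self._sum[solution] / self._count[solution]

    def best(self, k):
        # The k solutions with the lowest sample mean in the whole memory.
        return sorted(self._sum, key=self.mean)[:k]


def noisy_cost(solution, rng):
    # Hypothetical simulation: true cost is the number of 1-bits, plus noise.
    return sum(solution) + rng.gauss(0.0, 1.0)


def run(n_bits=8, pop_size=10, iterations=200, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    memory = ObservationMemory()
    population = [tuple(rng.randint(0, 1) for _ in range(n_bits))
                  for _ in range(pop_size)]
    for _ in range(iterations):
        # One new observation per population member; nothing is discarded.
        for solution in population:
            memory.record(solution, noisy_cost(solution, rng))
        # Parents come from the entire memory, not just the last population.
        parents = memory.best(pop_size)
        children = []
        for _ in range(pop_size - 1):
            a, b = rng.choice(parents), rng.choice(parents)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        # Revisit the current best so its sample mean keeps improving.
        population = [parents[0]] + children
    return memory
```

Calling `memory = run()` and then inspecting `memory.best(1)[0]` together with `memory.mean(...)` and `memory.count(...)` shows how often the leading solution was revisited. Selecting purely by sample mean is a simplification made here; the paper's actual selection rule and its convergence guarantee are not reproduced by this sketch.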

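Source journal

IEEE Transactions on Evolutionary Computation (Engineering & Technology - Computer Science: Theory & Methods)

CiteScore: 21.90

Self-citation rate: 9.80%

Articles published: 196

Review time: 3.6 months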

About the journal: The IEEE Transactions on Evolutionary Computation is published by the IEEE Computational Intelligence Society on behalf of 13 societies: Circuits and Systems; Computer; Control Systems; Engineering in Medicine and Biology; Industrial Electronics; Industry Applications; Lasers and Electro-Optics; Oceanic Engineering; Power Engineering; Robotics and Automation; Signal Processing; Social Implications of Technology; and Systems, Man, and Cybernetics. The journal publishes original papers in evolutionary computation and related areas such as nature-inspired algorithms, population-based methods, optimization, and hybrid systems. It welcomes both purely theoretical papers and application papers that provide general insights into these areas of computation.