Optimizing sequential decision-making under risk: Strategic allocation with switching penalties
Author: Milad Malekipirbazari
Journal: European Journal of Operational Research, Volume 321, Issue 1, Pages 160-176
Published: 2024-09-20
DOI: 10.1016/j.ejor.2024.09.023
URL: https://www.sciencedirect.com/science/article/pii/S0377221724007264
Impact factor: 6.0; JCR: Q1 (Operations Research & Management Science); Open access: no
Citations: 0
Abstract
This paper considers the multiarmed bandit (MAB) problem augmented with a critical real-world consideration: the cost implications of switching decisions. Our work distinguishes itself by addressing the largely unexplored domain of risk-averse MAB problems compounded by switching penalties. Such scenarios are not just theoretical constructs but are reflective of numerous practical applications. Our contribution is threefold: firstly, we explore how switching costs and risk aversion influence decision-making in MAB problems. Secondly, we present novel theoretical results, including the development of the Risk-Averse Switching Index (RASI), which addresses the dual challenges of risk aversion and switching costs, demonstrating its near-optimal efficacy. This heuristic solution method is grounded in dynamic coherent risk measures, enabling a time-consistent evaluation of risk and reward. Lastly, through rigorous numerical experiments, we validate our algorithm’s effectiveness and practical applicability, providing decision-makers with valuable insights and tools for navigating the multifaceted landscape of risk-averse environments with inherent switching costs.
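The interaction between risk aversion and switching costs can be illustrated with a small sketch. This is not the paper's RASI, whose definition the abstract does not give. It is a one-step (myopic) rule under stated assumptions: each arm has a known discrete reward distribution, risk is measured by Conditional Value-at-Risk (CVaR, one standard coherent risk measure, chosen here only as an example), and a fixed penalty is charged for leaving the current arm. The names `cvar` and `choose_arm` are hypothetical.

```python
from typing import Sequence, Tuple

Arm = Tuple[Sequence[float], Sequence[float]]  # (rewards, probabilities)


def cvar(rewards: Sequence[float], probs: Sequence[float], alpha: float) -> float:
    """Mean of the worst alpha-fraction of outcomes of a discrete reward.

    alpha = 1.0 gives the plain expectation (risk-neutral);
    smaller alpha weights only the lower tail (more risk-averse).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    remaining = alpha  # probability mass still to be collected from the lower tail
    total = 0.0
    for reward, prob in sorted(zip(rewards, probs)):  # worst outcomes first
        take = min(prob, remaining)
        total += take * reward
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


def choose_arm(arms: Sequence[Arm], current: int, switch_cost: float, alpha: float) -> int:
    """Pick the arm with the best CVaR after charging a penalty for switching.

    Illustrative myopic rule only: it looks one step ahead and ignores
    learning and future switching, unlike an index policy.
    """
    def score(i: int) -> float:
        rewards, probs = arms[i]
        penalty = 0.0 if i == current else switch_cost
        return cvar(rewards, probs, alpha) - penalty

    return max(range(len(arms)), key=score)


if __name__ == "__main__":
    # Arm 0 pays 1 for sure; arm 1 pays 0 or 3 with equal probability (mean 1.5).
    arms = [([1.0, 1.0], [0.5, 0.5]), ([0.0, 3.0], [0.5, 0.5])]
    print(choose_arm(arms, current=0, switch_cost=0.2, alpha=1.0))
    print(choose_arm(arms, current=0, switch_cost=0.6, alpha=1.0))
    print(choose_arm(arms, current=0, switch_cost=0.0, alpha=0.5))
```

In this toy setting a risk-neutral decision-maker leaves the safe arm only when the switching penalty is smaller than the gain in expected reward, while a sufficiently risk-averse one stays even when switching is free, because the lower tail of the risky arm is worse than the sure payoff.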
About the journal:
The European Journal of Operational Research (EJOR) publishes high quality, original papers that contribute to the methodology of operational research (OR) and to the practice of decision making.