{"title":"Risk-Aware Linear Bandits with Application in Smart Order Routing","authors":"Jingwei Ji, Renyuan Xu, Ruihao Zhu","doi":"10.1145/3533271.3561692","DOIUrl":null,"url":null,"abstract":"Motivated by practical considerations in machine learning for financial decision-making, such as risk-aversion and large action space, we initiate the study of risk-aware linear bandits. Specifically, we consider regret minimization under the mean-variance measure when facing a set of actions whose reward can be expressed as linear functions of (initially) unknown parameters. We first propose the Risk-Aware Explore-then-Commit (RISE) algorithm driven by the variance-minimizing G-optimal design. Then, we rigorously analyze its regret upper bound to show that, by leveraging the linear structure, the algorithm can dramatically reduce the regret when compared to existing methods. Finally, we demonstrate the performance of the RISE algorithm by conducting extensive numerical experiments in a synthetic smart order routing setup. Our results show that the RISE algorithm can outperform the competing methods, especially when the decision-making scenario becomes more complex.","PeriodicalId":134888,"journal":{"name":"Proceedings of the Third ACM International Conference on AI in Finance","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Third ACM International Conference on AI in Finance","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3533271.3561692","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Motivated by practical considerations in machine learning for financial decision-making, such as risk aversion and large action spaces, we initiate the study of risk-aware linear bandits. Specifically, we consider regret minimization under the mean-variance measure when facing a set of actions whose rewards can be expressed as linear functions of (initially) unknown parameters. We first propose the Risk-Aware Explore-then-Commit (RISE) algorithm, driven by the variance-minimizing G-optimal design. We then rigorously analyze its regret upper bound and show that, by leveraging the linear structure, the algorithm dramatically reduces regret compared to existing methods. Finally, we demonstrate the performance of the RISE algorithm through extensive numerical experiments in a synthetic smart order routing setup. Our results show that RISE outperforms competing methods, especially as the decision-making scenario becomes more complex.
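
To make the explore-then-commit structure concrete, the sketch below implements a generic explore-then-commit routine for a finite-armed linear bandit, with exploration allocated by a Frank-Wolfe approximation of the G-optimal design and a commit step that ranks arms by an empirical mean-variance score. This is an illustrative sketch only, not the paper's RISE algorithm: the exploration budget, the assumption that the reward variance is linear in the same features, the score mean - rho * variance, and all function names are assumptions made for this example.

```python
# Illustrative sketch only -- not the paper's RISE algorithm. Assumptions: a finite
# action set of feature vectors, rewards linear in an unknown parameter with
# arm-dependent noise whose variance is (for illustration) also linear in the same
# features, and a commit rule that maximizes mean - rho * variance.
import numpy as np


def g_optimal_design(actions, tol=1e-3, max_iter=1000):
    """Frank-Wolfe iterations toward an approximate G-optimal design.

    actions: (K, d) array of feature vectors. Returns a probability vector over
    the K actions that approximately minimizes max_a a^T A(pi)^{-1} a, i.e. the
    worst-case variance of the least-squares estimate of an arm's mean reward.
    """
    K, d = actions.shape
    pi = np.full(K, 1.0 / K)                       # start from the uniform design
    for _ in range(max_iter):
        A = actions.T @ (pi[:, None] * actions)    # A(pi) = sum_a pi_a a a^T
        A_inv = np.linalg.pinv(A)
        g = np.einsum("ij,jk,ik->i", actions, A_inv, actions)  # a^T A(pi)^{-1} a
        j = int(np.argmax(g))
        if g[j] <= d * (1.0 + tol):                # Kiefer-Wolfowitz: optimum equals d
            break
        gamma = (g[j] / d - 1.0) / (g[j] - 1.0)    # standard Frank-Wolfe step size
        pi = (1.0 - gamma) * pi
        pi[j] += gamma
    return pi


def explore_then_commit(actions, pull, n_explore, rho=1.0):
    """Explore according to the design, estimate the model, then commit.

    pull(a_idx) is a user-supplied callable returning one noisy reward for arm
    a_idx; rho is a (hypothetical) risk-aversion weight in mean - rho * variance.
    """
    pi = g_optimal_design(actions)
    counts = np.ceil(pi * n_explore).astype(int)   # naive rounding of the design

    X, y = [], []
    for a_idx, c in enumerate(counts):
        for _ in range(c):
            X.append(actions[a_idx])
            y.append(pull(a_idx))
    X, y = np.asarray(X), np.asarray(y)

    # Least-squares estimate of the mean-reward parameter.
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Crude variance model (an assumption of this sketch): regress squared
    # residuals on the same features to get an arm-dependent variance estimate.
    resid2 = (y - X @ theta_hat) ** 2
    w_hat, *_ = np.linalg.lstsq(X, resid2, rcond=None)
    var_hat = np.clip(actions @ w_hat, 1e-12, None)

    scores = actions @ theta_hat - rho * var_hat   # empirical mean-variance score
    return int(np.argmax(scores)), theta_hat
```

The G-optimal design is the natural exploration distribution in this kind of scheme because, by the Kiefer-Wolfowitz theorem, it caps the worst-case variance of any arm's estimated mean reward at roughly sigma^2 * d / n after n exploration pulls, which is what allows a linear explore-then-commit algorithm to handle a large action set far more efficiently than treating each arm independently.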