Learning Not to Spoof

Proceedings of the Third ACM International Conference on AI in Finance Pub Date : 2022-10-26 DOI:10.1145/3533271.3561767

David Byrd

{"title":"Learning Not to Spoof","authors":"David Byrd","doi":"10.1145/3533271.3561767","DOIUrl":null,"url":null,"abstract":"As intelligent trading agents based on reinforcement learning (RL) gain prevalence, it becomes more important to ensure that RL agents obey laws, regulations, and human behavioral expectations. There is substantial literature concerning the aversion of obvious catastrophes like crashing a helicopter or bankrupting a trading account, but little around the avoidance of subtle non-normative behavior for which there are examples, but no programmable definition. Such behavior may violate legal or regulatory, rather than physical or monetary, constraints. In this article, I consider a series of experiments in which an intelligent stock trading agent maximizes profit but may also inadvertently learn to spoof the market in which it participates. I first inject a hand-coded spoofing agent to a multi-agent market simulation and learn to recognize spoofing activity sequences. Then I replace the hand-coded spoofing trader with a simple profit-maximizing RL agent and observe that it independently discovers spoofing as the optimal strategy. Finally, I introduce a method to incorporate the recognizer as normative guide, shaping the agent’s perceived rewards and altering its selected actions. The agent remains profitable while avoiding spoofing behaviors that would result in even higher profit. After presenting the empirical results, I conclude with some recommendations. The method should generalize to the reduction of any unwanted behavior for which a recognizer can be learned.","PeriodicalId":134888,"journal":{"name":"Proceedings of the Third ACM International Conference on AI in Finance","volume":"104 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Third ACM International Conference on AI in Finance","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3533271.3561767","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

Abstract

As intelligent trading agents based on reinforcement learning (RL) gain prevalence, it becomes more important to ensure that RL agents obey laws, regulations, and human behavioral expectations. There is substantial literature concerning the aversion of obvious catastrophes like crashing a helicopter or bankrupting a trading account, but little around the avoidance of subtle non-normative behavior for which there are examples, but no programmable definition. Such behavior may violate legal or regulatory, rather than physical or monetary, constraints. In this article, I consider a series of experiments in which an intelligent stock trading agent maximizes profit but may also inadvertently learn to spoof the market in which it participates. I first inject a hand-coded spoofing agent to a multi-agent market simulation and learn to recognize spoofing activity sequences. Then I replace the hand-coded spoofing trader with a simple profit-maximizing RL agent and observe that it independently discovers spoofing as the optimal strategy. Finally, I introduce a method to incorporate the recognizer as normative guide, shaping the agent’s perceived rewards and altering its selected actions. The agent remains profitable while avoiding spoofing behaviors that would result in even higher profit. After presenting the empirical results, I conclude with some recommendations. The method should generalize to the reduction of any unwanted behavior for which a recognizer can be learned.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

学会不欺骗

随着基于强化学习(RL)的智能交易代理的普及，确保RL代理遵守法律、法规和人类行为期望变得更加重要。关于对明显的灾难(如直升机坠毁或交易账户破产)的厌恶，有大量的文献，但很少有关于避免微妙的非规范行为的文献，这些行为有例子，但没有可编程的定义。这种行为可能违反法律或法规，而不是物理或货币限制。在这篇文章中，我考虑了一系列的实验，在这些实验中，一个聪明的股票交易代理将利润最大化，但也可能无意中学会了欺骗它所参与的市场。我首先将一个手工编码的欺骗代理注入到一个多代理市场模拟中，并学习识别欺骗活动序列。然后，我用一个简单的利润最大化RL代理替换了手工编码的欺骗交易者，并观察到它独立地发现欺骗是最优策略。最后，我介绍了一种方法，将识别器作为规范指南，塑造代理的感知奖励并改变其选择的行为。代理保持盈利，同时避免欺骗行为，这将导致更高的利润。在给出实证结果之后，我最后提出了一些建议。该方法应该推广到减少任何不需要的行为，识别器可以从中学习。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Proceedings of the Third ACM International Conference on AI in Finance

自引率

0.00%

发文量

期刊最新文献

Core Matrix Regression and Prediction with Regularization Risk-Aware Linear Bandits with Application in Smart Order Routing Addressing Extreme Market Responses Using Secure Aggregation Addressing Non-Stationarity in FX Trading with Online Model Selection of Offline RL Experts Objective Driven Portfolio Construction Using Reinforcement Learning