Gaurav Oberoi, P. Poduval, Karamjit Singh, Sangam Verma, Pranay Gupta
Similarity search is an important problem for the payment industry, which holds rich user-merchant interaction data. It identifies merchants similar to a given merchant and supports tasks such as peer-set generation, recommendation, community detection, and anomaly detection. Recent work has shown that, by leveraging interaction data, Graph Neural Networks (GNNs) can generate node embeddings for entities such as merchants, which can then be used for similarity-search tasks. However, most real-world financial data come with high-cardinality categorical features such as city, industry, and super-industry, which are fed to the GNNs in one-hot encoded form. Current GNN algorithms are not designed for such sparse features, which makes it difficult for them to learn embeddings that preserve this information. In this work, we propose CaPE, a Category Preserving Embedding generation method that preserves high-cardinality feature information in the embeddings. CaPE is also designed to preserve important numerical feature information. We compare CaPE with the latest GNN embedding generation methods to demonstrate its superiority in peer-set generation on real-world datasets, both external and internal (synthetically generated). We also evaluate our method on a downstream task, link prediction.
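The core observation is that a one-hot encoding of a high-cardinality feature is almost entirely zeros, while a learned lookup table yields a dense, low-dimensional representation. A minimal sketch of that contrast (illustrative only, not CaPE itself; the cardinality, dimension, and random table are hypothetical stand-ins for what would be trained jointly with the GNN):

```python
import numpy as np

rng = np.random.default_rng(0)

n_categories = 10_000  # cardinality of a feature such as "city" (hypothetical)
dim = 16               # dense embedding size (hypothetical)

def one_hot(idx, n=n_categories):
    # One-hot encoding: a single non-zero entry out of n_categories.
    v = np.zeros(n)
    v[idx] = 1.0
    return v

# Dense lookup table; in a CaPE-like setup this would be learned jointly
# with the GNN so that category information survives in the embedding.
table = rng.normal(scale=0.1, size=(n_categories, dim))

def embed(idx):
    return table[idx]

sparse = one_hot(42)
dense = embed(42)
sparsity = 1.0 - np.count_nonzero(sparse) / sparse.size
```

The one-hot vector is over 99% zeros, which is exactly the sparsity that standard GNN message passing struggles to preserve.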
"CaPE: Category Preserving Embeddings for Similarity-Search in Financial Graphs" — Gaurav Oberoi, P. Poduval, Karamjit Singh, Sangam Verma, Pranay Gupta. Proceedings of the Third ACM International Conference on AI in Finance, 2022. DOI: 10.1145/3533271.3561788
Successful predictive models for financial applications often require harnessing complementary information from multiple datasets. Incorporating data from different sources into a single model can be challenging, as they vary in structure, dimensions, quality, and completeness. Simply merging those datasets can cause redundancy, discrepancy, and information loss. This paper proposes a convolutional neural network-based nonlinear tensor coupling and completion framework (NLTCC) to combine heterogeneous datasets without compromising data quality. We demonstrate the effectiveness of NLTCC in solving a specific business problem: predicting firms’ earnings from financial analysts’ earnings forecasts. First, we apply NLTCC to fuse firm characteristics and stock market information into the financial analysts’ earnings forecast data to impute missing values and improve data quality. Subsequently, we predict the next quarter’s earnings based on the imputed data. The experiments reveal that the prediction error decreases by 65% compared with the benchmark analysts’ consensus forecast. The long-short portfolio returns based on NLTCC outperform the analysts’ consensus forecast and the S&P 500 index for holding periods from three days up to two months. The prediction accuracy improvement is robust across different performance metrics and various industry sectors. Notably, it is more salient for sectors with higher heterogeneity.
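NLTCC itself is a nonlinear, CNN-based tensor method; as a much simpler linear analogue, the underlying idea of imputing missing entries from low-rank structure can be sketched with alternating least squares over an observed-entry mask (all sizes and data here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic low-rank "firm x quarter" matrix with ~40% missing entries.
n_firms, n_quarters, rank = 30, 12, 3
U_true = rng.normal(size=(n_firms, rank))
V_true = rng.normal(size=(n_quarters, rank))
X = U_true @ V_true.T
mask = rng.random(X.shape) < 0.6  # True where the entry is observed

# Alternating least squares fit to the observed entries only.
lam = 1e-3  # small ridge term keeps the normal equations well-conditioned
U = rng.normal(size=(n_firms, rank))
V = rng.normal(size=(n_quarters, rank))
for _ in range(30):
    for i in range(n_firms):
        obs = mask[i]
        A = V[obs].T @ V[obs] + lam * np.eye(rank)
        U[i] = np.linalg.solve(A, V[obs].T @ X[i, obs])
    for j in range(n_quarters):
        obs = mask[:, j]
        A = U[obs].T @ U[obs] + lam * np.eye(rank)
        V[j] = np.linalg.solve(A, U[obs].T @ X[obs, j])

X_hat = U @ V.T
# Relative error on the entries that were never observed.
err = np.linalg.norm((X_hat - X)[~mask]) / np.linalg.norm(X[~mask])
```

The completed matrix recovers the held-out entries accurately because the data are exactly low rank; NLTCC extends this to coupled tensors with a nonlinear decoder.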
"Machine Learning for Earnings Prediction: A Nonlinear Tensor Approach for Data Integration and Completion" — Ajim Uddin, Xinyuan Tao, Chia-Ching Chou, Dantong Yu. Proceedings of the Third ACM International Conference on AI in Finance, 2022. DOI: 10.1145/3533271.3561677
As intelligent trading agents based on reinforcement learning (RL) gain prevalence, it becomes more important to ensure that RL agents obey laws, regulations, and human behavioral expectations. There is substantial literature concerning the aversion of obvious catastrophes like crashing a helicopter or bankrupting a trading account, but little around the avoidance of subtle non-normative behavior for which there are examples, but no programmable definition. Such behavior may violate legal or regulatory, rather than physical or monetary, constraints. In this article, I consider a series of experiments in which an intelligent stock trading agent maximizes profit but may also inadvertently learn to spoof the market in which it participates. I first inject a hand-coded spoofing agent into a multi-agent market simulation and learn to recognize spoofing activity sequences. Then I replace the hand-coded spoofing trader with a simple profit-maximizing RL agent and observe that it independently discovers spoofing as the optimal strategy. Finally, I introduce a method to incorporate the recognizer as a normative guide, shaping the agent’s perceived rewards and altering its selected actions. The agent remains profitable while avoiding spoofing behaviors that would result in even higher profit. After presenting the empirical results, I conclude with some recommendations. The method should generalize to the reduction of any unwanted behavior for which a recognizer can be learned.
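The reward-shaping step can be sketched in a few lines: the recognizer's spoofing confidence is subtracted, scaled by a penalty, from the trading profit, so the shaped agent prefers the best non-spoofing action. The candidate actions, profits, and probabilities below are hypothetical:

```python
# Candidate actions: (label, expected profit, recognizer spoof probability).
# All labels and numbers here are hypothetical.
candidates = [
    ("passive_quote", 1.0, 0.01),
    ("aggressive_take", 1.5, 0.05),
    ("layered_cancel", 3.0, 0.90),  # most profitable, but flagged as spoofing
]

def shaped_reward(profit, spoof_prob, penalty=10.0):
    # Subtract a penalty proportional to the recognizer's confidence that
    # the recent action sequence looks like spoofing.
    return profit - penalty * spoof_prob

# The shaped agent forgoes the highest raw profit to avoid spoofing.
best = max(candidates, key=lambda c: shaped_reward(c[1], c[2]))
```

Under shaping, the spoofing-like action scores negatively even though it has the highest raw profit, which mirrors the paper's observation that the agent stays profitable while giving up the illicit surplus.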
"Learning Not to Spoof" — David Byrd. Proceedings of the Third ACM International Conference on AI in Finance, 2022. DOI: 10.1145/3533271.3561767
Increasing access to financial services data helps accelerate the monitoring and management of datasets and facilitates better business decision-making. However, financial services datasets are typically vast, running into terabytes of data and containing both structured and unstructured data. It is a laborious task to comb through all the data and map them reasonably. Mapping the data is important for performing comprehensive analysis and making informed business decisions. Based on client engagements, we have observed that there is a lack of industry standards for definitions of key terms and a lack of governance for maintaining business processes. This typically leads to disconnected, siloed datasets generated from disintegrated systems. To address these challenges, we developed DaME (Data Mapping Engine), a novel methodology that performs data mapping by training a data mapping engine and utilizing human-in-the-loop techniques. The results from the industrial application and evaluation of DaME on a financial services dataset are encouraging: it can help reduce manual effort by automating data mapping and reusing what it learns. On our dataset, DaME achieves an accuracy of 69%, much higher than the 34% of the existing state of the art. It has also helped improve the productivity of industry practitioners, saving them 14,000 hours of time otherwise spent manually mapping vast data stores over a period of ten months.
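As a minimal sketch of the propose-then-review loop (not DaME's trained engine; the schema fields and threshold are hypothetical), a string-similarity matcher can route low-confidence mappings to a human analyst:

```python
import difflib

# Hypothetical canonical fields in the target schema.
canonical = ["customer_id", "account_balance", "transaction_date"]

def propose_mapping(raw_field, threshold=0.6):
    """Return (best_match, score, needs_review). Low-confidence proposals
    are routed to a human analyst -- the human-in-the-loop step."""
    scores = [(c, difflib.SequenceMatcher(None, raw_field.lower(), c).ratio())
              for c in canonical]
    best, score = max(scores, key=lambda s: s[1])
    return best, score, score < threshold

match, score, needs_review = propose_mapping("cust_id")
```

A close variant like `cust_id` maps automatically, while an unrecognized field falls below the threshold and is queued for review; in DaME the confirmed mappings would then be fed back to improve the engine.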
"Mapping of Financial Services datasets using Human-in-the-Loop" — Shubhi Asthana, R. Mahindru. Proceedings of the Third ACM International Conference on AI in Finance, 2022. DOI: 10.1145/3533271.3561705
In a semi-realistic market simulator, independent reinforcement learning algorithms may enable market makers to maintain wide spreads even without communication. This unexpected outcome challenges the current antitrust law framework. We study the effectiveness of maker-taker fee models in preventing cooperation via algorithms. After modeling market making as a repeated general-sum game, we experimentally show that the relation between net transaction costs and maker rebates is not necessarily monotone. Besides an upper bound on taker fees, we may also need a lower bound on maker rebates to destabilize the cooperation. We also consider the taker-maker model and the effects of mid-price volatility, inventory risk, and the number of agents.
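Under the maker-taker model, the taker pays a fee, the maker receives a rebate, and the exchange keeps the difference as the net transaction cost. A minimal sketch of this accounting (the per-share fee values are hypothetical):

```python
def net_transaction_cost(taker_fee, maker_rebate):
    # The exchange collects the taker fee and pays out the maker rebate;
    # the difference is the net cost the fee model imposes on trading.
    return taker_fee - maker_rebate

def maker_profit_per_share(half_spread, maker_rebate):
    # A filled passive quote earns the half-spread plus the rebate, so a
    # larger rebate raises the payoff from maintaining wide quotes -- the
    # tension behind the paper's proposed lower bound on rebates.
    return half_spread + maker_rebate

# Hypothetical per-share values (in dollars).
cost = net_transaction_cost(0.003, 0.002)
pnl = maker_profit_per_share(0.005, 0.002)
```

The paper's finding is that the net cost need not move monotonically with the rebate once the makers' learned strategies respond to it.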
"Can maker-taker fees prevent algorithmic cooperation in market making?" — Bingyan Han. Proceedings of the Third ACM International Conference on AI in Finance, 2022. DOI: 10.1145/3533271.3561685
Anti-money laundering (AML) regulations mandate financial institutions to deploy AML systems based on a set of rules that, when triggered, form the basis of a suspicious alert to be assessed by human analysts. Reviewing these cases is a cumbersome and complex task that requires analysts to navigate a large network of financial interactions to validate suspicious movements. Furthermore, these systems have very high false positive rates (estimated to be over 95%). The scarcity of labels hinders the use of alternative systems based on supervised learning, reducing their applicability in real-world applications. In this work we present LaundroGraph, a novel self-supervised graph representation learning approach to encode banking customers and financial transactions into meaningful representations. These representations are used to provide insights to assist the AML reviewing process, such as identifying anomalous movements for a given customer. LaundroGraph represents the underlying network of financial interactions as a customer-transaction bipartite graph and trains a graph neural network on a fully self-supervised link prediction task. We empirically demonstrate that our approach outperforms other strong baselines on self-supervised link prediction using a real-world dataset, improving the best non-graph baseline by 12 p.p. of AUC. The goal is to increase the efficiency of the reviewing process by supplying these AI-powered insights to the analysts upon review. To the best of our knowledge, this is the first fully self-supervised system within the context of AML detection.
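The self-supervised objective can be sketched as link prediction on a customer-transaction bipartite graph: real edges should score higher than randomly corrupted negative pairs under a dot-product decoder. The sketch below uses random embeddings as stand-ins for the trained GNN's output, and all sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(7)

# Bipartite graph: customers on one side, transactions on the other.
n_customers, n_transactions, dim = 5, 20, 8
edges = [(int(rng.integers(n_customers)), t) for t in range(n_transactions)]

# Random stand-ins for the GNN-produced node embeddings.
cust_emb = rng.normal(size=(n_customers, dim))
txn_emb = rng.normal(size=(n_transactions, dim))

def link_score(c, t):
    # Dot-product decoder over customer and transaction embeddings.
    return cust_emb[c] @ txn_emb[t]

def bce_link_loss(pos_pairs, neg_pairs):
    # Self-supervised objective: real edges should outscore negative pairs.
    eps = 1e-9
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos = -np.mean([np.log(sig(link_score(c, t)) + eps) for c, t in pos_pairs])
    neg = -np.mean([np.log(1.0 - sig(link_score(c, t)) + eps) for c, t in neg_pairs])
    return pos + neg

# Negative pairs: corrupt each edge by shifting the customer index.
neg_edges = [((c + 1) % n_customers, t) for c, t in edges]
loss = bce_link_loss(edges, neg_edges)
```

Training the encoder to drive this loss down is what gives the embeddings their structure; anomalous customer-transaction pairs then stand out as low-scoring links.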
"LaundroGraph: Self-Supervised Graph Representation Learning for Anti-Money Laundering" — Mário Cardoso, Pedro Saleiro, P. Bizarro. Proceedings of the Third ACM International Conference on AI in Finance, 2022. DOI: 10.1145/3533271.3561727
Andrea Coletta, Aymeric Moulin, Svitlana Vyetrenko, T. Balch
Multi-agent market simulators usually require careful calibration to emulate real markets, including the number and types of agents. Poorly calibrated simulators can lead to misleading conclusions, potentially causing severe losses when employed by investment banks, hedge funds, and traders to study and evaluate trading strategies. In this paper, we propose a world model simulator that accurately emulates a limit order book market; it requires no agent calibration but rather learns the simulated market behavior directly from historical data. Traditional approaches fall short in learning and calibrating the trader population, as historical labeled data with details on each individual trader's strategy are not publicly available. Our approach instead learns a single "world" agent from historical data, intended to emulate the overall trader population without making assumptions about individual market agent strategies. We implement our world agent simulator models as a Conditional Generative Adversarial Network (CGAN), as well as a mixture of parametric distributions, and we compare our models against previous work. Qualitatively and quantitatively, we show that the proposed approaches consistently outperform previous work, providing more realism and responsiveness.
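The mixture-of-parametric-distributions variant of a world agent can be sketched as regime-weighted sampling of aggregate order flow; the regimes, weights, and scales below are hypothetical, not calibrated to any data:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical order-flow regimes: calm / active / bursty.
weights = np.array([0.70, 0.25, 0.05])  # regime probabilities
sigma = np.array([0.5, 2.0, 8.0])       # regime scales (hypothetical units)

def sample_order_flow(n):
    # Draw a regime per event, then a Gaussian shock at that regime's scale.
    regime = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(0.0, sigma[regime])

flow = sample_order_flow(10_000)
# The mixture is symmetric but heavy-tailed relative to a single Gaussian,
# a stylized fact of real order flow that a single distribution misses.
```

In the paper, the mixture components (or the CGAN generator) would be fit to historical order flow conditioned on the current order book state, rather than fixed by hand.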
"Learning to simulate realistic limit order book markets from data as a World Agent" — Andrea Coletta, Aymeric Moulin, Svitlana Vyetrenko, T. Balch. Proceedings of the Third ACM International Conference on AI in Finance, 2022. DOI: 10.1145/3533271.3561753
Neural style transfer is a powerful computer vision technique that can incorporate the artistic “style” of one image into the “content” of another. The underlying theory relies on the assumption that the style of an image is represented by the Gram matrix of its features, which are typically extracted from pre-trained convolutional neural networks (e.g., VGG-19). This idea does not straightforwardly extend to time series stylization, since notions of style for two-dimensional images are not analogous to notions of style for one-dimensional time series. In this work, a novel formulation of time series style transfer is proposed for the purpose of synthetic data generation and enhancement. We introduce the concept of stylized features for time series, which are directly related to time series realism properties, and propose a novel stylization algorithm, called StyleTime, that uses explicit feature extraction techniques to combine the underlying content (trend) of one time series with the style (distributional properties) of another. Further, we discuss evaluation metrics and compare our work to existing state-of-the-art time series generation and augmentation schemes. To validate the effectiveness of our methods, we use stylized synthetic data as a means for data augmentation to improve the performance of recurrent neural network models on several forecasting tasks.
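A crude version of the content/style split can be sketched by taking the trend of one series via a moving average and borrowing the residual scale of another; StyleTime's actual stylized features are richer (e.g., volatility and autocorrelation properties), so this is an illustration of the decomposition only:

```python
import numpy as np

rng = np.random.default_rng(5)

def moving_average(x, w=20):
    return np.convolve(x, np.ones(w) / w, mode="same")

def stylize(content, style, w=20):
    # Keep the trend of `content`; borrow the residual scale of `style`.
    trend = moving_average(content, w)
    resid_scale = np.std(style - moving_average(style, w))
    return trend + rng.normal(scale=resid_scale, size=content.shape)

t = np.linspace(0.0, 1.0, 500)
content = np.sin(2.0 * np.pi * t)        # smooth trend to preserve
style = rng.normal(scale=0.3, size=500)  # noisy series supplying the "style"
synthetic = stylize(content, style)
```

The synthetic series follows the sine trend of the content while fluctuating at roughly the style series' residual scale, which is the shape of output the paper's data-augmentation experiments rely on.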
"StyleTime: Style Transfer for Synthetic Time Series Generation" — Yousef El-Laham, Svitlana Vyetrenko. Proceedings of the Third ACM International Conference on AI in Finance, 2022. DOI: 10.1145/3533271.3561772
Kshama Dwarakanath, Danial Dervovic, P. Tavallali, Svitlana Vyetrenko, T. Balch
We propose a novel group of Gaussian Process based algorithms for fast approximate optimal stopping of time series with specific applications to financial markets. We show that structural properties commonly exhibited by financial time series (e.g., the tendency to mean-revert) allow the use of Gaussian and Deep Gaussian Process models that further enable us to analytically evaluate optimal stopping value functions and policies. We additionally quantify uncertainty in the value function by propagating the price model through the optimal stopping analysis. We compare and contrast our proposed methods against a sampling-based method, as well as a deep learning based benchmark that is currently considered the state-of-the-art in the literature. We show that our family of algorithms outperforms benchmarks on three historical time series datasets that include intra-day and end-of-day equity asset prices as well as the daily US treasury yield curve rates.
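The backward-induction structure of optimal stopping can be sketched on simulated mean-reverting (Ornstein-Uhlenbeck) paths, with the continuation value fit by a simple polynomial regression in the style of Longstaff-Schwartz; the paper's contribution is to replace this regression with (Deep) Gaussian Processes and to propagate uncertainty, which this sketch does not attempt. All parameters are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(11)

# Mean-reverting (Ornstein-Uhlenbeck) price paths; hypothetical parameters.
n_paths, n_steps, dt = 5_000, 50, 0.02
kappa, theta, sigma = 5.0, 1.0, 0.3
X = np.empty((n_paths, n_steps + 1))
X[:, 0] = 1.0
for t in range(n_steps):
    X[:, t + 1] = (X[:, t] + kappa * (theta - X[:, t]) * dt
                   + sigma * np.sqrt(dt) * rng.normal(size=n_paths))

# Backward induction for a put-style payoff: stop when the immediate payoff
# exceeds the regression estimate of the continuation value.
strike = 1.0
payoff = np.maximum(strike - X, 0.0)
value = payoff[:, -1].copy()
for t in range(n_steps - 1, 0, -1):
    itm = payoff[:, t] > 0
    if itm.sum() > 10:
        coef = np.polyfit(X[itm, t], value[itm], deg=2)
        exercise = payoff[itm, t] > np.polyval(coef, X[itm, t])
        idx = np.where(itm)[0][exercise]
        value[idx] = payoff[idx, t]

stop_value = value.mean()        # estimated value of stopping optimally
european = payoff[:, -1].mean()  # value of only stopping at maturity
```

Because the paths mean-revert toward the strike, early exercise has genuine value, and the stopped estimate exceeds the stop-only-at-maturity value.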
"Optimal Stopping with Gaussian Processes" — Kshama Dwarakanath, Danial Dervovic, P. Tavallali, Svitlana Vyetrenko, T. Balch. Proceedings of the Third ACM International Conference on AI in Finance, 2022. DOI: 10.1145/3533271.3561670
We consider a trading marketplace that is populated by traders with diverse trading strategies and objectives. The marketplace allows the suppliers to list their goods and facilitates matching between buyers and sellers. In return, such a marketplace typically charges fees for facilitating trade. The goal of this work is to design a dynamic fee schedule for the marketplace that is equitable and profitable to all traders while being profitable to the marketplace at the same time (from charging fees). Since the traders adapt their strategies to the fee schedule, we present a reinforcement learning framework for simultaneously learning a marketplace fee schedule and trading strategies that adapt to this fee schedule using a weighted optimization objective of profits and equitability. We illustrate the use of the proposed approach in detail on a simulated stock exchange with different types of investors, specifically market makers and consumer investors. As we vary the equitability weights across different investor classes, we see that the learnt exchange fee schedule starts favoring the class of investors with the highest weight. We further discuss the observed insights from the simulated stock exchange in light of the general framework of equitable marketplace mechanism design.
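The weighted objective can be sketched as exchange profit minus a penalty on the spread of profits across trader classes; the fee-schedule outcomes and profit numbers below are hypothetical:

```python
import numpy as np

def equitable_objective(class_profits, exchange_profit, w_equity=1.0):
    # Exchange profit minus a penalty on the spread of trader-class profits;
    # one simple way to encode the profit/equitability trade-off.
    profits = np.asarray(class_profits, dtype=float)
    inequity = profits.max() - profits.min()
    return exchange_profit - w_equity * inequity

# Hypothetical outcomes for two fee schedules, with class profits listed as
# (market makers, consumer investors):
a = equitable_objective([10.0, 2.0], exchange_profit=5.0)  # favors makers
b = equitable_objective([6.0, 5.0], exchange_profit=4.5)   # more balanced
```

With equal weights, the more balanced schedule wins despite lower exchange profit; raising the weight on one investor class, as in the paper's experiments, tilts the learned fee schedule toward that class.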
"Equitable Marketplace Mechanism Design" — Kshama Dwarakanath, Svitlana Vyetrenko, T. Balch. Proceedings of the Third ACM International Conference on AI in Finance, 2022. DOI: 10.1145/3533271.3561673