Antonio Riva, L. Bisi, P. Liotet, Luca Sabbioni, Edoardo Vittori, Marco Pinciroli, Michele Trapletti, Marcello Restelli
Reinforcement learning (RL) has proven successful in obtaining profitable trading policies; however, the effectiveness of such strategies is strongly conditioned on market stationarity. This hypothesis is challenged by the regime switches frequently experienced by practitioners; thus, when many models are available, validation may become a difficult task. We propose to overcome the issue by explicitly modeling the trading task as a non-stationary RL problem. Nevertheless, state-of-the-art RL algorithms for this setting usually require the task distribution or dynamics to be predictable, an assumption that rarely holds in financial markets. In this work, we instead propose a method for the dynamic selection of the best RL agent that is driven only by profit performance. Our modular two-layer approach selects the best strategy among a set of RL models through an online-learning algorithm. While any combination of algorithms could in principle be used, our solution employs two state-of-the-art algorithms: Fitted Q-Iteration (FQI) for the RL layer and Optimistic Adapt ML-Prod (OAMP) for the online-learning layer. The proposed approach is tested on two simulated FX trading tasks, using actual historical data for the AUD/USD and GBP/USD currency pairs.
Title: Addressing Non-Stationarity in FX Trading with Online Model Selection of Offline RL Experts
Venue: Proceedings of the Third ACM International Conference on AI in Finance, published 2022-10-26
DOI: 10.1145/3533271.3561780 (https://doi.org/10.1145/3533271.3561780)
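The abstract does not spell out OAMP's update rule, and OAMP itself is not reproduced here. As a simpler, swapped-in illustration of the online-selection layer, the sketch below uses exponentially weighted averaging (Hedge) with fixed-share mixing, a classical technique for tracking a best expert that changes over time. It assumes each expert is an already-trained RL policy (for example, one FQI model per training window) whose per-round profit is rescaled to [0, 1]; the function name and the values of `eta` and `alpha` are hypothetical.

```python
import math


def fixed_share_weights(expert_rewards, eta=0.5, alpha=0.05):
    """Exponentially weighted expert selection with fixed-share mixing.

    expert_rewards: one list per round, holding each expert's reward in [0, 1]
    (e.g. a rescaled daily profit of each RL policy; hypothetical input).
    Returns the weight vector used *before* each round; a selector would follow
    the expert with the largest weight or sample one in proportion to it.
    """
    n = len(expert_rewards[0])
    w = [1.0 / n] * n  # start from a uniform prior over experts
    history = []
    for rewards in expert_rewards:
        history.append(list(w))
        # Hedge step: multiply each weight by exp(eta * reward), then renormalize
        w = [wi * math.exp(eta * r) for wi, r in zip(w, rewards)]
        total = sum(w)
        w = [wi / total for wi in w]
        # Fixed-share step: redistribute a fraction alpha uniformly so that no
        # expert's weight collapses, which lets the selector track regime switches
        w = [(1 - alpha) * wi + alpha / n for wi in w]
    return history


# Two experts and a regime switch: expert 0 is profitable for 50 rounds, then expert 1
rounds = [[1.0, 0.0]] * 50 + [[0.0, 1.0]] * 50
adaptive = fixed_share_weights(rounds)
static = fixed_share_weights(rounds, alpha=0.0)
print(adaptive[-1], static[-1])
```

With `alpha=0.0` the selector reduces to plain Hedge, which in this example keeps favouring the formerly best expert for roughly as many rounds as that expert previously dominated; the fixed-share term bounds how small any weight can get, so the selector recovers within a few rounds. OAMP addresses the same kind of non-stationarity with different, adaptive machinery.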
Recent advances in reinforcement learning have enabled robust, data-driven, direct optimization of investors' objectives without first estimating stock movements, as in the traditional two-step approach [8]. Given diverse investment styles, a single trading strategy cannot serve different investor objectives. We propose an objective function formulation to augment the direct optimization approach of AlphaPortfolio (Cong et al. [6]). In addition to the baseline Sharpe ratio used in AlphaPortfolio, we add three investor objectives: (i) achieving excess alpha by maximizing the information ratio; (ii) mitigating downside risk by optimizing the maximum drawdown-adjusted return; and (iii) reducing transaction costs by restricting the turnover rate. We also add four new features to the framework: momentum, short-term reversal, drawdown, and maximum drawdown. Our objective function formulation makes it possible to control the trade-off of both maximum drawdown and turnover against realized return, creating flexible trading strategies for various risk appetites. The maximum drawdown efficient frontier, derived over a range of values of the hyper-parameter α, exhibits a concave relationship similar to the one observed in the theoretical study by Chekhlov et al. [5]. To improve the interpretability of the deep neural network and derive insights into traditional factor investing, we further explore the drivers of the top- and bottom-performing firms through a Random Forest regression analysis, which reproduces our model's winner scores with an R² of approximately 0.8. Finally, to uncover the balance between profits and diversification, we investigate the impact of the trading size on strategy behavior.
Title: Objective Driven Portfolio Construction Using Reinforcement Learning
Authors: Tina Wang, Jithin Pradeep, Jerry Zikun Chen
Venue: Proceedings of the Third ACM International Conference on AI in Finance, published 2022-10-26
DOI: 10.1145/3533271.3561764 (https://doi.org/10.1145/3533271.3561764)
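The objectives named in the portfolio-construction abstract can be computed from a return series as follows. This is a minimal sketch under common textbook definitions (per-period returns, sample standard deviation, turnover as the mean sum of absolute weight changes per rebalance); the paper's exact conventions and annualization are not given in the abstract, and `penalized_objective` is a hypothetical example of how a hyper-parameter α could trade drawdown against return, not the authors' formulation.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return per unit of return volatility (per period, not annualized)."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)


def information_ratio(returns, benchmark):
    """Mean active return divided by tracking error, relative to a benchmark series."""
    active = [r - b for r, b in zip(returns, benchmark)]
    return mean(active) / stdev(active)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded wealth curve, as a fraction."""
    wealth, peak, mdd = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1 + r
        peak = max(peak, wealth)
        mdd = max(mdd, (peak - wealth) / peak)
    return mdd


def turnover(weights):
    """Mean sum of absolute portfolio-weight changes between consecutive rebalances."""
    changes = [sum(abs(new - old) for new, old in zip(w1, w0))
               for w0, w1 in zip(weights, weights[1:])]
    return mean(changes)


def penalized_objective(returns, alpha):
    """Hypothetical alpha-controlled trade-off: mean return minus alpha * max drawdown."""
    return mean(returns) - alpha * max_drawdown(returns)


rets = [0.10, -0.50, 0.20]
print(max_drawdown(rets))  # wealth peaks at 1.10 and falls to 0.55
print(turnover([[0.5, 0.5], [0.7, 0.3], [0.7, 0.3]]))
```

Sweeping `alpha` in such an objective and recording the resulting (drawdown, return) pairs is one way to trace a drawdown frontier of the kind the abstract describes.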
With the dramatic development of deep learning in the past decade, interpretability has become one of the most important challenges, often preventing neural networks from being applied in fields such as finance. Among the many existing explainability methods, counterfactual generation has become widely used for understanding neural networks and making tailored recommendations. However, few studies provide quantitative measures for evaluating counterfactuals. In this paper, we propose a quantitative approach based on maximum mean discrepancy (MMD). We apply several existing counterfactual methods to demonstrate the proposed approach on the MNIST image data set and two tabular financial data sets, Lending Club (LCD) and Give Me Some Credit (GMC). The results demonstrate the potential usefulness as well as the simplicity of the proposed method.
Title: Understanding Counterfactual Generation using Maximum Mean Discrepancy
Authors: Wei Zhang, Brian Barr, J. Paisley
Venue: Proceedings of the Third ACM International Conference on AI in Finance, published 2022-10-26
DOI: 10.1145/3533271.3561759 (https://doi.org/10.1145/3533271.3561759)
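MMD compares two samples through a kernel: MMD² = E[k(x, x')] + E[k(y, y')] − 2E[k(x, y)], which is zero when the two distributions coincide (for a characteristic kernel such as the Gaussian RBF). A minimal sketch of the biased empirical estimate follows. Using it to compare generated counterfactuals with real samples of the target class is one plausible reading of the abstract; the kernel bandwidth `gamma` and the toy points are assumptions.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(xs, ys, gamma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples xs and ys."""
    k_xx = sum(rbf_kernel(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf_kernel(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf_kernel(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2 * k_xy


# Hypothetical 2-D points: real target-class samples vs. two sets of counterfactuals
real = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
plausible_cf = [[0.05, 0.05], [0.1, 0.1]]
implausible_cf = [[3.0, 3.0], [3.1, 2.9]]
print(mmd2(real, plausible_cf), mmd2(real, implausible_cf))
```

A lower value indicates counterfactuals whose distribution is closer to the real data; in practice the bandwidth is often set from the data (e.g. the median pairwise distance), which this sketch does not do.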
In their pursuit of profit, market makers contribute liquidity and thus play a fundamental role in the health of financial markets. The mechanism used to rank bids and asks in order-driven markets can influence trader behaviour and discourage market making, with obvious consequences for market fundamentals. This is the rationale behind trading mechanisms that assign weight to both the spread of two-sided orders and order prices. In this work, we assess the effectiveness of this proposal from a game-theoretic standpoint. We use strategic agents and explicitly define a utility function that treats the probability of a trader becoming a market maker as a pure strategy. We then employ empirical game-theoretic analysis to analyse the market at equilibrium; we illustrate the strategic responses to different setups of the matching mechanism, how agents are incentivised to become market makers, agent behaviour, and market states. Our analysis shows that this spread-based priority works well to reduce market volatility and maintain trading volume, provided that an appropriate setting is used to weigh spread ranking against price ranking.
Title: Incentivising Market Making in Financial Markets
Authors: Ji Qi, Carmine Ventre
Venue: Proceedings of the Third ACM International Conference on AI in Finance, published 2022-10-26
DOI: 10.1145/3533271.3561706 (https://doi.org/10.1145/3533271.3561706)
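The market-making abstract does not give the exact ranking formula, so the following is a hypothetical sketch of a spread-weighted priority rule on the bid side. Each order is ranked by price (higher bids first) and by the spread of the trader's two-sided quote (narrower first, with one-sided orders treated as having infinite spread), and the two ranks are blended with a weight `w`. Setting `w = 0` recovers plain price priority; the names and the linear blend are assumptions, not the mechanism studied in the paper.

```python
import math


def rank(values, descending=False):
    """Rank positions (0 = best); ties keep input order, a stand-in for time priority."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0] * len(values)
    for position, i in enumerate(order):
        ranks[i] = position
    return ranks


def bid_priority(bid_prices, quote_spreads, w):
    """Return bid indices in execution-priority order.

    bid_prices: price of each bid; quote_spreads: spread of the trader's two-sided
    quote (math.inf for one-sided orders); w in [0, 1] weights spread rank vs price rank.
    """
    price_rank = rank(bid_prices, descending=True)  # higher bid ranks first
    spread_rank = rank(quote_spreads)               # narrower two-sided quote ranks first
    score = [w * s + (1 - w) * p for s, p in zip(spread_rank, price_rank)]
    return sorted(range(len(bid_prices)), key=lambda i: score[i])


bids = [100.0, 99.5, 99.8]
spreads = [math.inf, 0.2, 0.5]  # bid 0 comes from a one-sided (non-market-making) order
print(bid_priority(bids, spreads, w=0.0), bid_priority(bids, spreads, w=1.0))
```

Intermediate values of `w` let a market maker with a tight two-sided quote jump ahead of a slightly better-priced one-sided bid, which is the incentive the abstract describes.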
Anubha Pandey, Alekhya Bhatraju, Shiv Markam, Deepak L. Bhatt
Generative Adversarial Networks (GANs) are known for their ability to learn data distributions and are therefore a suitable option for handling class imbalance through oversampling. However, they still fail to capture the diversity of the minority class (frauds, in our study) owing to its limited representation. In particular, fraudulent patterns close to the class boundary are missed by the model. This paper proposes using GANs to simulate fraudulent transaction patterns conditioned on genuine transactions, thereby enabling the model to learn a translation function between the two spaces. Further, to synthesize fraudulent samples near the class boundary, we train GANs using losses inspired by the data poisoning attack literature and discuss their efficacy in improving fraud detection classifier performance. The efficacy of the proposed framework is demonstrated through experimental results on the publicly available European Credit-Card Dataset and CIS Fraud Dataset.
Title: Adversarial Fraud Generation for Improved Detection
Venue: Proceedings of the Third ACM International Conference on AI in Finance, published 2022-10-26
DOI: 10.1145/3533271.3561723 (https://doi.org/10.1145/3533271.3561723)
Jacobo Roa-Vicens, Y. Xu, Ricardo Silva, D. Mandic
Recurrent neural networks (RNNs) have proven to be particularly effective for learning and modelling time series. However, high-dimensional sequential data are considerably more difficult and computationally expensive to model, as the number of parameters required to train the RNN grows exponentially with data dimensionality. This is also the case for time series from limit order books, the electronic registries where the prices of securities are formed in public markets. Tensorization of neural networks provides an efficient way to reduce the number of model parameters and has been applied successfully to high-dimensional series such as video sequences and financial time series, for example using tensor-train RNNs (TTRNNs). However, TTRNNs suffer from a number of shortcomings, including: (i) model sensitivity to the ordering of core tensor contractions; (ii) training sensitivity to weight initialization; and (iii) exploding or vanishing gradients due to recurrent propagation through the tensor-train topology. Recent studies have shown that embedding a multi-linear graph filter to model RNN states (Recurrent Graph Tensor Network, RGTN) provides enhanced flexibility and expressive power to tensor networks, while mitigating the shortcomings of TTRNNs. In this paper, we demonstrate the advantages of using graph filters to model high-dimensional limit order book sequences, compared with state-of-the-art benchmarks.
It is shown that combining the graph module (to mitigate problematic gradients) with the radial structure (to make the tensor network architecture flexible) yields substantial improvements in output variance, training time, and number of parameters required, without any sacrifice in accuracy.
Title: Graph and tensor-train recurrent neural networks for high-dimensional models of limit order books
Venue: Proceedings of the Third ACM International Conference on AI in Finance, published 2022-10-26
DOI: 10.1145/3533271.3561710 (https://doi.org/10.1145/3533271.3561710)
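The parameter savings from tensorization can be checked with simple arithmetic. If the input has d tensor modes of sizes I_1, ..., I_d and the hidden state has modes J_1, ..., J_d, a dense weight matrix needs (I_1 ... I_d)(J_1 ... J_d) parameters, while a tensor-train (TT) factorization with ranks r_0 = r_d = 1 needs the sum over k of r_{k-1} I_k J_k r_k. The mode sizes and ranks below are illustrative assumptions, and the RGTN graph filters described in the abstract are not modelled.

```python
from math import prod


def dense_params(in_modes, out_modes):
    """Parameters of a dense matrix from prod(in_modes) inputs to prod(out_modes) outputs."""
    return prod(in_modes) * prod(out_modes)


def tt_params(in_modes, out_modes, ranks):
    """Parameters of the same matrix in tensor-train format.

    ranks: the internal TT ranks r_1..r_{d-1}; the boundary ranks r_0 and r_d are 1.
    Core k has shape (r_{k-1}, I_k, J_k, r_k).
    """
    r = [1, *ranks, 1]
    return sum(r[k] * i * j * r[k + 1] for k, (i, j) in enumerate(zip(in_modes, out_modes)))


# Illustrative sizes: a 256-dim input (4x4x4x4) mapped to a 256-dim hidden state, TT rank 3
print(dense_params([4] * 4, [4] * 4), tt_params([4] * 4, [4] * 4, [3] * 3))
# Adding a fifth mode multiplies the dense count by 16 but adds only one more rank-3 core
print(dense_params([4] * 5, [4] * 5), tt_params([4] * 5, [4] * 5, [3] * 4))
```

The dense count grows exponentially in the number of modes, while the TT count grows linearly for fixed ranks, which is the motivation for TTRNNs stated in the abstract.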
Alexandre Boulenger, David C. Liu, George Philippe Farajalla
How can banks recommend relevant banking products, such as debit cards, credit cards, or term deposits, and also learn a rich user representation for segmentation and user profiling, all with a single model? We present a sequence-to-item recommendation framework that uses a novel input data representation, accounting for the sequential and temporal context of both item ownership and user metadata, fed to a multi-head self-attentive encoder. We assess the performance of our model on the largest publicly available banking product recommendation dataset. Our model achieves 98.9% Precision@1 and 40.2% Precision@5, outperforming a state-of-the-art model, a common XGBoost-based baseline tailored for this dataset, and a system reportedly employed in industry for this task. Next, using the encoder embedding, we obtain a continuous representation of users and their past product behavior. We demonstrate in a case study that this representation can be used for user segmentation and profiling, both critical to decision-making in organizations, for example in designing and differentiating value propositions. The proposed approach is more inclusive and objective than the traditional ones employed by banks. With this work, we demonstrate the benefits of employing a self-attention-based recommendation model in a real-world setting. The learned continuous user representation can yield far more impact than individual user-level recommendations.
Both the proposed model and the approach to segmentation and profiling are also applicable to industries beyond banking.
Title: Sequential Banking Products Recommendation and User Profiling in One Go
Venue: Proceedings of the Third ACM International Conference on AI in Finance, published 2022-10-26
DOI: 10.1145/3533271.3561697 (https://doi.org/10.1145/3533271.3561697)
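For readers less familiar with the reported metrics, the following sketches Precision@k under its usual definition: the share of the top-k recommended items that the user actually acquired, averaged over users. The paper may use a variant, and the product names are hypothetical. Because a user who acquires a single new product can contribute at most 1/5 to Precision@5, a low Precision@5 can coexist with a very high Precision@1.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are in the relevant set."""
    return sum(1 for item in recommended[:k] if item in relevant) / k


def mean_precision_at_k(all_recommended, all_relevant, k):
    """Precision@k averaged over users."""
    scores = [precision_at_k(rec, rel, k) for rec, rel in zip(all_recommended, all_relevant)]
    return sum(scores) / len(scores)


# Hypothetical ranked recommendations for two users and the product each actually acquired
recs = [
    ["credit_card", "term_deposit", "debit_card", "mortgage", "pension_plan"],
    ["debit_card", "credit_card", "e_account", "term_deposit", "payroll"],
]
acquired = [{"credit_card"}, {"credit_card"}]
print(mean_precision_at_k(recs, acquired, 1), mean_precision_at_k(recs, acquired, 5))
```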
Learning a high-performance trade execution model via reinforcement learning (RL) requires interaction with the real, dynamic market. However, the massive number of interactions required by direct RL would result in significant training overhead. In this paper, we propose a cost-efficient RL approach called Deep Dyna-Double Q-learning (D3Q), which integrates deep reinforcement learning and planning to reduce the training overhead while improving trading performance. Specifically, D3Q includes a learnable market environment model, which approximates the market impact using real market experience, to enhance policy learning via the learned environment. Meanwhile, we propose a novel state-balanced exploration scheme that corrects the exploration bias caused by the non-increasing residual inventory during trade execution, thereby accelerating model learning. As demonstrated by our extensive experiments, the proposed D3Q framework significantly increases sample efficiency and also outperforms state-of-the-art methods in average trading cost.
Title: Cost-Efficient Reinforcement Learning for Optimal Trade Execution on Dynamic Market Environment
Authors: Di Chen, Yada Zhu, Miao Liu, Jianbo Li
Venue: Proceedings of the Third ACM International Conference on AI in Finance, published 2022-10-26
DOI: 10.1145/3533271.3561761 (https://doi.org/10.1145/3533271.3561761)
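D3Q combines a learned market model with Double Q-learning in a Dyna-style loop. The deep networks and the state-balanced exploration scheme are not reproduced here. As a minimal tabular sketch under stated assumptions, the code below runs Dyna-style planning with Double Q-learning on a hypothetical toy execution problem: a trader must sell 4 units within 4 steps, selling `a` units in one step costs `a**2` (a stand-in for market impact), and inventory left at the deadline is penalized. Because the toy market is deterministic, the learned model simply memorizes observed transitions. Every name and constant here is an assumption.

```python
import random
from collections import defaultdict

T, INVENTORY, MAX_SELL, PENALTY = 4, 4, 2, 10.0  # hypothetical toy execution problem


def actions(state):
    """Units that can be sold now: 0..min(MAX_SELL, remaining inventory)."""
    _, q = state
    return list(range(min(MAX_SELL, q) + 1))


def env_step(state, a):
    """Toy deterministic market: selling a units costs a**2 (quadratic impact)."""
    t, q = state
    t2, q2 = t + 1, q - a
    reward = -float(a * a)
    if t2 == T and q2 > 0:
        reward -= PENALTY * q2  # inventory left unexecuted at the deadline
    return reward, (t2, q2), q2 == 0 or t2 == T


def greedy_action(qa, qb, state):
    """Greedy action with respect to the sum of the two Q-tables."""
    return max(actions(state), key=lambda a: qa[(state, a)] + qb[(state, a)])


def dyna_double_q(episodes=300, planning_steps=20, alpha=0.5, eps=0.2, seed=0):
    rng = random.Random(seed)
    qa, qb = defaultdict(float), defaultdict(float)
    model = {}  # learned environment model: (state, action) -> (reward, next_state, done)

    def update(s, a, r, s2, done):
        # Double Q-learning: one table picks the next action, the other evaluates it
        q1, q2 = (qa, qb) if rng.random() < 0.5 else (qb, qa)
        target = r
        if not done:
            a2 = max(actions(s2), key=lambda b: q1[(s2, b)])
            target += q2[(s2, a2)]
        q1[(s, a)] += alpha * (target - q1[(s, a)])

    for _ in range(episodes):
        s, done = (0, INVENTORY), False
        while not done:
            if rng.random() < eps:
                a = rng.choice(actions(s))  # epsilon-greedy exploration
            else:
                a = greedy_action(qa, qb, s)
            r, s2, done = env_step(s, a)  # one step of "real" interaction
            update(s, a, r, s2, done)
            model[(s, a)] = (r, s2, done)  # refine the learned model
            keys = list(model)
            for _ in range(planning_steps):  # planning on simulated experience
                ps, pa = rng.choice(keys)
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return qa, qb


qa, qb = dyna_double_q()
print(greedy_action(qa, qb, (0, INVENTORY)))
```

With convex impact, the optimal schedule in this toy spreads the order evenly (sell 1 unit per step, total cost 4), and the learned greedy policy should recover the first step of that schedule. The planning loop is what reduces real interaction: each real step is followed by `planning_steps` updates on experience replayed from the learned model.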
Estimating high-dimensional covariance matrices for portfolio optimization is challenging because the number of parameters to be estimated grows quadratically in the number of assets. When the matrix dimension exceeds the sample size, the sample covariance matrix becomes singular. A possible solution is to impose a (latent) factor structure on the cross-section of asset returns, as in the popular capital asset pricing model. Recent research suggests dimension reduction techniques to estimate the factors in a data-driven fashion. We present an asymmetric autoencoder neural network-based estimator that incorporates the factor structure in its architecture and jointly estimates the factors and their loadings. We test our method against well-established dimension reduction techniques from the literature and compare them with observable factors as a benchmark in an empirical experiment using stock returns from the past five decades. Results show that the proposed estimator is very competitive, as it significantly outperforms the benchmark in most scenarios. Analyzing the loadings, we find that the constructed factors are related to the stocks' sector classification.
Title: Asymmetric Autoencoders for Factor-Based Covariance Matrix Estimation
Authors: Kevin Huynh, Gregor Lenhard
Venue: Proceedings of the Third ACM International Conference on AI in Finance, published 2022-10-26
DOI: 10.1145/3533271.3561715 (https://doi.org/10.1145/3533271.3561715)
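The dimensionality argument in the covariance-estimation abstract can be made concrete by counting free parameters. A symmetric N × N covariance matrix has N(N + 1)/2 of them, and a sample covariance computed from T demeaned observations has rank at most T − 1, so it is singular whenever N ≥ T. A K-factor structure Σ = B Σ_f Bᵀ + D, with loadings B (N × K), factor covariance Σ_f (K × K), and diagonal idiosyncratic variances D, needs only NK + K(K + 1)/2 + N. The sketch below does that bookkeeping for illustrative values of N and K; it does not implement the paper's asymmetric autoencoder.

```python
def sample_cov_params(n_assets):
    """Free parameters of an unrestricted symmetric N x N covariance matrix."""
    return n_assets * (n_assets + 1) // 2


def factor_cov_params(n_assets, n_factors):
    """Free parameters of Sigma = B Sigma_f B^T + D with diagonal D."""
    loadings = n_assets * n_factors                # B is N x K
    factor_cov = n_factors * (n_factors + 1) // 2  # Sigma_f is symmetric K x K
    idiosyncratic = n_assets                       # diagonal entries of D
    return loadings + factor_cov + idiosyncratic


# Illustrative universe: 500 stocks, 5 latent factors
print(sample_cov_params(500), factor_cov_params(500, 5))
# Doubling the universe roughly quadruples the first count but only doubles the second
print(sample_cov_params(1000), factor_cov_params(1000, 5))
```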
In equity trading, internalization is the predominant execution method for uninformed order flow, allowing retail brokers to realize cost savings and thereby offer price improvements to customers. In cryptocurrency trading, there are doubts as to whether informed and uninformed traders can be distinguished in the same way, leading brokers to seek cost savings through internal order matching instead. Using the historical order flow of the German cryptocurrency broker BISON, we present a prediction-based approach to internal order matching: Upon receiving a customer order, our model forecasts whether future order flow will be sufficient to neutralize the order before the settlement date. With a prediction accuracy of 85%, it enables brokers to match three-quarters of order volume internally, which is three times as much as a traditional static approach, and realize meaningful cost savings, even after accounting for common minimum price improvements.
Title: Intelligent Inventory Management for Cryptocurrency Brokers
Authors: Christopher Felder, J. Seemüller
Venue: Proceedings of the Third ACM International Conference on AI in Finance, published 2022-10-26
DOI: 10.1145/3533271.3561661 (https://doi.org/10.1145/3533271.3561661)
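The labelling step behind such a forecaster can be sketched as follows: an incoming order is marked as neutralizable if enough opposite-side customer flow arrives before settlement to offset it, and a classifier trained on these labels then decides whether to keep the order internal. This is a simplified, hypothetical reading of the abstract (it ignores same-side flow and partial matches), and the function names, the integer time units, and the 0.5 threshold are assumptions.

```python
def can_neutralize(order_qty, future_orders, settlement_time):
    """Label an order: True if opposite-side flow before settlement offsets it.

    order_qty: signed quantity of the incoming order (+ buy, - sell).
    future_orders: (timestamp, signed quantity) pairs of later customer orders.
    """
    opposing = sum(abs(qty) for ts, qty in future_orders
                   if ts < settlement_time and qty * order_qty < 0)
    return opposing >= abs(order_qty)


def route(prob_neutralized, threshold=0.5):
    """Keep the order in-house if it will likely be offset; otherwise hedge it externally."""
    return "internal" if prob_neutralized >= threshold else "external"


# A buy of 2 units; later sells of 1.0 and 1.5 units and an extra buy of 0.5 units
flow = [(1, -1.0), (2, 0.5), (3, -1.5)]
print(can_neutralize(2.0, flow, settlement_time=5), can_neutralize(2.0, flow, settlement_time=2))
print(route(0.8), route(0.3))
```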