Managing Editor’s Letter
Pub Date: 2021-10-31 DOI: 10.3905/jfds.2021.3.4.001
F. Fabozzi
Several articles, two of which were published in this journal, have shown how reinforcement learning can be used to take trading costs into account in hedging decisions. In the lead article of this issue, “Deep Hedging of Derivatives Using Reinforcement Learning,” Jay Cao, Jacky Chen, John Hull, and Zissis Poulos extend the standard reinforcement learning approach by using multiple Q-functions to widen the range of objective functions that can be accommodated and by using algorithms that allow the state space and action space to be continuous. The authors suggest an approach in which a relatively simple valuation model is used in conjunction with more complex models for the evolution of the asset price. This allows good hedges to be developed for asset price processes that are not associated with analytic pricing models.

Deep sequence models have been applied to predicting asset returns. These models are flexible enough to capture the high-dimensional, nonlinear, interactive, low signal-to-noise, and dynamic nature of financial data. More specifically, they can outperform conventionally used models because of their ability to detect path-dependence patterns. In their article “Deep Sequence Modeling: Development and Applications in Asset Pricing,” Lin William Cong, Ke Tang, Jingyuan Wang, and Yang Zhang show how to predict asset returns and measure risk premiums by applying deep sequence modeling. They begin by providing an overview of the development of deep sequence models, introducing their applications in asset pricing, and discussing their advantages and limitations. A comparative analysis of these methods using data on US equities is then provided in the second part of the article, where the authors demonstrate how sequence modeling benefits investors by incorporating complex historical path dependence. They report that long short-term memory has the best performance in terms of out-of-sample predictive R-squared, and long short-term memory with an attention mechanism has the best portfolio performance when microcap stocks are excluded.

In the formulation of an investment process, it is critical to build a view of causal relations among economic entities. Because of the complex and opaque nature of many market interactions, this can be challenging. Various models of economic causality, such as causal networks, have been proposed to both explain the past and aid investors in the investment process. Such networks provide an efficient framework for assisting with investment decisions that are supported by both quantitative and qualitative evidence. When building causal networks, adding more causes increases computational complexity because the combined impact of larger and larger sets of causes must be calculated. In “Causal Uncertainty in Capital Markets: A Robust Noisy-Or Framework for Portfolio Management,” Joseph
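The letter describes the deep hedging article as using multiple Q-functions so that objectives richer than expected cost can be optimized. As a rough illustration only, and not the authors’ implementation, the sketch below assumes one way a pair of estimates (expected hedging cost and expected squared cost) could be combined into a mean–standard-deviation objective; the toy dynamics, the risk_aversion parameter, and the grid search are all illustrative assumptions.

```python
# Minimal sketch (assumption): combine two Q-style estimates, one for expected
# hedging cost E[C] and one for expected squared cost E[C^2], into a
# mean-standard-deviation objective, then pick the hedge ratio minimizing it.
# Toy one-period dynamics; all parameter values are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def simulate_hedging_cost(hedge_ratio, n_paths=20_000):
    """Toy one-period cost of hedging a short call with `hedge_ratio` shares."""
    s0, k, sigma = 100.0, 100.0, 0.2
    ds = s0 * sigma * rng.standard_normal(n_paths)       # stock move
    option_pnl = -np.maximum(s0 + ds - k, 0.0)            # short call payoff
    stock_pnl = hedge_ratio * ds
    trading_cost = 0.01 * abs(hedge_ratio) * s0           # proportional cost
    return -(option_pnl + stock_pnl) + trading_cost       # cost = -pnl

def objective(hedge_ratio, risk_aversion=1.5):
    costs = simulate_hedging_cost(hedge_ratio)
    q1, q2 = costs.mean(), (costs ** 2).mean()            # the two "Q" estimates
    return q1 + risk_aversion * np.sqrt(max(q2 - q1 ** 2, 0.0))

grid = np.linspace(0.0, 1.0, 21)
best = min(grid, key=objective)
print(f"best hedge ratio on grid: {best:.2f}")
```

In the article itself the two estimates are produced by learned Q-functions over continuous states and actions rather than by Monte Carlo over a grid; the sketch only shows how two such estimates can define a richer objective than expected cost alone.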
{"title":"Managing Editor’s Letter","authors":"F. Fabozzi","doi":"10.3905/jfds.2021.3.4.001","DOIUrl":"https://doi.org/10.3905/jfds.2021.3.4.001","url":null,"abstract":"Cathy Scott General Manager and Publisher Several articles, two of which were published in this journal, have shown how reinforcement learning can be used to take trading costs into account in hedging decisions. In the lead article of this issue, “Deep Hedging of Derivatives Using Reinforcement Learning,” Jay Cao, Jacky Chen, John Hull, and Zissis Poulos extend the standard reinforcement learning approach by utilizing multiple Q-functions for the purpose of increasing the range of objective functions that can be used and by using algorithms that allow the state space and action space to be continuous. The authors suggest an approach where a relatively simple valuation model is used in conjunction with more complex models for the evolution of the asset price. This allows good hedges to be developed for asset price processes that are not associated with analytic pricing models. Deep sequence models have been applied to predicting asset returns. These models are flexible enough to capture the high-dimensionality, nonlinear, interactive, low signal-to-noise, and dynamic nature of financial data. More specifically, these models can outperform the conventionally used models because of their ability to detect path-dependence patterns. Lin William Cong, Ke Tang, Jingyuan Wang, and Yang Zhang in their article “Deep Sequence Modeling: Development and Applications in Asset Pricing,” show how to predict asset returns and measure risk premiums by applying deep sequence modeling. They begin by providing an overview of the development of deep sequence models, introducing their applications in asset pricing, and discussing the advantages and limitations of deep sequence models. A comparative analysis of these methods using data on US equities is then provided in the second part of the article where the authors demonstrate how sequence modeling benefits investors in general by incorporating complex historical path dependence. They report that long short-term memory has the best performance in terms of out-of-sample predictive R-squared, and long short-term memory with an attention mechanism has the best portfolio performance when excluding microcap stocks. In the formulation of an investment process, it is critical to build a view of causal relations among economic entities. Because of the complex and opaque nature of many market interactions, this can be challenging. Various models of economic causality have been proposed to both explain the past and aide investors in the investment process such as causal networks. Such networks provide an efficient framework for assisting with investment decisions that are supported by both quantitative and qualitative evidence. When building causal networks, the addition of more causes adds to the issue of computational complexity because of the necessity to calculate the combined impact of larger and larger sets of causes. 
In “Causal Uncertainty in Capital Markets: A Robust Noisy-Or Framework for Portfolio Management,” Joseph","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114810693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On Robustness of Mutual Funds Categorization and Distance Metric Learning
Pub Date: 2021-10-31 DOI: 10.3905/jfds.2021.3.4.130
Dhruv Desai, D. Mehta
Identifying similar mutual funds within a given universe of funds has many applications, including competitor analysis, marketing and sales, and tax loss harvesting. For a contemporary analyst, the most popular approach to finding similar funds is to look up a categorization system such as the Morningstar categorization. Morningstar categorization has been heavily investigated by academic researchers from various angles, including with unsupervised clustering techniques whose clusters were found to be inconsistent with the categorization. Recently, however, the categorization has been studied using supervised classification techniques, with the categories as the target labels. The categorization was indeed learnable with very high accuracy using a purely data-driven approach, creating a paradox: clustering was inconsistent with the categorization, whereas supervised classification was able to reproduce it (nearly) completely. Here, the authors resolve this apparent paradox by pointing out incorrect uses and interpretations of machine learning techniques in the previous academic literature. They demonstrate that by using an appropriate list of variables and metrics to identify the optimal number of clusters, and by preprocessing the data using distance metric learning, one can indeed reproduce the Morningstar categorization with a data-driven approach. The present work puts an end to the debate on this issue and establishes that the Morningstar categorization is intrinsically rigorous, consistent, rule-based, and reproducible using data-driven approaches, provided machine learning techniques are correctly implemented.

Key Findings
▪ Academic literature has repeatedly questioned the consistency and robustness of mutual fund categorization systems, such as the Morningstar categorization, by contrasting them with unsupervised clustering of funds.
▪ Here, the authors settle the debate in favor of the Morningstar categorization by pointing out the use of incorrect lists of variables and misinterpretation of machine learning algorithms in the previous literature, emphasizing that the main missing piece on the machine learning side was an appropriate distance metric.
▪ The authors employ a machine learning technique called distance metric learning and reproduce the Morningstar categorization completely using a data-driven approach.
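As a sketch of the general idea, the snippet below uses scikit-learn’s NeighborhoodComponentsAnalysis as a stand-in for the distance metric learning step and checks whether k-means clusters agree with (synthetic) category labels before and after the metric is learned. The data, the choice of NCA, and the agreement measure (adjusted Rand index) are assumptions for illustration, not the authors’ setup or the Morningstar data.

```python
# Minimal sketch (assumption): supervised metric learning (NCA) as a stand-in for
# the distance metric learning step, followed by clustering in the learned space.
# The point illustrated: clusters in the learned metric agree with the categories
# far better than clusters computed on raw features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Synthetic "fund" features and category labels (purely illustrative).
X, category = make_classification(n_samples=1200, n_features=20, n_informative=6,
                                  n_classes=5, n_clusters_per_class=1, random_state=0)
X = StandardScaler().fit_transform(X)
k = len(np.unique(category))

raw_clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

nca = NeighborhoodComponentsAnalysis(n_components=6, random_state=0).fit(X, category)
learned_clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(nca.transform(X))

print("ARI vs. categories, raw features:  ", round(adjusted_rand_score(category, raw_clusters), 3))
print("ARI vs. categories, learned metric:", round(adjusted_rand_score(category, learned_clusters), 3))
```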
{"title":"On Robustness of Mutual Funds Categorization and Distance Metric Learning","authors":"Dhruv Desai, D. Mehta","doi":"10.3905/jfds.2021.3.4.130","DOIUrl":"https://doi.org/10.3905/jfds.2021.3.4.130","url":null,"abstract":"Identifying similar mutual funds among a given universe of funds has many applications, including competitor analysis, marketing and sales, tax loss harvesting, and so on. For a contemporary analyst, the most popular approach to finding similar funds is to look up a categorization system such as Morningstar categorization. Morningstar categorization has been heavily investigated by academic researchers from various angles, including using unsupervised clustering techniques in which clusters were found to be inconsistent with categorization. Recently, however, categorization has been studied using supervised classification techniques, with the categories being the target labels. Categorization was indeed learnable with very high accuracy using a purely data-driven approach, causing a paradox: Clustering was inconsistent with respect to categorization, whereas supervised classification was able to reproduce (near) complete categorization. Here, the authors resolve this apparent paradox by pointing out incorrect uses and interpretations of machine learning techniques in the previous academic literature. The authors demonstrate that by using an appropriate list of variables and metrics to identify the optimal number of clusters and preprocessing the data using distance metric learning, one can indeed reproduce the Morningstar categorization using a data-driven approach. The present work puts an end to the debate on this issue and establishes that the Morningstar categorization is indeed intrinsically rigorous, consistent, rule-based, and reproducible using data-driven approaches, if machine learning techniques are correctly implemented. Key Findings ▪ Academic literature has time and again questioned the consistency and robustness of mutual fund’s categorization systems, such as Morningstar categorization, by contrasting them with unsupervised clustering of funds. ▪ Here, the authors settle the debate in favor of Morningstar categorization by pointing out the use of incorrect lists of variables and interpretation of machine learning algorithms in the previous literature, while emphasizing that the main missing piece from the machine learning side in previous research was the appropriate distance metric. ▪ The authors employ a machine learning technique called distance metric learning and reproduce the Morningstar categorization completely using a data-driven approach.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125460857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive Seriational Risk Parity and Other Extensions for Heuristic Portfolio Construction Using Machine Learning and Graph Theory
Pub Date: 2021-10-06 DOI: 10.3905/jfds.2021.1.078
Peter Schwendner, Jochen Papenbrock, Markus Jaeger, Stephan Krügel
In this article, the authors present a conceptual framework named adaptive seriational risk parity (ASRP) that extends hierarchical risk parity (HRP) as an asset allocation heuristic. The first step of HRP (quasi-diagonalization), which determines the hierarchy of assets, is required for the actual allocation done in the second step (recursive bisection). In the original HRP scheme, this hierarchy is found using single-linkage hierarchical clustering of the correlation matrix, a static tree-based method. The authors compare the performance of standard HRP with other static and adaptive tree-based methods, as well as with seriation-based methods that do not rely on trees. Seriation is a broader concept that reorders the rows or columns of a matrix to best express similarities between its elements. Each variation discussed leads to a different time series reflecting portfolio performance in a 20-year backtest of a multi-asset futures universe. Unsupervised learning based on these time series creates a taxonomy that groups the strategies in close correspondence to the construction hierarchy of the various types of ASRP. Performance analysis of the variations shows that most of the static tree-based alternatives to HRP outperform the single-linkage clustering used in HRP on a risk-adjusted basis. Adaptive tree methods show mixed results, and most generic seriation-based approaches underperform.

Key Findings
▪ The authors introduce the adaptive seriational risk parity (ASRP) framework as a hierarchy of decisions for implementing the quasi-diagonalization step of hierarchical risk parity (HRP), with seriation-based and tree-based variations as alternatives to single linkage. Tree-based variations are further separated into static and adaptive versions. Altogether, 57 variations are discussed and connected to the literature.
▪ Backtests of the 57 HRP-type asset allocation variations applied to a multi-asset futures universe lead to a correlation matrix of the resulting 57 portfolio return time series. This correlation matrix can be visualized as a dendrogram using single-linkage clustering, and the correlation hierarchy it reflects is similar to the construction hierarchy of the quasi-diagonalization step. Most seriation-based strategies seem to underperform HRP on a risk-adjusted basis, most static tree-based variations outperform HRP, and adaptive tree-based methods show mixed results.
▪ The presented variations fit into a triple artificial intelligence approach that connects synthetic data generation with explainable machine learning. The first step generates synthetic market data, the second step applies an HRP-type portfolio allocation approach as discussed in this article, and the third step uses a model-agnostic explanation such as the SHAP framework to explain the resulting performance with features of the synthetic market data and with the model selection in the second step.
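A minimal sketch of the quasi-diagonalization and recursive bisection steps appears below, with the linkage method left as a parameter so that single linkage can be swapped for an alternative, which is the kind of variation the ASRP framework enumerates. The synthetic return data and the two linkage choices shown are illustrative assumptions, not the authors’ 57 variations or their futures universe.

```python
# Minimal sketch (assumption): HRP with a pluggable linkage method in the
# quasi-diagonalization step ('single' vs. 'average' here) and a standard
# recursive bisection allocation. Synthetic returns, not the ASRP backtest.
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.standard_normal((1000, 8)) @ rng.normal(0, 0.3, (8, 8)),
                       columns=[f"A{i}" for i in range(8)])

def quasi_diagonal_order(corr, method="single"):
    dist = np.sqrt(0.5 * (1.0 - corr))                    # correlation distance
    link = linkage(squareform(dist.values, checks=False), method=method)
    return corr.index[leaves_list(link)].tolist()         # leaf order = asset hierarchy

def recursive_bisection(cov, order):
    weights = pd.Series(1.0, index=order)
    clusters = [order]
    while clusters:
        cluster = clusters.pop()
        if len(cluster) <= 1:
            continue
        left, right = cluster[: len(cluster) // 2], cluster[len(cluster) // 2:]
        def cluster_var(items):
            sub = cov.loc[items, items]
            ivp = 1.0 / np.diag(sub); ivp /= ivp.sum()     # inverse-variance weights
            return float(ivp @ sub.values @ ivp)
        alpha = 1.0 - cluster_var(left) / (cluster_var(left) + cluster_var(right))
        weights[left] *= alpha
        weights[right] *= 1.0 - alpha
        clusters += [left, right]
    return weights

cov, corr = returns.cov(), returns.corr()
for method in ("single", "average"):
    w = recursive_bisection(cov, quasi_diagonal_order(corr, method))
    print(method, w.round(3).to_dict())
```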
{"title":"Adaptive Seriational Risk Parity and Other Extensions for Heuristic Portfolio Construction Using Machine Learning and Graph Theory","authors":"Peter Schwendner, Jochen Papenbrock, Markus Jaeger, Stephan Krügel","doi":"10.3905/jfds.2021.1.078","DOIUrl":"https://doi.org/10.3905/jfds.2021.1.078","url":null,"abstract":"In this article, the authors present a conceptual framework named adaptive seriational risk parity (ASRP) to extend hierarchical risk parity (HRP) as an asset allocation heuristic. The first step of HRP (quasi-diagonalization), determining the hierarchy of assets, is required for the actual allocation done in the second step (recursive bisectioning). In the original HRP scheme, this hierarchy is found using single-linkage hierarchical clustering of the correlation matrix, which is a static tree-based method. The authors compare the performance of the standard HRP with other static and adaptive tree-based methods, as well as seriation-based methods that do not rely on trees. Seriation is a broader concept allowing reordering of the rows or columns of a matrix to best express similarities between the elements. Each discussed variation leads to a different time series reflecting portfolio performance using a 20-year backtest of a multi-asset futures universe. Unsupervised learningbased on these time-series creates a taxonomy that groups the strategies in high correspondence to the construction hierarchy of the various types of ASRP. Performance analysis of the variations shows that most of the static tree-based alternatives to HRP outperform the single-linkage clustering used in HRP on a risk-adjusted basis. Adaptive tree methods show mixed results, and most generic seriation-based approaches underperform. Key Findings ▪ The authors introduce the adaptive seriational risk parity (ASRP) framework as a hierarchy of decisions to implement the quasi-diagonalization step of hierarchical risk parity (HRP) with seriation-based and tree-based variations as alternatives to single linkage. Tree-based variations are further separated in static and adaptive versions. Altogether, 57 variations are discussed and connected to the literature. ▪ Backtests of the 57 different HRP-type asset allocation variations applied to a multi-asset futures universe lead to a correlation matrix of the resulting 57 portfolio return time series. This portfolio return correlation matrix can be visualized as a dendrogram using single-linkage clustering. The correlation hierarchy reflected by the dendrogram is similar to the construction hierarchy of the quasi-diagonalization step. Most seriation-based strategies seem to underperform HRP on a risk-adjusted basis. Most static tree-based variations outperform HRP, whereas adaptive tree-based methods show mixed results. ▪ The presented variations fit into a triple artificial intelligence approach to connect synthetic data generation with explainable machine learning. This approach generates synthetic market data in the first step. The second step applies an HRP-type portfolio allocation approach as discussed in this article. 
The third step uses a model-agnostic explanation such as the SHAP framework to explain the resulting performance with features of the synthetic market data and with model selection in the second step.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"77 10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131204804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interpretable, Transparent, and Auditable Machine Learning: An Alternative to Factor Investing
Pub Date: 2021-09-22 DOI: 10.3905/jfds.2021.1.077
Daniel Philps, D. Tilles, Timothy P. Law

Interpretability, transparency, and auditability of machine learning (ML)-driven investment have become key issues for investment managers as many look to enhance or replace traditional factor-based investing. The authors show that symbolic artificial intelligence (SAI) provides a solution to this conundrum, with superior return characteristics compared to traditional factor-based stock selection, while producing interpretable outcomes. Their SAI approach is a form of satisficing that systematically learns investment decision rules (symbols) for stock selection using an a priori algorithm, avoiding the need for error-prone approaches to secondary explanations (known as XAI). The authors compare the empirical performance of the SAI approach with a traditional factor-based stock selection approach in an emerging market equities universe. They show that SAI generates superior return characteristics and would provide a viable and interpretable alternative to factor-based stock selection. Their approach has significant implications for investment managers, providing an ML alternative to factor investing but with interpretable outcomes that could satisfy internal and external stakeholders.

Key Findings
▪ Symbolic artificial intelligence (SAI) for stock selection, a form of satisficing, provides an alternative to factor investing and overcomes the interpretability issues of many machine learning (ML) approaches.
▪ An SAI that could be applied at scale is shown to produce superior return characteristics to traditional factor-based stock selection.
▪ SAI’s superior stock selection is examined using notional visualizations of its decision boundaries.
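The following toy sketch conveys the flavor of rule-based (symbolic) stock selection: binarized factor conditions are enumerated and kept when their support and hit rate clear a satisficing threshold, yielding human-readable rules rather than a black-box score. The factor names, thresholds, and the brute-force enumeration are assumptions for illustration; they are not the authors’ SAI algorithm or its a priori rule learner.

```python
# Minimal sketch (assumption): a toy rule miner over binarized factor signals as a
# stand-in for symbolic rule learning; factor names, the 0.7 quantile threshold,
# and the satisficing cutoffs are illustrative, not the authors' specification.
import itertools
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 5000
factors = pd.DataFrame({
    "cheap": rng.standard_normal(n),
    "quality": rng.standard_normal(n),
    "momentum": rng.standard_normal(n),
})
# Toy forward outperformance flag, loosely driven by two of the factors plus noise.
outperf = (0.5 * factors["cheap"] + 0.5 * factors["momentum"]
           + rng.standard_normal(n)) > 0

conditions = {f"{c}_high": factors[c] > factors[c].quantile(0.7) for c in factors}

rules = []
for r in (1, 2):                                           # single and pairwise conditions
    for combo in itertools.combinations(conditions, r):
        mask = np.logical_and.reduce([conditions[c] for c in combo])
        support, hit_rate = mask.mean(), outperf[mask].mean()
        if support > 0.02 and hit_rate > 0.60:             # satisficing cutoff
            rules.append((" AND ".join(combo), round(support, 3), round(hit_rate, 3)))

print(pd.DataFrame(rules, columns=["rule", "support", "hit_rate"]))
```

Because the output is a list of readable rules with their support and hit rate, the selection logic can be audited directly, which is the interpretability property the article emphasizes.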
{"title":"Interpretable, Transparent, and Auditable Machine Learning: An Alternative to Factor Investing","authors":"Daniel Philps, D. Tilles, Timothy P. Law","doi":"10.3905/jfds.2021.1.077","DOIUrl":"https://doi.org/10.3905/jfds.2021.1.077","url":null,"abstract":"Interpretability, transparency, and auditability of machine learning (ML)-driven investment has become a key issue for investment managers as many look to enhance or replace traditional factor-based investing. The authors show that symbolic artificial intelligence (SAI) provides a solution to this conundrum, with superior return characteristics compared to traditional factor-based stock selection, while producing interpretable outcomes. Their SAI approach is a form of satisficing that systematically learns investment decision rules (symbols) for stock selection, using an a priori algorithm, avoiding the need for error-prone approaches for secondary explanations (known as XAI). The authors compare the empirical performance of an SAI approach with a traditional factor-based stock selection approach, in an emerging market equities universe. They show that SAI generates superior return characteristics and would provide a viable and interpretable alternative to factor-based stock selection. Their approach has significant implications for investment managers, providing an ML alternative to factor investing but with interpretable outcomes that could satisfy internal and external stakeholders. Key Findings ▪ Symbolic artificial intelligence (SAI) for stock selection, a form of satisficing, provides an alternative to factor investing and overcomes the interpretability issues of many machine learning (ML) approaches. ▪ An SAI that could be applied at scale is shown to produce superior return characteristics to traditional factor-based stock selection. ▪ SAI’s superior stock selection is examined using notional visualizations of its decision boundaries.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121526441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classification Methods for Market Making in Auction Markets
Pub Date: 2021-09-17 DOI: 10.3905/jfds.2021.1.076
Nikolaj Normann Holm, Mansoor Hussain, M. Kulahci

Can machines learn to reliably predict auction outcomes in financial markets? The authors study this question using classification methods from machine learning and auction data from the request-for-quote protocol used in many multi-dealer-to-client markets. Their answer is affirmative. The highest performance is achieved using gradient-boosted decision trees coupled with preprocessing tools to handle class imbalance. Competition level, client identity, and bid–ask quotes are shown to be the most important features. To illustrate the usefulness of these findings, the authors create a profit-maximizing agent to suggest price quotes. Results show more aggressive behavior compared to human dealers.

Key Findings
▪ We propose a machine learning–based approach for determining auction outcomes by exploring the use of classification algorithms for outcome predictions and show that gradient-boosted decision trees obtain the best performance on an industrial dataset.
▪ We uncover bid–ask normalized spread levels and competition level as the most important features and evaluate their influence on predictions through Shapley value estimation.
▪ We demonstrate the usefulness of our approach by creating a profit-maximizing agent using a classifier for win probability predictions. Our agent’s behavior is aggressive compared to human dealers.
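A hedged sketch of the overall pipeline appears below: a gradient-boosted classifier is fit to predict the win probability of a quote, with balanced sample weights as a simple class-imbalance treatment, and a quoting agent then picks the margin that maximizes expected profit, defined here as win probability times margin. The synthetic RFQ features and the profit definition are assumptions, not the study’s industrial dataset or its preprocessing tools.

```python
# Minimal sketch (assumption): gradient-boosted win-probability classifier with
# balanced sample weights, plus a quote chosen to maximize P(win | quote) * margin.
# Feature names and the toy win mechanism are illustrative only.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.utils.class_weight import compute_sample_weight

rng = np.random.default_rng(0)
n = 20_000
n_dealers = rng.integers(1, 8, n)                  # competition level
spread = rng.uniform(0.5, 5.0, n)                  # bid-ask spread (ticks)
quote_margin = rng.uniform(0.0, 3.0, n)            # quoted margin over mid (ticks)
# Toy ground truth: tighter quotes and fewer competitors win more often.
win = rng.random(n) < 1.0 / (1.0 + np.exp(1.5 * quote_margin + 0.4 * n_dealers - 3.0))

X = np.column_stack([n_dealers, spread, quote_margin])
model = GradientBoostingClassifier(random_state=0)
model.fit(X, win, sample_weight=compute_sample_weight("balanced", win))

def best_quote(n_dealers_i, spread_i, grid=np.linspace(0.1, 3.0, 30)):
    """Pick the margin maximizing expected profit P(win) * margin."""
    rows = np.column_stack([np.full_like(grid, n_dealers_i),
                            np.full_like(grid, spread_i), grid])
    p_win = model.predict_proba(rows)[:, 1]
    return grid[np.argmax(p_win * grid)]

print("suggested margin, 2 dealers:", round(best_quote(2, 1.0), 2))
print("suggested margin, 6 dealers:", round(best_quote(6, 1.0), 2))
```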
{"title":"Classification Methods for Market Making in Auction Markets","authors":"Nikolaj Normann Holm, Mansoor Hussain, M. Kulahci","doi":"10.3905/jfds.2021.1.076","DOIUrl":"https://doi.org/10.3905/jfds.2021.1.076","url":null,"abstract":"Can machines learn to reliably predict auction outcomes in financial markets? The authors study this question using classification methods from machine learning and auction data from the request-for-quote protocol used in many multi-dealer-to-client markets. Their answer is affirmative. The highest performance is achieved using gradient-boosted decision trees coupled with preprocessing tools to handle class imbalance. Competition level, client identity, and bid–ask quotes are shown to be the most important features. To illustrate the usefulness of these findings, the authors create a profit-maximizing agent to suggest price quotes. Results show more aggressive behavior compared to human dealers. Key Findings ▪ We propose a machine learning–based approach for determining auction outcomes by exploring the use of classification algorithms for outcome predictions and show that gradient-boosted decision trees obtain the best performance on an industrial data set. ▪ We uncover bid–ask normalized spread levels and competition level as the most important features and evaluate their influence on predictions through Shapley value estimation. ▪ We demonstrate the usefulness of our approach by creating a profit-maximizing agent using a classifier for win probability predictions. Our agent’s behavior is aggressive compared to human dealers.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124076596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fairness Measures for Machine Learning in Finance
Pub Date: 2021-09-14 DOI: 10.3905/jfds.2021.1.075
Sanjiv Ranjan Das, Michele Donini, J. Gelman, Kevin Haas, Mila Hardt, Jared Katzman, K. Kenthapadi, Pedro Larroy, Pinar Yilmaz, Bilal Zafar
The authors present a machine learning pipeline for fairness-aware machine learning (FAML) in finance that encompasses metrics for fairness (and accuracy). Whereas accuracy metrics are well understood and the principal ones are used frequently, there is no consensus as to which of several available measures of fairness should be used in a generic manner in the financial services industry. The authors explore these measures and discuss which ones to focus on at various stages in the ML pipeline, pre-training and post-training, and they examine simple bias mitigation approaches. Using a standard dataset, they show that the sequencing in their FAML pipeline offers a cogent approach to arriving at a fair and accurate ML model. The authors discuss the intersection of bias metrics with legal considerations in the United States, and the entanglement of explainability and fairness is exemplified in the case study. They discuss possible approaches for training ML models while satisfying constraints imposed by various fairness metrics and the role of causality in assessing fairness.

Key Findings
▪ Sources of bias are presented and a range of metrics is considered for machine learning applications in finance, both pre-training and post-training of models.
▪ A process of using the metrics to arrive at fair models is discussed.
▪ Various considerations for the choice of specific metrics are also analyzed.
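For concreteness, the snippet below computes two widely used post-training fairness metrics, the demographic parity difference and the equal opportunity difference, for a toy binary classifier and a binary protected attribute. The metric names follow common usage; the data, model, and thresholds are illustrative assumptions rather than the authors’ pipeline or dataset.

```python
# Minimal sketch (assumption): two common post-training fairness metrics for a
# binary classifier across a binary protected attribute. All data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)                      # protected attribute (0/1)
y_true = rng.integers(0, 2, n)                     # actual outcome (e.g., repayment)
# Toy model that is slightly harsher on group 1.
y_pred = ((y_true + rng.random(n) - 0.15 * group) > 0.9).astype(int)

def demographic_parity_difference(y_pred, group):
    """Difference in positive-prediction rates between the two groups."""
    return y_pred[group == 1].mean() - y_pred[group == 0].mean()

def equal_opportunity_difference(y_true, y_pred, group):
    """Difference in true-positive rates between the two groups."""
    tpr = lambda g: y_pred[(group == g) & (y_true == 1)].mean()
    return tpr(1) - tpr(0)

print("demographic parity difference:", round(demographic_parity_difference(y_pred, group), 3))
print("equal opportunity difference: ", round(equal_opportunity_difference(y_true, y_pred, group), 3))
```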
{"title":"Fairness Measures for Machine Learning in Finance","authors":"Sanjiv Ranjan Das, Michele Donini, J. Gelman, Kevin Haas, Mila Hardt, Jared Katzman, K. Kenthapadi, Pedro Larroy, Pinar Yilmaz, Bilal Zafar","doi":"10.3905/jfds.2021.1.075","DOIUrl":"https://doi.org/10.3905/jfds.2021.1.075","url":null,"abstract":"The authors present a machine learning pipeline for fairness-aware machine learning (FAML) in finance that encompasses metrics for fairness (and accuracy). Whereas accuracy metrics are well understood and the principal ones are used frequently, there is no consensus as to which of several available measures for fairness should be used in a generic manner in the financial services industry. The authors explore these measures and discuss which ones to focus on at various stages in the ML pipeline, pre-training and post-training, and they examine simple bias mitigation approaches. Using a standard dataset, they show that the sequencing in their FAML pipeline offers a cogent approach to arriving at a fair and accurate ML model. The authors discuss the intersection of bias metrics with legal considerations in the United States, and the entanglement of explainability and fairness is exemplified in the case study. They discuss possible approaches for training ML models while satisfying constraints imposed from various fairness metrics and the role of causality in assessing fairness. Key Findings ▪ Sources of bias are presented and a range of metrics is considered for machine learning applications in finance, both pre-training and post-training of models. ▪ A process of using the metrics to arrive at fair models is discussed. ▪ Various considerations for the choice of specific metrics are also analyzed.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117224704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benchmark Dataset for Short-Term Market Prediction of Limit Order Book in China Markets
Pub Date: 2021-09-05 DOI: 10.3905/jfds.2021.1.074
Charles Huang, Weifeng Ge, Hongsong Chou, Xin Du

Limit order books (LOBs) have generated large volumes of financial data for analysis and prediction by both the academic community and industry practitioners. This article presents a benchmark LOB dataset from the Chinese stock market, covering a few thousand stocks for the period of June to September 2020. Experiment protocols are designed for model performance evaluation: at the end of every second, forecast the upcoming volume-weighted average price change and volume over 12 horizons ranging from 1 second to 300 seconds. Results based on a linear regression model and deep learning models are compared. A practical short-term trading strategy framework based on the generated alpha signal is presented. The data and code are available on GitHub (github.com/HKGSAS).

Key Findings
▪ Researchers lack a benchmark high-frequency LOB dataset and model against which to objectively assess prediction performance; this article serves to bridge that gap.
▪ A more practically effective set of features is proposed to capture both LOB snapshots and periodic data. The prediction target used in the published literature (mid-price direction change over the next few events) is too simplistic for a practical trading strategy; the authors instead propose predicting the price change and volume magnitude over 12 short-term horizons.
▪ The article compares the performance of baseline linear regression and state-of-the-art deep learning models, based on both accuracy statistics and trading profits.
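The sketch below mimics the evaluation protocol in spirit for a single horizon: simple LOB-style features (spread, imbalance, lagged price change) feed a baseline linear regression that predicts the forward VWAP change, scored by out-of-sample R-squared. The synthetic snapshots, the feature list, and the single 10-step horizon are assumptions, not the benchmark dataset or its 12 horizons.

```python
# Minimal sketch (assumption): a linear-regression baseline predicting forward VWAP
# change over one short horizon from simple LOB snapshot features. Synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, horizon = 50_000, 10                              # e.g., a 10-step-ahead horizon
# Persistent order-book imbalance that weakly drives subsequent price moves.
imbalance = np.tanh(np.convolve(rng.standard_normal(n), np.ones(20) / 20, mode="same") * 5)
mid = 100 + np.cumsum(0.01 * rng.standard_normal(n) + 0.003 * imbalance)
spread = rng.uniform(0.01, 0.05, n)
vwap = mid + 0.3 * spread * imbalance                # toy VWAP around the mid price

features = np.column_stack([
    spread,
    imbalance,
    np.r_[np.zeros(5), mid[5:] - mid[:-5]] / mid,    # 5-step lagged price change
])
target = vwap[horizon:] - vwap[:-horizon]            # forward VWAP change

X_tr, X_te, y_tr, y_te = train_test_split(features[:-horizon], target,
                                          test_size=0.3, shuffle=False)
model = LinearRegression().fit(X_tr, y_tr)
print("out-of-sample R^2:", round(model.score(X_te, y_te), 4))
```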
{"title":"Benchmark Dataset for Short-Term Market Prediction of Limit Order Book in China Markets","authors":"Charles Huang, Weifeng Ge, Hongsong Chou, Xin Du","doi":"10.3905/jfds.2021.1.074","DOIUrl":"https://doi.org/10.3905/jfds.2021.1.074","url":null,"abstract":"Limit order books (LOBs) have generated big financial data for analysis and prediction from both academic community and industry practitioners. This article presents a benchmark LOB dataset from the Chinese stock market, covering a few thousand stocks for the period of June to September 2020. Experiment protocols are designed for model performance evaluation: at the end of every second, to forecast the upcoming volume-weighted average price change and volume over 12 horizons ranging from 1 second to 300 seconds. Results based on a linear regression model and deep learning models are compared. A practical short-term trading strategy framework based on the alpha signal generated is presented. The data and code are available on Github (github.com/HKGSAS). Key Findings ▪ There is a gap between benchmarking a high-frequency LOB dataset and model for researchers to objectively assess prediction performances, which this article serves to bridge. ▪ A more practically effective set of features is proposed to capture both LOB snapshots and periodic data. The prediction target is similarly too simplistic in the published literature—mid-price direction change for the next few events, which is not suitable for a practical trading strategy. The authors propose to predict the price change and volume magnitude over 12 short-term horizons. ▪ This article proposes comparing the performance of baseline linear regression and state-of-the-art deep learning models, based on both accuracy statistics and trading profits.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126046727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Factor Momentum and Regime-Switching Overlay Strategy
Pub Date: 2021-08-31 DOI: 10.3905/jfds.2021.1.072
Junhan Gu, J. Mulvey

Investors face challenges in diversifying risks and protecting capital during crash periods. In this article, the authors incorporate regime information into the portfolio optimization context by identifying regimes for historical time periods using an ℓ1-trend filtering algorithm and exploring different machine learning techniques to forecast the probability of an upcoming stock market crash. They then apply a regime-based asset allocation to a nominal risk parity strategy. Investors can further improve their investment performance by implementing a dollar-neutral factor momentum strategy as an overlay in conjunction with the core portfolio. The authors demonstrate that the time-series factor momentum strategy generates high risk-adjusted returns and exhibits pronounced defensive characteristics during market crashes. A volatility scaling approach is employed to manage risk and further magnify the benefits of factor momentum. Empirical results suggest that the approach improves risk-adjusted returns by a substantial amount over the benchmark, from both a standalone perspective and a contributory perspective.

Key Findings
▪ The authors identify historical regimes with ℓ1-trend filtering and implement a regime-switching risk parity strategy with supervised learning methods to optimize the core portfolio allocation.
▪ By adding a long–short factor momentum strategy on top of the core diversified portfolio, the authors further enhance the portfolio’s risk-adjusted return.
▪ The factor momentum strategy exhibits defensive characteristics during crashes, and its risks can be further managed by scaling the leverage based on realized volatility.
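As an illustration of the regime-identification step only, the snippet below runs an ℓ1 trend filter on a synthetic log-price series and labels each period bull or bear by the sign of the fitted trend’s slope. The cvxpy formulation, the penalty value, and the sign-based labeling are assumptions; the article’s crash-probability forecasting and volatility-scaled factor momentum overlay are not reproduced here.

```python
# Minimal sketch (assumption): l1 trend filtering on a synthetic log-price series,
# with regimes labeled by the sign of the fitted trend's slope. Uses cvxpy; the
# penalty lambda is illustrative, not calibrated.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 750
# Synthetic log prices: a drifting-up segment followed by a crash-like segment.
log_price = np.cumsum(np.r_[rng.normal(0.0005, 0.01, 500), rng.normal(-0.002, 0.02, 250)])

trend = cp.Variable(n)
second_diff = np.diff(np.eye(n), n=2, axis=0)        # second-difference operator
lam = 5.0
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(log_price - trend)
                                 + lam * cp.norm1(second_diff @ trend)))
problem.solve()

slope = np.diff(trend.value)
regime = np.where(slope > 0, "bull", "bear")          # regime label per period
print("share of periods labeled bear:", round((regime == "bear").mean(), 3))
```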
{"title":"Factor Momentum and Regime-Switching Overlay Strategy","authors":"Junhan Gu, J. Mulvey","doi":"10.3905/jfds.2021.1.072","DOIUrl":"https://doi.org/10.3905/jfds.2021.1.072","url":null,"abstract":"Investors are faced with challenges in diversifying risks and protecting capital during crash periods. In this article, the authors incorporate regime information in the portfolio optimization context by identifying regimes for historical time periods using an ℓ1-trend filtering algorithm and exploring different machine learning techniques to forecast the probability of an upcoming stock market crash. They then apply a regime-based asset allocation to nominal risk parity strategy. Investors can further improve their investment performance by implementing a dollar-neutral factor momentum strategy as an overlay in conjunction with the core portfolio. The authors demonstrate that the time-series factor momentum strategy generates high risk-adjusted returns and exhibits pronounced defensive characteristics during market crashes. A volatility scaling approach is employed to manage the risk and further magnify the benefits of factor momentum. Empirical results suggest that the approach improves risk-adjusted returns by a substantial amount over the benchmark from both the standalone perspective and the contributory perspective. Key Findings ▪ The authors identify historical regimes with ℓ1-trend filtering and implement a regime-switching risk parity strategy with supervised learning methods to optimize the core portfolio allocation. ▪ By adding a long–short factor momentum strategy on top of the core diversified portfolios, the authors are able to further enhance the portfolio’s risk-adjusted return. ▪ The factor momentum strategy exhibits defensive characteristics during crashes, and its risks can be further managed by scaling the leverage based on the realized volatility.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"9 31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131095759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Managing Editor’s Letter
Pub Date: 2021-07-31 DOI: 10.3905/jfds.2021.3.3.001
F. Fabozzi
In asset management, alternative data are diverse, nontraditional datasets utilized by quantitative and fundamental institutional investors and expected to enhance portfolio returns. In the opening article, “Alternative Data in Investment Management: Usage, Challenges, and Valuation,” Gene Ekster and Petter N. Kolm elaborate on what alternative data are, how they are used in asset management, the key challenges that arise when working with alternative data, and how to assess the value of alternative databases. The key challenges include entity mapping, ticker-tagging, panel stabilization, and debiasing with modern statistical and machine learning approaches. Several methodologies are described for assessing the value of alternative datasets, including an event study methodology (which Ekster and Kolm refer to as the “golden triangle”), the application of report cards, and the relationship between a dataset’s structure of information content and its potential to enhance investment returns. The effectiveness of these methods is illustrated using a case study.

In “Fairness Measures for Machine Learning in Finance,” the team of Sanjiv Das, Michele Donini, Jason Gelman, Kevin Haas, Mila Hardt, Jared Katzman, Krishnaram Kenthapadi, Pedro Larroy, Pinar Yilmaz, and Muhammad Bilal Zafar propose a machine learning (ML) pipeline for fairness-aware machine learning (FAML) in finance that encompasses metrics for fairness (and accuracy). Various considerations for the choice of specific metrics are also analyzed. The authors discuss which of these measures to focus on at various stages in the ML pipeline, pre-training and post-training, and examine simple bias mitigation approaches. Using a standard dataset, they show that the sequencing in their FAML pipeline offers a cogent approach to arriving at a fair and accurate ML model.

In “Interpretable, Transparent, and Auditable Machine Learning: An Alternative to Factor Investing,” Daniel Philps, D. Tilles, and Timothy P. Law address the interpretability, transparency, and auditability issues that ML-driven investing raises for investment managers. They argue that symbolic artificial intelligence (SAI), a form of satisficing that systematically learns investment decision rules (symbols) for stock selection, provides a solution for dealing with these important issues while providing superior return characteristics compared to traditional factor-based stock selection and allowing for interpretable outcomes. Empirically comparing the performance of the proposed SAI approach with a traditional factor-based stock selection approach for an emerging market equities universe, the authors show that SAI generates superior return characteristics while providing a viable and interpretable alternative to factor-based stock selection. Their approach has significant implications for investment managers, providing an ML alternative to factor investing but with interpretable outcomes that could satisfy internal and external stakeholders.
{"title":"Managing Editor’s Letter","authors":"F. Fabozzi","doi":"10.3905/jfds.2021.3.3.001","DOIUrl":"https://doi.org/10.3905/jfds.2021.3.3.001","url":null,"abstract":"n asset management, alternative data are diverse nontraditional datasets utilized by quantitative and fundamental institutional investors that is expected to enhance portfolio returns. In the opening article, “Alternative Data in Investment Manage-ment: Usage, Challenges, and Valuation,” Gene Ekster and Petter N. Kolm elaborate on what alternative data are, how they are used in asset management, key challenges that arise when working with alternative data, and how to assess the value of alternative databases. The key challenges include entity mapping, ticker-tagging, panel stabilization, and debiasing with modern statistical and machine learning approaches. There are several methodologies described for assessing the value of alternative datasets, including an event study methodology (which Ekster and Kolm refer to as the “golden triangle”), the application of report cards, and the relationship between a dataset’s structure of information content and its potential to enhance investment returns. The effectiveness of these methods is illustrated using a case study. In “Fairness Measures for Machine Learning in Finance,” by the team of Sanjiv Das, Michele Donini, Jason Gelman, Kevin Haas, Mila Hardt, Jared Katzman, Krishnaram Kenthapadi, Pedro Larroy, Pinar Yilmaz, and Muhammad Bilal Zafar, propose a machine learning (ML) pipeline for fairness-aware machine learning (FAML) in finance that encompasses metrics for fairness (and accuracy). Various considerations for the choice of specific metrics are also analyzed. The authors discuss which of these measures to focus on at various stages in the ML pipeline, pre-training and post-training, as well as examining simple bias mitigation approaches. Using a stan-dard dataset, they show that the sequencing in of satisficing that systematically learns investment decision rules (symbols) for stock selection—provides a solution for dealing with these important issues while providing superior return characteristics compared to traditional factor-based stock selection and allowing for interpretable outcomes. Empirically comparing the performance of the proposed SAI approach with a traditional factor-based stock selection approach for an emerging market equities universe, the authors show that SAI generates superior return characteristics while providing a viable and interpretable alternative to factor-based stock selection. Their approach has significant implications for investment managers, providing an ML alternative to factor investing but with interpretable outcomes that could satisfy internal and external stakeholders.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114541062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine Learning for Active Portfolio Management
Pub Date: 2021-07-31 DOI: 10.3905/jfds.2021.1.071
Söhnke M. Bartram, J. Branke, Giuliano De Rossi, Mehrshad Motahari
Machine learning (ML) methods are attracting considerable attention among academics in the field of finance. However, it is commonly believed that ML has not transformed the asset management industry to the same extent as other sectors. This survey focuses on the ML methods and empirical results in the literature that matter most for active portfolio management. ML has asset management applications in signal generation, portfolio construction, and trade execution, and promising findings have been reported. Reinforcement learning (RL), in particular, is expected to play a more significant role in the industry. Nevertheless, the performance of a sample of active exchange-traded funds (ETFs) that use ML in their investments tends to be mixed. Overall, ML techniques show great promise for active portfolio management, but investors should be cautioned against their main potential pitfalls.

TOPICS: Big data/machine learning, portfolio construction, exchange-traded funds and applications, performance measurement

Key Findings
▪ Machine learning (ML) methods have several advantages that can lead to successful applications in active portfolio management, including the ability to capture nonlinear patterns and a focus on prediction through ensemble learning.
▪ ML methods can be applied to different steps of the investment process, including signal generation, portfolio construction, and trade execution, with reinforcement learning expected to play a more significant role in the industry.
▪ Empirically, the investment performance of ML-based active exchange-traded funds is mixed.
{"title":"Machine Learning for Active Portfolio Management","authors":"Söhnke M. Bartram, J. Branke, Giuliano De Rossi, Mehrshad Motahari","doi":"10.3905/jfds.2021.1.071","DOIUrl":"https://doi.org/10.3905/jfds.2021.1.071","url":null,"abstract":"Machine learning (ML) methods are attracting considerable attention among academics in the field of finance. However, it is commonly believed that ML has not transformed the asset management industry to the same extent as other sectors. This survey focuses on the ML methods and empirical results available in the literature that matter most for active portfolio management. ML has asset management applications for signal generation, portfolio construction, and trade execution, and promising findings have been reported. Reinforcement learning (RL), in particular, is expected to play a more significant role in the industry. Nevertheless, the performance of a sample of active exchange-traded funds (ETF) that use ML in their investments tends to be mixed. Overall, ML techniques show great promise for active portfolio management, but investors should be cautioned against their main potential pitfalls. TOPICS: Big data/machine learning, portfolio construction, exchange-traded funds and applications, performance measurement Key Findings ▪ Machine learning (ML) methods have several advantages that can lead to successful applications in active portfolio management, including the ability to capture nonlinear patterns and a focus on prediction through ensemble learning. ▪ ML methods can be applied to different steps of the investment process, including signal generation, portfolio construction, and trade execution, with reinforcement learning expected to play a more significant role in the industry. ▪ Empirically, the investment performance of ML-based active exchange-traded funds is mixed.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128956439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}