Matrix Evolutions: Synthetic Correlations and Explainable Machine Learning for Constructing Robust Investment Portfolios
Jochen Papenbrock, Peter Schwendner, Markus Jaeger, Stephan Krügel
The Journal of Financial Data Science, July 29, 2020. DOI: 10.2139/ssrn.3663220

In this article, the authors present a novel and highly flexible concept to simulate correlation matrixes of financial markets. It produces realistic outcomes regarding stylized facts of empirical correlation matrixes and requires no asset return input data. The matrix generation is based on a multiobjective evolutionary algorithm, so the authors call the approach matrix evolutions. It is suitable for parallel implementation and can be accelerated by graphics processing units and quantum-inspired algorithms. The approach is useful for backtesting, pricing, and hedging correlation-dependent investment strategies and financial products. Its potential is demonstrated in a machine learning case study for robust portfolio construction in a multi-asset universe: an explainable machine learning program links the synthetic matrixes to the portfolio volatility spread of hierarchical risk parity versus equal risk contribution.

TOPICS: Statistical methods, big data/machine learning, portfolio construction, performance measurement

Key Findings
▪ The authors introduce the matrix evolutions concept, based on an evolutionary algorithm, to simulate correlation matrixes useful for financial market applications.
▪ They apply the resulting synthetic correlation matrixes to benchmark hierarchical risk parity (HRP) and equal risk contribution allocations of a multi-asset futures portfolio and find that HRP shows lower portfolio risk.
▪ The authors evaluate three competing machine learning methods to regress the portfolio risk spread between the two allocation methods against statistical features of the synthetic correlation matrixes and then discuss local and global feature importance using the SHAP framework of Lundberg and Lee (2017).
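The abstract does not spell out the authors' multiobjective algorithm, but the core idea of evolving candidate matrices toward target stylized facts, while repairing each candidate to a valid correlation matrix, can be sketched with a toy single-objective evolution strategy. The objective (a target share of variance in the first eigenvalue, mostly positive pairwise correlations) and all constants below are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def nearest_corr(A):
    """Symmetrize, project onto the PSD cone, and rescale to unit diagonal."""
    w, V = np.linalg.eigh((A + A.T) / 2)
    B = V @ np.diag(np.clip(w, 1e-8, None)) @ V.T
    d = np.sqrt(np.diag(B))
    C = B / np.outer(d, d)
    np.fill_diagonal(C, 1.0)
    return C

def fitness(C, target_top_share=0.4):
    """Toy objective: the first eigenvalue should absorb a target share of
    total variance (a stylized 'market mode'), and off-diagonal correlations
    should be mostly positive."""
    w = np.linalg.eigvalsh(C)
    top_share = w[-1] / w.sum()
    off = C[np.triu_indices_from(C, k=1)]
    return -abs(top_share - target_top_share) + 0.1 * (off > 0).mean()

def evolve(n=10, pop_size=30, generations=200, sigma=0.05, seed=0):
    rng = np.random.default_rng(seed)
    pop = [nearest_corr(np.eye(n) + 0.1 * rng.standard_normal((n, n)))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # keep the fittest half
        children = [nearest_corr(p + sigma * rng.standard_normal((n, n)))
                    for p in parents]            # mutate, then repair
        pop = parents + children
    return max(pop, key=fitness)

C = evolve()
```

Every candidate stays a valid correlation matrix because the repair step runs after each mutation; the paper's multiobjective version would maintain a Pareto front over several stylized-fact objectives instead of a single scalar fitness.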
Hyperparameter Optimization for Portfolio Selection
P. Nystrup, Erik Lindström, H. Madsen
The Journal of Financial Data Science, June 18, 2020. DOI: 10.3905/jfds.2020.1.035

Portfolio selection involves a trade-off between maximizing expected return and minimizing risk. In practice, useful formulations also include various costs and constraints that regularize the problem and reduce the risk due to estimation errors, resulting in solutions that depend on a number of hyperparameters. As the number of hyperparameters grows, selecting their values becomes increasingly important and difficult. In this article, the authors propose a systematic approach to hyperparameter optimization by leveraging recent advances in automated machine learning and multiobjective optimization. They optimize hyperparameters on a training set to yield the best result subject to market-determined realized costs. In applications to single- and multiperiod portfolio selection, they show that sequential hyperparameter optimization finds solutions with better risk–return trade-offs than manual, grid, and random search over hyperparameters while using fewer function evaluations. At the same time, the solutions found are more stable from in-sample training to out-of-sample testing, suggesting that they are less likely to be extremities that randomly happened to yield good performance in training.

TOPICS: Portfolio theory, portfolio construction, big data/machine learning

Key Findings
• The growing number of applications of machine learning approaches to portfolio selection means that hyperparameter optimization becomes increasingly important. The authors propose a systematic approach to hyperparameter optimization by leveraging recent advances in automated machine learning and multiobjective optimization.
• They establish a connection between forecast uncertainty and holding- and trading-cost parameters in portfolio selection and argue that these should be considered regularization parameters that can be adjusted in training to achieve optimal performance when tested subject to realized costs.
• Multiobjective optimization can find solutions with better risk–return trade-offs than manual, grid, and random search over hyperparameters for portfolio selection. At the same time, the solutions are more stable across in-sample training and out-of-sample testing.
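As a rough illustration of treating a cost parameter as a tunable regularizer (not the authors' automated machine learning pipeline), the sketch below runs a random search over the ridge penalty of a minimum-variance portfolio and scores each candidate by realized variance on held-out data. The shrinkage form, the synthetic data, and all constants are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def min_var_weights(cov, gamma):
    """Minimum-variance weights with a ridge penalty gamma; larger gamma
    shrinks the solution toward equal weight, acting as regularization."""
    n = cov.shape[0]
    w = np.linalg.solve(cov + gamma * np.eye(n), np.ones(n))
    return w / w.sum()

def realized_var(w, R):
    """Out-of-sample variance of the portfolio return series."""
    return np.var(R @ w)

# synthetic daily returns with a common component; train/validation split
T, n = 500, 8
true_cov = 0.0001 * (np.eye(n) + 0.5)
R = rng.multivariate_normal(np.zeros(n), true_cov, size=T)
R_train, R_val = R[:250], R[250:]
cov_train = np.cov(R_train, rowvar=False)

# random search over the hyperparameter, scored on the held-out window
candidates = 10.0 ** rng.uniform(-6, -1, size=50)
best_gamma = min(candidates,
                 key=lambda g: realized_var(min_var_weights(cov_train, g), R_val))
```

A sequential (e.g. Bayesian) optimizer would replace the fixed candidate list with proposals informed by previous evaluations, which is how the authors obtain better trade-offs with fewer function evaluations.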
The Cross Section of Commodity Returns: A Nonparametric Approach
C. Struck, Enoch Cheng
The Journal of Financial Data Science, June 17, 2020. DOI: 10.3905/jfds.2020.1.034

To what extent are financial market returns predictable? Standard approaches to asset pricing make strong parametric assumptions that undermine their return-predicting ability. The authors employ tree-based methods to overcome these limitations and attempt to approximate an upper bound for the predictability of returns in commodities futures markets. Out of sample, they find that up to 3.74% of 1-month returns are predictable, more than a 10-fold increase over standard approaches. The findings hint at the importance that multiway interactions and nonlinearities acquire in the data; they imply that new factors should be tested on their ability to add explanatory power to an ensemble of existing factors.

TOPICS: Futures and forward contracts, commodities

Key Findings
• Standard approaches to asset pricing make strong parametric assumptions that undermine their return-predicting ability.
• The authors employ tree-based methods to overcome these limitations and estimate the predictability of returns in commodities futures markets.
• Out of sample, they find that up to 3.74% of 1-month returns are predictable, more than a 10-fold increase over standard approaches.
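The "up to 3.74% of 1-month returns are predictable" figure is an out-of-sample predictive R². One common convention in this literature (an assumption here; some studies benchmark against the historical mean instead of zero) is:

```python
import numpy as np

def r2_oos(y_true, y_pred):
    """Out-of-sample predictive R^2 against a zero-return benchmark:
    1 - SSE(model) / SSE(zero forecast). A value of 0.0374 corresponds
    to '3.74% of returns are predictable'."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum(y_true ** 2)

# a forecast equal to half the realized return leaves a quarter of the
# squared error of the zero benchmark
y = np.array([0.01, -0.02, 0.015])
print(r2_oos(y, 0.5 * y))  # prints 0.75
```

Unlike in-sample R², this statistic can go negative: a model that fits noise produces larger squared errors out of sample than forecasting zero.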
Deep Sequence Modeling: Development and Applications in Asset Pricing
Lingbo Cong, Ke Tang, Jingyuan Wang, Yang Zhang
The Journal of Financial Data Science, June 1, 2020. DOI: 10.3905/jfds.2020.1.053

The authors predict asset returns and measure risk premiums using a prominent technique from artificial intelligence: deep sequence modeling. Because asset returns often exhibit sequential dependence that may not be effectively captured by conventional time-series models, sequence modeling offers a promising path with its data-driven approach and superior performance. The authors first overview the development of deep sequence models, introduce their applications in asset pricing, and discuss their advantages and limitations. They then perform a comparative analysis of these methods using data on US equities. They demonstrate how sequence modeling benefits investors in general by incorporating complex historical path dependence, and they find that long short-term memory (LSTM)-based models tend to have the best out-of-sample performance.

TOPICS: Big data/machine learning, security analysis and valuation, performance measurement

Key Findings
▪ This article provides a concise synopsis of deep sequence modeling, with an emphasis on its historical development in computer science and artificial intelligence. It serves as a reference source for social scientists who aim to use the tool to supplement conventional time-series and panel methods.
▪ Deep sequence models can be adapted successfully for asset pricing, especially for predicting asset returns, because they are flexible enough to capture the high-dimensional, nonlinear, interactive, low signal-to-noise, and dynamic nature of financial data. In particular, the models' ability to detect path-dependence patterns makes them versatile and effective, potentially outperforming existing models.
▪ This article provides a horse-race comparison of various deep sequence models for forecasting returns and measuring risk premiums. LSTM has the best performance in terms of out-of-sample predictive R², and LSTM with an attention mechanism has the best portfolio performance when microcap stocks are excluded.
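The LSTM recurrence that carries this path dependence fits in a few lines. The toy forward pass below (random weights, an assumed input/forget/cell/output gate ordering in the stacked parameters) only illustrates the mechanics, not the authors' trained models:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(X, Wx, Wh, b):
    """Run a single-layer LSTM over a sequence X of shape (T, d_in).
    Wx: (d_in, 4h), Wh: (h, 4h), b: (4h,). Returns the final hidden state.
    Gate order assumed in the stacked weights: input, forget, cell, output."""
    h = Wh.shape[0]
    h_t = np.zeros(h)
    c_t = np.zeros(h)
    for x_t in X:
        z = x_t @ Wx + h_t @ Wh + b
        i, f, g, o = z[:h], z[h:2*h], z[2*h:3*h], z[3*h:]
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c_t = f * c_t + i * g      # cell state carries long-range memory
        h_t = o * np.tanh(c_t)     # hidden state summarizes the path so far
    return h_t

rng = np.random.default_rng(0)
T, d_in, h = 60, 5, 8              # e.g. 60 days of 5 features per stock
X = rng.standard_normal((T, d_in))
params = (0.1 * rng.standard_normal((d_in, 4 * h)),
          0.1 * rng.standard_normal((h, 4 * h)),
          np.zeros(4 * h))
state = lstm_forward(X, *params)
# a linear head on `state` would produce the return forecast
```

The multiplicative forget gate is what lets the model retain or discard history adaptively, which is the "path dependence" conventional autoregressive models with fixed lag structure struggle to capture.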
Greedy Online Classification of Persistent Market States Using Realized Intraday Volatility Features
P. Nystrup, Petter N. Kolm, Erik Lindström
The Journal of Financial Data Science, May 6, 2020. DOI: 10.2139/ssrn.3594875

In many financial applications, it is important to classify time-series data without any latency while maintaining persistence in the identified states. The authors propose a greedy online classifier that contemporaneously determines which hidden state a new observation belongs to, without the need to parse historical observations and without compromising persistence. Their classifier is based on the idea of clustering temporal features while explicitly penalizing jumps between states with a fixed-cost regularization term that can be calibrated to achieve a desired level of persistence. Through a series of return simulations, the authors show that in most settings their new classifier obtains a remarkably higher accuracy than the correctly specified maximum likelihood estimator. They illustrate that the new classifier is more robust to misspecification and yields state sequences that are significantly more persistent both in and out of sample. They demonstrate how classification accuracy can be further improved by including features based on intraday data. Finally, the authors apply the new classifier to estimate persistent states of the S&P 500 Index.

TOPICS: Statistical methods, simulations, big data/machine learning

Key Findings
• A new greedy online classifier is proposed that contemporaneously determines which hidden state a new observation belongs to, without the need to parse historical observations and without compromising temporal persistence.
• A series of simulations demonstrates that the new classifier frequently obtains higher accuracy and is more robust to misspecification than the correctly specified maximum likelihood estimator.
• Classification accuracy can be improved by including features based on intraday volatility data.
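The greedy assignment step the abstract describes (nearest state centroid plus a fixed cost for jumping) can be sketched as follows; the centroids, penalty value, and simulated regimes are illustrative assumptions, and the paper additionally estimates the centroids from data:

```python
import numpy as np

def greedy_classify(features, centroids, jump_penalty):
    """Greedily assign each observation to a state online: minimize squared
    distance to the state centroid plus a fixed cost for switching away from
    the previous state. Larger jump_penalty -> more persistent sequence."""
    states = []
    prev = None
    for x in features:
        cost = np.sum((centroids - x) ** 2, axis=1)
        if prev is not None:
            cost = cost + jump_penalty * (np.arange(len(centroids)) != prev)
        prev = int(np.argmin(cost))
        states.append(prev)
    return np.array(states)

# two noisy regimes in one feature (e.g. realized volatility)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 100),
                    rng.normal(4.0, 1.0, 100)])[:, None]
centroids = np.array([[0.0], [4.0]])

noisy = greedy_classify(x, centroids, jump_penalty=0.0)
persistent = greedy_classify(x, centroids, jump_penalty=2.0)
# the penalized sequence switches states no more often than the unpenalized one
assert np.sum(np.diff(persistent) != 0) <= np.sum(np.diff(noisy) != 0)
```

With two states and one feature, the penalty creates a hysteresis band around the midpoint between centroids: the classifier only jumps once the evidence overcomes the fixed cost, which is exactly the calibration knob for persistence.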
Portfolio Selection Using Portfolio Committees
Tsungwu Ho
The Journal of Financial Data Science, May 1, 2020. DOI: 10.2139/ssrn.3653595

The author proposes a committee approach to portfolio selection. Because each optimal portfolio is a combination of three basic elements (strategy, covariance matrix, and risk type), the author first augments the combinations to 250 optimal portfolios at each estimation period. The author then defines a score to select the best portfolio to hold in the next period. The superior performance of the combination portfolio, a survival-of-the-fittest outcome, demonstrates that the committee approach to portfolio selection is not only effective but also easy to implement.

TOPICS: Portfolio theory, portfolio construction

Key Findings
• This article proposes a flexible and easy-to-implement committee approach to portfolio selection.
• This article defines an algorithm with a score to select the best portfolio out of 250 augmented portfolios.
• In this survival-of-the-fittest setting, evidence from several datasets shows that the resulting combination portfolio overcomes distributional uncertainty and exhibits superior annualized performance.
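The abstract does not define the score, so the sketch below is only a hedged illustration of the committee mechanic: enumerate candidate portfolios from combinations of strategy and covariance estimator, then rank them by a hypothetical in-sample reward-to-risk score. The two strategies, the shrinkage estimator, and the score itself are assumptions, not the article's 250-portfolio construction.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
R = rng.standard_normal((250, 6)) * 0.01       # daily returns, 6 assets

def sample_cov(R):
    return np.cov(R, rowvar=False)

def shrunk_cov(R):
    """Simple shrinkage of the sample covariance toward its diagonal."""
    S = np.cov(R, rowvar=False)
    return 0.5 * S + 0.5 * np.diag(np.diag(S))

def min_var(cov):
    w = np.linalg.solve(cov, np.ones(len(cov)))
    return w / w.sum()

def inv_vol(cov):
    w = 1.0 / np.sqrt(np.diag(cov))
    return w / w.sum()

def score(w, R):
    """Hypothetical committee score: in-sample mean/volatility ratio."""
    p = R @ w
    return p.mean() / p.std()

# the committee: every (strategy, covariance estimator) combination
committee = [strat(est(R))
             for strat, est in product([min_var, inv_vol],
                                       [sample_cov, shrunk_cov])]
best = max(committee, key=lambda w: score(w, R))
```

Scaling the same loop over more strategies, estimators, and risk targets yields a committee of the size the article describes.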
Inside the Mind of Investors During the COVID-19 Pandemic: Evidence from the StockTwits Data
Hasan Fallahgoul
The Journal of Financial Data Science, April 23, 2020. DOI: 10.2139/ssrn.3583462

The author studies investor beliefs about stock market returns (sentiment and disagreement) during the COVID-19 pandemic using roughly 3.7 million investor messages posted on StockTwits, a social media investing platform. The rich and multimodal features of the StockTwits data allow the author to explore the evolution of sentiment and disagreement within and across investors, sectors, and even industries. Sentiment decreased (and disagreement increased) sharply across all investors, regardless of investment philosophy, horizon, and experience, between February 19, 2020, and March 23, 2020, when a historical market high was followed by a record drop. Surprisingly, both measures reversed sharply toward the end of March. The behavior of these measures across sectors is heterogeneous, however: the financial and healthcare sectors are the most pessimistic and the most optimistic divisions, respectively.

TOPICS: Security analysis and valuation, quantitative methods, big data/machine learning, financial crises and financial market history, performance measurement

Key Findings
▪ The daily time series of sentiment and disagreement are not stationary processes.
▪ Sentiment decreased (and disagreement increased) sharply across all investors, regardless of investment philosophy, horizon, and experience, between February 19, 2020, and March 23, 2020, when a historical market high was followed by a record drop.
▪ The financial and healthcare sectors are the most pessimistic and the most optimistic divisions, respectively.
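The abstract does not specify how sentiment and disagreement are constructed from messages. One common convention in the social media sentiment literature (an assumption here, not necessarily the article's measure) maps bullish/bearish message labels to a bullishness index and a disagreement score:

```python
import numpy as np

def sentiment_and_disagreement(labels):
    """Bullishness and disagreement from message labels (+1 bullish,
    -1 bearish). Sentiment is the mean label in [-1, 1]; disagreement is
    1 when messages are evenly split and 0 when they are unanimous."""
    labels = np.asarray(labels, dtype=float)
    sentiment = labels.mean()
    disagreement = 1.0 - abs(sentiment)
    return sentiment, disagreement

# 3 bullish messages, 1 bearish message on a given day
s, d = sentiment_and_disagreement([1, 1, 1, -1])
print(s, d)  # prints 0.5 0.5
```

Computing this pair per day, per sector, or per investor cohort produces the time series whose sharp moves around February–March 2020 the article documents.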
The rise of big and alternative data has created significant new business opportunities in the financial sector. As we start on this journey of fast-moving technology disruption, financial professionals have a rare opportunity to balance the exponential growth of artificial intelligence (AI)/data science with ethics, bias, and privacy to create trusted data-driven decision making. In this article, the authors discuss the nuances of big data sets that are critical when one considers standards, processes, best practices, and modeling algorithms for the deployment of AI systems. In addition, this industry is widely guided by a fiduciary standard that puts the interests of the client above all else. It is therefore critical to have a thorough understanding of the limitations of our knowledge, because there are many known unknowns and unknown unknowns that can have a significant impact on outcomes. The authors emphasize key success factors for the deployment of AI initiatives: talent and bridging the skills gap. To achieve a lasting impact of big data initiatives, multidisciplinary teams with well-defined roles need to be established with continuing training and education. The prize is the finance of the future. TOPICS: Simulations, big data/machine learning Key Findings • The rise of alternative data in finance is creating major opportunities in all areas of the financial industry, including risk management, portfolio construction, investment banking, and insurance. • To build trusted outcomes in AI/ML initiatives, financial professionals’ roles are critical. Given the many nuances in using big data, there is a need for vetted protocols and methods in selecting data sets and algorithms. Best practices and guidelines are effective in reducing the risks of using AI/ML, including overfitting, lack of interpretability, biased inputs, and unethical use of data. 
• Given the major shortage of talent in AI/data science in finance, practical training of employees and continued education are keys to scale roll out to enable future of finance.
{"title":"It’s All About Data: How to Make Good Decisions in a World Awash with Information","authors":"Mehrzad Mahdavi, Hossein Kazemi","doi":"10.3905/jfds.2020.1.025","DOIUrl":"https://doi.org/10.3905/jfds.2020.1.025","url":null,"abstract":"The rise of big and alternative data has created significant new business opportunities in the financial sector. As we start on this journey of fast-moving technology disruption, financial professionals have a rare opportunity to balance the exponential growth of artificial intelligence (AI)/data science with ethics, bias, and privacy to create trusted data-driven decision making. In this article, the authors discuss the nuances of big data sets that are critical when one considers standards, processes, best practices, and modeling algorithms for the deployment of AI systems. In addition, this industry is widely guided by a fiduciary standard that puts the interests of the client above all else. It is therefore critical to have a thorough understanding of the limitations of our knowledge, because there are many known unknowns and unknown unknowns that can have a significant impact on outcomes. The authors emphasize key success factors for the deployment of AI initiatives: talent and bridging the skills gap. To achieve a lasting impact of big data initiatives, multidisciplinary teams with well-defined roles need to be established with continuing training and education. The prize is the finance of the future. TOPICS: Simulations, big data/machine learning Key Findings • The rise of alternative data in finance is creating major opportunities in all areas of the financial industry, including risk management, portfolio construction, investment banking, and insurance. • To build trusted outcomes in AI/ML initiatives, financial professionals’ roles are critical. Given the many nuances in using big data, there is a need for vetted protocols and methods in selecting data sets and algorithms. 
Best practices and guidelines are effective in reducing the risks of using AI/ML, including overfitting, lack of interpretability, biased inputs, and unethical use of data. • Given the major shortage of talent in AI/data science in finance, practical training of employees and continuing education are key to scaling the rollout and enabling the future of finance.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132745407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Avellaneda, Brian F. Healy, A. Papanicolaou, G. Papanicolaou
Principal component analysis (PCA) is a useful tool when trying to construct factor models from historical asset returns. For the implied volatilities of US equities, there is a PCA-based model with a principal eigenportfolio whose return time series lies close to that of an overarching market factor. The authors show that this market factor is the index resulting from the daily compounding of a weighted average of implied-volatility returns, with weights based on the options’ open interest and Vega. The authors also analyze the singular vectors derived from the tensor structure of the implied volatilities of S&P 500 constituents and find evidence indicating that some type of open interest- and Vega-weighted index should be one of at least two significant factors in this market. TOPICS: Statistical methods, simulations, big data/machine learning Key Findings • Principal component analysis of a comprehensive dataset of implied volatility surfaces from options on US equities shows that their collective behavior is captured by just nine factors, whereas the effective spatial dimension of the residuals is closer to 500 than to the nominal dimension of 28,000, revealing the large redundancy in the data. • Portfolios of implied volatility surface returns, weighted suitably by open interest and Vega, track the principal eigenportfolio associated with a market portfolio of options, in analogy to equity portfolios. • Retention of the tensor structure in the eigenportfolio analysis improves the tracking between the open interest–Vega weighted (tensor) implied volatility surface returns portfolio and the (tensor) eigenportfolio, indicating that data structure matters.
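The eigenportfolio construction described in the abstract can be sketched numerically. The snippet below is a minimal illustration on synthetic one-factor data, not the article's options dataset: it extracts the principal eigenvector of the covariance of simulated implied-volatility returns and compares the resulting eigenportfolio return series with a hypothetical open-interest/Vega-style weighted index (the weights here are random stand-ins for the article's weighting scheme).

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic implied-volatility "returns": T days x N surfaces, driven by one
# common market factor plus idiosyncratic noise (illustrative data only)
T, N = 500, 50
market = rng.normal(0, 0.02, size=(T, 1))       # common market factor
exposures = rng.uniform(0.5, 1.5, size=(1, N))  # factor loadings per surface
X = market @ exposures + rng.normal(0, 0.01, size=(T, N))

# PCA via the eigendecomposition of the covariance of demeaned returns
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
v1 = eigvecs[:, -1]                 # principal eigenvector (largest eigenvalue)
v1 = v1 * np.sign(v1.sum())         # fix the sign convention

# Principal eigenportfolio return series
eig_port = Xc @ (v1 / v1.sum())

# Hypothetical open-interest/Vega weights (random stand-ins, normalized)
w = rng.uniform(0.5, 1.5, size=N)
w /= w.sum()
weighted_index = Xc @ w

# With a dominant common factor, the two series should track each other closely
corr = np.corrcoef(eig_port, weighted_index)[0, 1]
print(f"correlation(eigenportfolio, weighted index) = {corr:.3f}")
```

On data with one dominant factor, as here, the eigenportfolio and the weighted index are both near-proxies for the market factor, which is the analogy the authors draw to equity eigenportfolios.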
{"title":"PCA for Implied Volatility Surfaces","authors":"M. Avellaneda, Brian F. Healy, A. Papanicolaou, G. Papanicolaou","doi":"10.3905/jfds.2020.1.032","DOIUrl":"https://doi.org/10.3905/jfds.2020.1.032","url":null,"abstract":"Principal component analysis (PCA) is a useful tool when trying to construct factor models from historical asset returns. For the implied volatilities of US equities, there is a PCA-based model with a principal eigenportfolio whose return time series lies close to that of an overarching market factor. The authors show that this market factor is the index resulting from the daily compounding of a weighted average of implied-volatility returns, with weights based on the options’ open interest and Vega. The authors also analyze the singular vectors derived from the tensor structure of the implied volatilities of S&P 500 constituents and find evidence indicating that some type of open interest- and Vega-weighted index should be one of at least two significant factors in this market. TOPICS: Statistical methods, simulations, big data/machine learning Key Findings • Principal component analysis of a comprehensive dataset of implied volatility surfaces from options on US equities shows that their collective behavior is captured by just nine factors, whereas the effective spatial dimension of the residuals is closer to 500 than to the nominal dimension of 28,000, revealing the large redundancy in the data. • Portfolios of implied volatility surface returns, weighed suitably by open interest and Vega, track the principal eigenportfolio associated with a market portfolio of options, in analogy to equity portfolios. 
• Retention of the tensor structure in the eigenportfolio analysis improves the tracking between the open interest–Vega weighted (tensor) implied volatility surface returns portfolio and the (tensor) eigenportfolio, indicating that data structure matters.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126511353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-01-31, DOI: 10.3905/jfds.2020.2.1.001
Francesco A. Fabozzi
Robert Dunn, General Manager
The four issues of the 2019 inaugural volume of The Journal of Financial Data Science indicate, by all metrics, the success of the journal. Four of the articles published in JFDS were among the top 10 most downloaded articles across the Portfolio Management Research (PMR) platform. This is quite an accomplishment considering that JFDS represented just one year of articles. After publication of the first issue, articles in JFDS were featured in an opinion piece by David Stevenson on the challenges of implementing machine learning (“Machine Learning Revolution is Still Some Way Off”) published in the Financial Times. One of the articles in the inaugural issue was highlighted by Bill Kelly, CEO of the CAIA Association, in an August 2019 blog post (“Whatfore Art Thou Use of Alt-Data?”) on AllAboutAlpha. The Financial Data Professional Institute (FDPI), established by the CAIA Association, will adopt at least five articles from JFDS as required reading for its membership exams. As researchers in this space produce papers, our expectation is that the journal will be well cited. The first issue of Volume 2 contains nine articles, which are summarized below. “Machine Learning in Asset Management—Part 1: Portfolio Construction—Trading Strategies” is the first in a series of articles by Derek Snow on machine learning in asset management. The series will cover applications to the major tasks of asset management: (1) portfolio construction, (2) risk management, (3) capital management, (4) infrastructure and deployment, and (5) sales and marketing. Portfolio construction is divided into trading and weight optimization. The primary focus of the current article is how machine learning can be used to improve various types of trading strategies; weight optimization is the subject of the next article in the series.
Snow classifies trading strategies according to their respective machine learning frameworks (i.e., reinforcement, supervised, and unsupervised learning). He then explains the difference between reinforcement learning and supervised learning, both conceptually and in relation to their respective advantages and disadvantages. Global equity and bond asset management require techniques that also put effort into understanding the structure of the interactions. Network analysis offers asset managers insightful information regarding factor-based connectedness, relationships, and how risk is transferred between network components. Gueorgui Konstantinov and Mario Rusev demonstrate the relation between global equity and bond funds from a network perspective. In their article, “The Bond–Equity–Fund Relation Using the Fama–French–Carhart Factors: A Practical Network Approach,” they show the advantages of graph theory in explaining the collective behavior of funds.
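The network view of fund relations summarized above can be illustrated with a toy sketch. The fund count, factor setup, and 0.5 correlation threshold below are illustrative assumptions, not values from Konstantinov and Rusev's article: fund returns are simulated from four Carhart-style factors, a fund-to-fund correlation network is built, and degree centrality flags the most connected funds.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated monthly returns of 8 funds driven by four Carhart-style factors
# (MKT, SMB, HML, MOM) -- illustrative stand-ins, not the article's fund data
T, n_funds, n_factors = 240, 8, 4
factors = rng.normal(0, 0.03, size=(T, n_factors))
betas = rng.uniform(-0.5, 1.0, size=(n_factors, n_funds))
returns = factors @ betas + rng.normal(0, 0.01, size=(T, n_funds))

# Fund-to-fund correlation network: connect funds whose absolute return
# correlation exceeds an (assumed) threshold of 0.5
C = np.corrcoef(returns, rowvar=False)
adj = (np.abs(C) > 0.5) & ~np.eye(n_funds, dtype=bool)

# Degree centrality: funds with many strong links are candidates for
# risk-transmission hubs in the network
degree = adj.sum(axis=1)
print("degree centrality per fund:", degree)
```

A production analysis would typically use estimated factor loadings rather than raw return correlations, and richer graph statistics (centrality measures, minimum spanning trees) than the degree counts shown here.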
{"title":"Managing Editor’s Letter","authors":"Francesco A. Fabozzi","doi":"10.3905/jfds.2020.2.1.001","DOIUrl":"https://doi.org/10.3905/jfds.2020.2.1.001","url":null,"abstract":"robert dunn General Manager The four issues of the 2019 inaugural publication of The Journal of Financial Data Science by all metrics indicate the success of the journal. Four of the articles published in JFDS were in the top 10 most downloaded articles across the Portfolio Management Research (PMR) platform. This is quite an accomplishment considering that JFDS represented just one year of articles. After publication of the first issue, articles in JFDS were featured in an opinion piece on the challenges of implementing machine learning by David Stevenson (“Machine Learning Revolution is Still Some Way Off”) published in the Financial Times. One of the articles in the inaugural issue is highlighted by Bill Kelly, the CEO of the CAIA Association, in an August 2019 blog (“Whatfore Art Thou Use of Alt-Data?”) in AllAboutAlpha. The Financial Data Professional Institute (FDPI), established by the CAIA Association, will be adopting at least five articles from JFDS as required reading for their membership exams. As researchers in this space produce papers, our expectation is that the journal will be well cited. In the first issue of Volume 2, there are nine articles which are summarized below. “Machine Learning in Asset Management—Part 1: Portfolio Construction—Trading Strategies” is the first in a series of articles by Derek Snow dealing with machine learning in asset management. The series will cover the applications to the major tasks of asset management: (1) portfolio construction, (2) risk management, (3) capital management, (4) infrastructure and deployment, and (5) sales and marketing. Portfolio construction is divided into trading and weight optimization. 
The primary focus of the current article is on how machine learning can be used to improve various types of trading strategies, while weight optimization is the subject of the next article in the series. Snow classifies trading strategies according to their respective machine-learning frameworks (i.e., reinforcement, supervised and unsupervised learning). He then explains the difference between reinforcement learning and supervised learning, both conceptually and in relation to their respective advantages and disadvantages. Global equity and bond asset management require techniques that also put effort into understanding the structure of the interactions. Network analysis offers asset managers insightful information regarding factor-based connectedness, relationships, and how risk is transferred between network components. Gueorgui Konstantinov and Mario Rusev demonstrate the relation between global equity and bond funds from a network perspective. In their article, “The Bond–Equity–Fund Relation Using the Fama–French–Carhart Factors: A Practical Network Approach,” they show the advantages of graph theory to explain the collective behavior of funds.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132549573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}