Matrix Evolutions: Synthetic Correlations and Explainable Machine Learning for Constructing Robust Investment Portfolios
Jochen Papenbrock, Peter Schwendner, Markus Jaeger, Stephan Krügel
The Journal of Financial Data Science, July 29, 2020. DOI: 10.2139/ssrn.3663220

In this article, the authors present a novel and highly flexible concept to simulate correlation matrixes of financial markets. It produces realistic outcomes regarding stylized facts of empirical correlation matrixes and requires no asset return input data. The matrix generation is based on a multiobjective evolutionary algorithm, so the authors call the approach matrix evolutions. It is suitable for parallel implementation and can be accelerated by graphics processing units and quantum-inspired algorithms. The approach is useful for backtesting, pricing, and hedging correlation-dependent investment strategies and financial products. Its potential is demonstrated in a machine learning case study for robust portfolio construction in a multi-asset universe: an explainable machine learning program links the synthetic matrixes to the portfolio volatility spread of hierarchical risk parity versus equal risk contribution.

TOPICS: Statistical methods, big data/machine learning, portfolio construction, performance measurement

Key Findings
▪ The authors introduce the matrix evolutions concept, based on an evolutionary algorithm, to simulate correlation matrixes useful for financial market applications.
▪ They apply the resulting synthetic correlation matrixes to benchmark hierarchical risk parity (HRP) and equal risk contribution allocations of a multi-asset futures portfolio and find that HRP shows lower portfolio risk.
▪ The authors evaluate three competing machine learning methods to regress the portfolio risk spread between the two allocation methods against statistical features of the synthetic correlation matrixes and then discuss local and global feature importance using the SHAP framework of Lundberg and Lee (2017).
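The abstract does not spell out the authors' multiobjective algorithm, but the core idea of evolving candidate matrices toward target stylized facts, while repairing each candidate to a valid correlation matrix, can be sketched with a toy single-objective evolution strategy. The objective (a target share of variance in the first eigenvalue, mostly positive pairwise correlations) and all constants below are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def nearest_corr(A):
    """Symmetrize, project onto the PSD cone, and rescale to unit diagonal."""
    w, V = np.linalg.eigh((A + A.T) / 2)
    B = V @ np.diag(np.clip(w, 1e-8, None)) @ V.T
    d = np.sqrt(np.diag(B))
    C = B / np.outer(d, d)
    np.fill_diagonal(C, 1.0)
    return C

def fitness(C, target_top_share=0.4):
    """Toy objective: the first eigenvalue should absorb a target share of
    total variance (a stylized 'market mode'), and off-diagonal correlations
    should be mostly positive."""
    w = np.linalg.eigvalsh(C)
    top_share = w[-1] / w.sum()
    off = C[np.triu_indices_from(C, k=1)]
    return -abs(top_share - target_top_share) + 0.1 * (off > 0).mean()

def evolve(n=10, pop_size=30, generations=200, sigma=0.05, seed=0):
    rng = np.random.default_rng(seed)
    pop = [nearest_corr(np.eye(n) + 0.1 * rng.standard_normal((n, n)))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # keep the fittest half
        children = [nearest_corr(p + sigma * rng.standard_normal((n, n)))
                    for p in parents]            # mutate, then repair
        pop = parents + children
    return max(pop, key=fitness)

C = evolve()
```

Every candidate stays a valid correlation matrix because the repair step runs after each mutation; the paper's multiobjective version would maintain a Pareto front over several stylized-fact objectives instead of a single scalar fitness.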
Hyperparameter Optimization for Portfolio Selection
P. Nystrup, Erik Lindström, H. Madsen
The Journal of Financial Data Science, June 18, 2020. DOI: 10.3905/jfds.2020.1.035

Portfolio selection involves a trade-off between maximizing expected return and minimizing risk. In practice, useful formulations also include various costs and constraints that regularize the problem and reduce the risk due to estimation errors, resulting in solutions that depend on a number of hyperparameters. As the number of hyperparameters grows, selecting their values becomes increasingly important and difficult. In this article, the authors propose a systematic approach to hyperparameter optimization by leveraging recent advances in automated machine learning and multiobjective optimization. They optimize hyperparameters on a training set to yield the best result subject to market-determined realized costs. In applications to single- and multiperiod portfolio selection, they show that sequential hyperparameter optimization finds solutions with better risk–return trade-offs than manual, grid, and random search over hyperparameters while using fewer function evaluations. At the same time, the solutions found are more stable from in-sample training to out-of-sample testing, suggesting that they are less likely to be extremities that randomly happened to yield good performance in training.

TOPICS: Portfolio theory, portfolio construction, big data/machine learning

Key Findings
• The growing number of applications of machine learning approaches to portfolio selection means that hyperparameter optimization becomes increasingly important. The authors propose a systematic approach to hyperparameter optimization by leveraging recent advances in automated machine learning and multiobjective optimization.
• They establish a connection between forecast uncertainty and holding- and trading-cost parameters in portfolio selection and argue that these should be considered regularization parameters that can be adjusted in training to achieve optimal performance when tested subject to realized costs.
• Multiobjective optimization can find solutions with better risk–return trade-offs than manual, grid, and random search over hyperparameters for portfolio selection. At the same time, the solutions are more stable across in-sample training and out-of-sample testing.
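As a rough illustration of treating a cost parameter as a tunable regularizer (not the authors' automated machine learning pipeline), the sketch below runs a random search over the ridge penalty of a minimum-variance portfolio and scores each candidate by realized variance on held-out data. The shrinkage form, the synthetic data, and all constants are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def min_var_weights(cov, gamma):
    """Minimum-variance weights with a ridge penalty gamma; larger gamma
    shrinks the solution toward equal weight, acting as regularization."""
    n = cov.shape[0]
    w = np.linalg.solve(cov + gamma * np.eye(n), np.ones(n))
    return w / w.sum()

def realized_var(w, R):
    """Out-of-sample variance of the portfolio return series."""
    return np.var(R @ w)

# synthetic daily returns with a common component; train/validation split
T, n = 500, 8
true_cov = 0.0001 * (np.eye(n) + 0.5)
R = rng.multivariate_normal(np.zeros(n), true_cov, size=T)
R_train, R_val = R[:250], R[250:]
cov_train = np.cov(R_train, rowvar=False)

# random search over the hyperparameter, scored on the held-out window
candidates = 10.0 ** rng.uniform(-6, -1, size=50)
best_gamma = min(candidates,
                 key=lambda g: realized_var(min_var_weights(cov_train, g), R_val))
```

A sequential (e.g. Bayesian) optimizer would replace the fixed candidate list with proposals informed by previous evaluations, which is how the authors obtain better trade-offs with fewer function evaluations.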
The Cross Section of Commodity Returns: A Nonparametric Approach
C. Struck, Enoch Cheng
The Journal of Financial Data Science, June 17, 2020. DOI: 10.3905/jfds.2020.1.034

To what extent are financial market returns predictable? Standard approaches to asset pricing make strong parametric assumptions that undermine their return-predicting ability. The authors employ tree-based methods to overcome these limitations and attempt to approximate an upper bound for the predictability of returns in commodities futures markets. Out of sample, they find that up to 3.74% of 1-month returns are predictable, more than a 10-fold increase over standard approaches. The findings hint at the importance that multiway interactions and nonlinearities acquire in the data; they imply that new factors should be tested on their ability to add explanatory power to an ensemble of existing factors.

TOPICS: Futures and forward contracts, commodities

Key Findings
• Standard approaches to asset pricing make strong parametric assumptions that undermine their return-predicting ability.
• The authors employ tree-based methods to overcome these limitations and estimate the predictability of returns in commodities futures markets.
• Out of sample, they find that up to 3.74% of 1-month returns are predictable, more than a 10-fold increase over standard approaches.
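The "up to 3.74% of 1-month returns are predictable" figure is an out-of-sample predictive R². One common convention in this literature (an assumption here; some studies benchmark against the historical mean instead of zero) is:

```python
import numpy as np

def r2_oos(y_true, y_pred):
    """Out-of-sample predictive R^2 against a zero-return benchmark:
    1 - SSE(model) / SSE(zero forecast). A value of 0.0374 corresponds
    to '3.74% of returns are predictable'."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum(y_true ** 2)

# a forecast equal to half the realized return leaves a quarter of the
# squared error of the zero benchmark
y = np.array([0.01, -0.02, 0.015])
print(r2_oos(y, 0.5 * y))  # prints 0.75
```

Unlike in-sample R², this statistic can go negative: a model that fits noise produces larger squared errors out of sample than forecasting zero.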
Deep Sequence Modeling: Development and Applications in Asset Pricing
Lingbo Cong, Ke Tang, Jingyuan Wang, Yang Zhang
The Journal of Financial Data Science, June 1, 2020. DOI: 10.3905/jfds.2020.1.053

The authors predict asset returns and measure risk premiums using a prominent technique from artificial intelligence: deep sequence modeling. Because asset returns often exhibit sequential dependence that may not be effectively captured by conventional time-series models, sequence modeling offers a promising path with its data-driven approach and superior performance. The authors first overview the development of deep sequence models, introduce their applications in asset pricing, and discuss their advantages and limitations. They then perform a comparative analysis of these methods using data on US equities. They demonstrate how sequence modeling benefits investors in general by incorporating complex historical path dependence, and they find that long short-term memory (LSTM)-based models tend to have the best out-of-sample performance.

TOPICS: Big data/machine learning, security analysis and valuation, performance measurement

Key Findings
▪ This article provides a concise synopsis of deep sequence modeling, with an emphasis on its historical development in computer science and artificial intelligence. It serves as a reference source for social scientists who aim to use the tool to supplement conventional time-series and panel methods.
▪ Deep sequence models can be adapted successfully for asset pricing, especially for predicting asset returns, because they are flexible enough to capture the high-dimensional, nonlinear, interactive, low signal-to-noise, and dynamic nature of financial data. In particular, the models' ability to detect path-dependence patterns makes them versatile and effective, potentially outperforming existing models.
▪ This article provides a horse-race comparison of various deep sequence models for forecasting returns and measuring risk premiums. LSTM has the best performance in terms of out-of-sample predictive R², and LSTM with an attention mechanism has the best portfolio performance when microcap stocks are excluded.
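The LSTM recurrence that carries this path dependence fits in a few lines. The toy forward pass below (random weights, an assumed input/forget/cell/output gate ordering in the stacked parameters) only illustrates the mechanics, not the authors' trained models:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(X, Wx, Wh, b):
    """Run a single-layer LSTM over a sequence X of shape (T, d_in).
    Wx: (d_in, 4h), Wh: (h, 4h), b: (4h,). Returns the final hidden state.
    Gate order assumed in the stacked weights: input, forget, cell, output."""
    h = Wh.shape[0]
    h_t = np.zeros(h)
    c_t = np.zeros(h)
    for x_t in X:
        z = x_t @ Wx + h_t @ Wh + b
        i, f, g, o = z[:h], z[h:2*h], z[2*h:3*h], z[3*h:]
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c_t = f * c_t + i * g      # cell state carries long-range memory
        h_t = o * np.tanh(c_t)     # hidden state summarizes the path so far
    return h_t

rng = np.random.default_rng(0)
T, d_in, h = 60, 5, 8              # e.g. 60 days of 5 features per stock
X = rng.standard_normal((T, d_in))
params = (0.1 * rng.standard_normal((d_in, 4 * h)),
          0.1 * rng.standard_normal((h, 4 * h)),
          np.zeros(4 * h))
state = lstm_forward(X, *params)
# a linear head on `state` would produce the return forecast
```

The multiplicative forget gate is what lets the model retain or discard history adaptively, which is the "path dependence" conventional autoregressive models with fixed lag structure struggle to capture.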
Greedy Online Classification of Persistent Market States Using Realized Intraday Volatility Features
P. Nystrup, Petter N. Kolm, Erik Lindström
The Journal of Financial Data Science, May 6, 2020. DOI: 10.2139/ssrn.3594875

In many financial applications, it is important to classify time-series data without any latency while maintaining persistence in the identified states. The authors propose a greedy online classifier that contemporaneously determines which hidden state a new observation belongs to, without the need to parse historical observations and without compromising persistence. Their classifier is based on the idea of clustering temporal features while explicitly penalizing jumps between states with a fixed-cost regularization term that can be calibrated to achieve a desired level of persistence. Through a series of return simulations, the authors show that in most settings their new classifier obtains a remarkably higher accuracy than the correctly specified maximum likelihood estimator. They illustrate that the new classifier is more robust to misspecification and yields state sequences that are significantly more persistent both in and out of sample. They demonstrate how classification accuracy can be further improved by including features based on intraday data. Finally, the authors apply the new classifier to estimate persistent states of the S&P 500 Index.

TOPICS: Statistical methods, simulations, big data/machine learning

Key Findings
• A new greedy online classifier is proposed that contemporaneously determines which hidden state a new observation belongs to, without the need to parse historical observations and without compromising temporal persistence.
• A series of simulations demonstrates that the new classifier frequently obtains higher accuracy and is more robust to misspecification than the correctly specified maximum likelihood estimator.
• Classification accuracy can be improved by including features based on intraday volatility data.
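The greedy assignment step the abstract describes (nearest state centroid plus a fixed cost for jumping) can be sketched as follows; the centroids, penalty value, and simulated regimes are illustrative assumptions, and the paper additionally estimates the centroids from data:

```python
import numpy as np

def greedy_classify(features, centroids, jump_penalty):
    """Greedily assign each observation to a state online: minimize squared
    distance to the state centroid plus a fixed cost for switching away from
    the previous state. Larger jump_penalty -> more persistent sequence."""
    states = []
    prev = None
    for x in features:
        cost = np.sum((centroids - x) ** 2, axis=1)
        if prev is not None:
            cost = cost + jump_penalty * (np.arange(len(centroids)) != prev)
        prev = int(np.argmin(cost))
        states.append(prev)
    return np.array(states)

# two noisy regimes in one feature (e.g. realized volatility)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 100),
                    rng.normal(4.0, 1.0, 100)])[:, None]
centroids = np.array([[0.0], [4.0]])

noisy = greedy_classify(x, centroids, jump_penalty=0.0)
persistent = greedy_classify(x, centroids, jump_penalty=2.0)
# the penalized sequence switches states no more often than the unpenalized one
assert np.sum(np.diff(persistent) != 0) <= np.sum(np.diff(noisy) != 0)
```

With two states and one feature, the penalty creates a hysteresis band around the midpoint between centroids: the classifier only jumps once the evidence overcomes the fixed cost, which is exactly the calibration knob for persistence.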
Portfolio Selection Using Portfolio Committees
Tsungwu Ho
The Journal of Financial Data Science, May 1, 2020. DOI: 10.2139/ssrn.3653595

The author proposes a committee approach to portfolio selection. Because each optimal portfolio is a combination of three basic elements (strategy, covariance matrix, and risk type), the author first augments the combinations to 250 optimal portfolios at each estimation period. The author then defines a score to select the best portfolio to hold in the next period. The superior performance of the combination portfolio, a survival-of-the-fittest outcome, demonstrates that the committee approach to portfolio selection is not only effective but also easy to implement.

TOPICS: Portfolio theory, portfolio construction

Key Findings
• This article proposes a flexible and easy-to-implement committee approach to portfolio selection.
• This article defines an algorithm with a score to select the best portfolio out of 250 augmented portfolios.
• In this survival-of-the-fittest setting, evidence from several datasets shows that the resulting combination portfolio overcomes distributional uncertainty and exhibits superior annualized performance.
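The abstract does not define the score, so the sketch below is only a hedged illustration of the committee mechanic: enumerate candidate portfolios from combinations of strategy and covariance estimator, then rank them by a hypothetical in-sample reward-to-risk score. The two strategies, the shrinkage estimator, and the score itself are assumptions, not the article's 250-portfolio construction.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
R = rng.standard_normal((250, 6)) * 0.01       # daily returns, 6 assets

def sample_cov(R):
    return np.cov(R, rowvar=False)

def shrunk_cov(R):
    """Simple shrinkage of the sample covariance toward its diagonal."""
    S = np.cov(R, rowvar=False)
    return 0.5 * S + 0.5 * np.diag(np.diag(S))

def min_var(cov):
    w = np.linalg.solve(cov, np.ones(len(cov)))
    return w / w.sum()

def inv_vol(cov):
    w = 1.0 / np.sqrt(np.diag(cov))
    return w / w.sum()

def score(w, R):
    """Hypothetical committee score: in-sample mean/volatility ratio."""
    p = R @ w
    return p.mean() / p.std()

# the committee: every (strategy, covariance estimator) combination
committee = [strat(est(R))
             for strat, est in product([min_var, inv_vol],
                                       [sample_cov, shrunk_cov])]
best = max(committee, key=lambda w: score(w, R))
```

Scaling the same loop over more strategies, estimators, and risk targets yields a committee of the size the article describes.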
Inside the Mind of Investors During the COVID-19 Pandemic: Evidence from the StockTwits Data
Hasan Fallahgoul
The Journal of Financial Data Science, April 23, 2020. DOI: 10.2139/ssrn.3583462

The author studies investor beliefs about stock market returns (sentiment and disagreement) during the COVID-19 pandemic using roughly 3.7 million investor messages posted on StockTwits, a social media investing platform. The rich and multimodal features of the StockTwits data allow the author to explore the evolution of sentiment and disagreement within and across investors, sectors, and even industries. Sentiment decreased (and disagreement increased) sharply across all investors, regardless of investment philosophy, horizon, and experience, between February 19, 2020, and March 23, 2020, when a historical market high was followed by a record drop. Surprisingly, both measures reversed sharply toward the end of March. The behavior of these measures across sectors is heterogeneous, however: the financial and healthcare sectors are the most pessimistic and the most optimistic divisions, respectively.

TOPICS: Security analysis and valuation, quantitative methods, big data/machine learning, financial crises and financial market history, performance measurement

Key Findings
▪ The daily time series of sentiment and disagreement are not stationary processes.
▪ Sentiment decreased (and disagreement increased) sharply across all investors, regardless of investment philosophy, horizon, and experience, between February 19, 2020, and March 23, 2020, when a historical market high was followed by a record drop.
▪ The financial and healthcare sectors are the most pessimistic and the most optimistic divisions, respectively.
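The abstract does not specify how sentiment and disagreement are constructed from messages. One common convention in the social media sentiment literature (an assumption here, not necessarily the article's measure) maps bullish/bearish message labels to a bullishness index and a disagreement score:

```python
import numpy as np

def sentiment_and_disagreement(labels):
    """Bullishness and disagreement from message labels (+1 bullish,
    -1 bearish). Sentiment is the mean label in [-1, 1]; disagreement is
    1 when messages are evenly split and 0 when they are unanimous."""
    labels = np.asarray(labels, dtype=float)
    sentiment = labels.mean()
    disagreement = 1.0 - abs(sentiment)
    return sentiment, disagreement

# 3 bullish messages, 1 bearish message on a given day
s, d = sentiment_and_disagreement([1, 1, 1, -1])
print(s, d)  # prints 0.5 0.5
```

Computing this pair per day, per sector, or per investor cohort produces the time series whose sharp moves around February–March 2020 the article documents.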
The rise of big and alternative data has created significant new business opportunities in the financial sector. As we start on this journey of fast-moving technology disruption, financial professionals have a rare opportunity to balance the exponential growth of artificial intelligence (AI)/data science with ethics, bias, and privacy to create trusted data-driven decision making. In this article, the authors discuss the nuances of big data sets that are critical when one considers standards, processes, best practices, and modeling algorithms for the deployment of AI systems. In addition, this industry is widely guided by a fiduciary standard that puts the interests of the client above all else. It is therefore critical to have a thorough understanding of the limitations of our knowledge, because there are many known unknowns and unknown unknowns that can have a significant impact on outcomes. The authors emphasize key success factors for the deployment of AI initiatives: talent and bridging the skills gap. To achieve a lasting impact of big data initiatives, multidisciplinary teams with well-defined roles need to be established with continuing training and education. The prize is the finance of the future. TOPICS: Simulations, big data/machine learning Key Findings • The rise of alternative data in finance is creating major opportunities in all areas of the financial industry, including risk management, portfolio construction, investment banking, and insurance. • To build trusted outcomes in AI/ML initiatives, financial professionals’ roles are critical. Given the many nuances in using big data, there is a need for vetted protocols and methods in selecting data sets and algorithms. Best practices and guidelines are effective in reducing the risks of using AI/ML, including overfitting, lack of interpretability, biased inputs, and unethical use of data. 
• Given the major shortage of talent in AI/data science in finance, practical training of employees and continued education are keys to scale roll out to enable future of finance.
{"title":"It’s All About Data: How to Make Good Decisions in a World Awash with Information","authors":"Mehrzad Mahdavi, Hossein Kazemi","doi":"10.3905/jfds.2020.1.025","DOIUrl":"https://doi.org/10.3905/jfds.2020.1.025","url":null,"abstract":"The rise of big and alternative data has created significant new business opportunities in the financial sector. As we start on this journey of fast-moving technology disruption, financial professionals have a rare opportunity to balance the exponential growth of artificial intelligence (AI)/data science with ethics, bias, and privacy to create trusted data-driven decision making. In this article, the authors discuss the nuances of big data sets that are critical when one considers standards, processes, best practices, and modeling algorithms for the deployment of AI systems. In addition, this industry is widely guided by a fiduciary standard that puts the interests of the client above all else. It is therefore critical to have a thorough understanding of the limitations of our knowledge, because there are many known unknowns and unknown unknowns that can have a significant impact on outcomes. The authors emphasize key success factors for the deployment of AI initiatives: talent and bridging the skills gap. To achieve a lasting impact of big data initiatives, multidisciplinary teams with well-defined roles need to be established with continuing training and education. The prize is the finance of the future. TOPICS: Simulations, big data/machine learning Key Findings • The rise of alternative data in finance is creating major opportunities in all areas of the financial industry, including risk management, portfolio construction, investment banking, and insurance. • To build trusted outcomes in AI/ML initiatives, financial professionals’ roles are critical. Given the many nuances in using big data, there is a need for vetted protocols and methods in selecting data sets and algorithms. 
Best practices and guidelines are effective in reducing the risks of using AI/ML, including overfitting, lack of interpretability, biased inputs, and unethical use of data. • Given the major shortage of talent in AI/data science in finance, practical training of employees and continuing education are key to scaling the rollout and enabling the future of finance.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132745407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Avellaneda, Brian F. Healy, A. Papanicolaou, G. Papanicolaou
Principal component analysis (PCA) is a useful tool when trying to construct factor models from historical asset returns. For the implied volatilities of US equities, there is a PCA-based model with a principal eigenportfolio whose return time series lies close to that of an overarching market factor. The authors show that this market factor is the index resulting from the daily compounding of a weighted average of implied-volatility returns, with weights based on the options’ open interest and Vega. The authors also analyze the singular vectors derived from the tensor structure of the implied volatilities of S&P 500 constituents and find evidence indicating that some type of open interest- and Vega-weighted index should be one of at least two significant factors in this market. TOPICS: Statistical methods, simulations, big data/machine learning Key Findings • Principal component analysis of a comprehensive dataset of implied volatility surfaces from options on US equities shows that their collective behavior is captured by just nine factors, whereas the effective spatial dimension of the residuals is closer to 500 than to the nominal dimension of 28,000, revealing the large redundancy in the data. • Portfolios of implied volatility surface returns, weighted suitably by open interest and Vega, track the principal eigenportfolio associated with a market portfolio of options, in analogy to equity portfolios. • Retention of the tensor structure in the eigenportfolio analysis improves the tracking between the open interest–Vega weighted (tensor) implied volatility surface returns portfolio and the (tensor) eigenportfolio, indicating that data structure matters.
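The eigenportfolio construction described in the abstract can be sketched numerically. The snippet below is a minimal illustration on synthetic one-factor data, not the article's options dataset: it extracts the principal eigenvector of the covariance of simulated implied-volatility returns and compares the resulting eigenportfolio return series with a hypothetical open-interest/Vega-style weighted index (the weights here are random stand-ins for the article's weighting scheme).

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic implied-volatility "returns": T days x N surfaces, driven by one
# common market factor plus idiosyncratic noise (illustrative data only)
T, N = 500, 50
market = rng.normal(0, 0.02, size=(T, 1))       # common market factor
exposures = rng.uniform(0.5, 1.5, size=(1, N))  # factor loadings per surface
X = market @ exposures + rng.normal(0, 0.01, size=(T, N))

# PCA via the eigendecomposition of the covariance of demeaned returns
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
v1 = eigvecs[:, -1]                 # principal eigenvector (largest eigenvalue)
v1 = v1 * np.sign(v1.sum())         # fix the sign convention

# Principal eigenportfolio return series
eig_port = Xc @ (v1 / v1.sum())

# Hypothetical open-interest/Vega weights (random stand-ins, normalized)
w = rng.uniform(0.5, 1.5, size=N)
w /= w.sum()
weighted_index = Xc @ w

# With a dominant common factor, the two series should track each other closely
corr = np.corrcoef(eig_port, weighted_index)[0, 1]
print(f"correlation(eigenportfolio, weighted index) = {corr:.3f}")
```

On data with one dominant factor, as here, the eigenportfolio and the weighted index are both near-proxies for the market factor, which is the analogy the authors draw to equity eigenportfolios.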
{"title":"PCA for Implied Volatility Surfaces","authors":"M. Avellaneda, Brian F. Healy, A. Papanicolaou, G. Papanicolaou","doi":"10.3905/jfds.2020.1.032","DOIUrl":"https://doi.org/10.3905/jfds.2020.1.032","url":null,"abstract":"Principal component analysis (PCA) is a useful tool when trying to construct factor models from historical asset returns. For the implied volatilities of US equities, there is a PCA-based model with a principal eigenportfolio whose return time series lies close to that of an overarching market factor. The authors show that this market factor is the index resulting from the daily compounding of a weighted average of implied-volatility returns, with weights based on the options’ open interest and Vega. The authors also analyze the singular vectors derived from the tensor structure of the implied volatilities of S&P 500 constituents and find evidence indicating that some type of open interest- and Vega-weighted index should be one of at least two significant factors in this market. TOPICS: Statistical methods, simulations, big data/machine learning Key Findings • Principal component analysis of a comprehensive dataset of implied volatility surfaces from options on US equities shows that their collective behavior is captured by just nine factors, whereas the effective spatial dimension of the residuals is closer to 500 than to the nominal dimension of 28,000, revealing the large redundancy in the data. • Portfolios of implied volatility surface returns, weighed suitably by open interest and Vega, track the principal eigenportfolio associated with a market portfolio of options, in analogy to equity portfolios. 
• Retention of the tensor structure in the eigenportfolio analysis improves the tracking between the open interest–Vega weighted (tensor) implied volatility surface returns portfolio and the (tensor) eigenportfolio, indicating that data structure matters.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126511353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-01-31, DOI: 10.3905/jfds.2020.2.1.001
Francesco A. Fabozzi
Robert Dunn, General Manager
The four issues of the 2019 inaugural volume of The Journal of Financial Data Science indicate, by all metrics, the success of the journal. Four of the articles published in JFDS were among the top 10 most downloaded articles across the Portfolio Management Research (PMR) platform. This is quite an accomplishment considering that JFDS represented just one year of articles. After publication of the first issue, articles in JFDS were featured in an opinion piece by David Stevenson on the challenges of implementing machine learning (“Machine Learning Revolution is Still Some Way Off”) published in the Financial Times. One of the articles in the inaugural issue was highlighted by Bill Kelly, CEO of the CAIA Association, in an August 2019 blog post (“Whatfore Art Thou Use of Alt-Data?”) on AllAboutAlpha. The Financial Data Professional Institute (FDPI), established by the CAIA Association, will adopt at least five articles from JFDS as required reading for its membership exams. As researchers in this space produce papers, our expectation is that the journal will be well cited. The first issue of Volume 2 contains nine articles, which are summarized below. “Machine Learning in Asset Management—Part 1: Portfolio Construction—Trading Strategies” is the first in a series of articles by Derek Snow on machine learning in asset management. The series will cover applications to the major tasks of asset management: (1) portfolio construction, (2) risk management, (3) capital management, (4) infrastructure and deployment, and (5) sales and marketing. Portfolio construction is divided into trading and weight optimization. The primary focus of the current article is how machine learning can be used to improve various types of trading strategies; weight optimization is the subject of the next article in the series.
Snow classifies trading strategies according to their respective machine learning frameworks (i.e., reinforcement, supervised, and unsupervised learning). He then explains the difference between reinforcement learning and supervised learning, both conceptually and in relation to their respective advantages and disadvantages. Global equity and bond asset management require techniques that also put effort into understanding the structure of the interactions. Network analysis offers asset managers insightful information regarding factor-based connectedness, relationships, and how risk is transferred between network components. Gueorgui Konstantinov and Mario Rusev demonstrate the relation between global equity and bond funds from a network perspective. In their article, “The Bond–Equity–Fund Relation Using the Fama–French–Carhart Factors: A Practical Network Approach,” they show the advantages of graph theory in explaining the collective behavior of funds.
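The network view of fund relations summarized above can be illustrated with a toy sketch. The fund count, factor setup, and 0.5 correlation threshold below are illustrative assumptions, not values from Konstantinov and Rusev's article: fund returns are simulated from four Carhart-style factors, a fund-to-fund correlation network is built, and degree centrality flags the most connected funds.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated monthly returns of 8 funds driven by four Carhart-style factors
# (MKT, SMB, HML, MOM) -- illustrative stand-ins, not the article's fund data
T, n_funds, n_factors = 240, 8, 4
factors = rng.normal(0, 0.03, size=(T, n_factors))
betas = rng.uniform(-0.5, 1.0, size=(n_factors, n_funds))
returns = factors @ betas + rng.normal(0, 0.01, size=(T, n_funds))

# Fund-to-fund correlation network: connect funds whose absolute return
# correlation exceeds an (assumed) threshold of 0.5
C = np.corrcoef(returns, rowvar=False)
adj = (np.abs(C) > 0.5) & ~np.eye(n_funds, dtype=bool)

# Degree centrality: funds with many strong links are candidates for
# risk-transmission hubs in the network
degree = adj.sum(axis=1)
print("degree centrality per fund:", degree)
```

A production analysis would typically use estimated factor loadings rather than raw return correlations, and richer graph statistics (centrality measures, minimum spanning trees) than the degree counts shown here.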
{"title":"Managing Editor’s Letter","authors":"Francesco A. Fabozzi","doi":"10.3905/jfds.2020.2.1.001","DOIUrl":"https://doi.org/10.3905/jfds.2020.2.1.001","url":null,"abstract":"robert dunn General Manager The four issues of the 2019 inaugural publication of The Journal of Financial Data Science by all metrics indicate the success of the journal. Four of the articles published in JFDS were in the top 10 most downloaded articles across the Portfolio Management Research (PMR) platform. This is quite an accomplishment considering that JFDS represented just one year of articles. After publication of the first issue, articles in JFDS were featured in an opinion piece on the challenges of implementing machine learning by David Stevenson (“Machine Learning Revolution is Still Some Way Off”) published in the Financial Times. One of the articles in the inaugural issue is highlighted by Bill Kelly, the CEO of the CAIA Association, in an August 2019 blog (“Whatfore Art Thou Use of Alt-Data?”) in AllAboutAlpha. The Financial Data Professional Institute (FDPI), established by the CAIA Association, will be adopting at least five articles from JFDS as required reading for their membership exams. As researchers in this space produce papers, our expectation is that the journal will be well cited. In the first issue of Volume 2, there are nine articles which are summarized below. “Machine Learning in Asset Management—Part 1: Portfolio Construction—Trading Strategies” is the first in a series of articles by Derek Snow dealing with machine learning in asset management. The series will cover the applications to the major tasks of asset management: (1) portfolio construction, (2) risk management, (3) capital management, (4) infrastructure and deployment, and (5) sales and marketing. Portfolio construction is divided into trading and weight optimization. 
The primary focus of the current article is on how machine learning can be used to improve various types of trading strategies, while weight optimization is the subject of the next article in the series. Snow classifies trading strategies according to their respective machine-learning frameworks (i.e., reinforcement, supervised and unsupervised learning). He then explains the difference between reinforcement learning and supervised learning, both conceptually and in relation to their respective advantages and disadvantages. Global equity and bond asset management require techniques that also put effort into understanding the structure of the interactions. Network analysis offers asset managers insightful information regarding factor-based connectedness, relationships, and how risk is transferred between network components. Gueorgui Konstantinov and Mario Rusev demonstrate the relation between global equity and bond funds from a network perspective. In their article, “The Bond–Equity–Fund Relation Using the Fama–French–Carhart Factors: A Practical Network Approach,” they show the advantages of graph theory to explain the collective behavior of funds.","PeriodicalId":199045,"journal":{"name":"The Journal of Financial Data Science","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132549573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}