Pooling and winsorizing machine learning forecasts to predict stock returns with high-dimensional data

IF 2.1; Zone 2 (Economics); JCR Q2 (Business, Finance); Journal of Empirical Finance; Pub Date: 2024-09-02; DOI: 10.1016/j.jempfin.2024.101538

Erik Mekelburg, Jack Strauss


Citations: 0

Abstract

We evaluate US market return predictability using a novel data set of several hundred aggregated firm-level characteristics. We apply LASSO, Elastic Net, Random Forest, Neural Net, Extreme Gradient Boosting, and Light Gradient Boosting Machine methods and find that these models experience large prediction errors that lead to forecast failures. However, winsorizing and pooling machine learning model forecasts provides consistent out-of-sample predictability. To assess robustness, we apply machine learning methods to high-dimensional data for Canada, China, Germany and the UK as well as the Goyal–Welch data. All machine learning models we consider, except for the ensemble pooled methods, fail to significantly predict returns across our samples, highlighting the importance of pooling, evaluating additional economies, and the fragility of individual machine learning methods. Our results shed light on the sparsity versus density debate as the degree of sparsity and variable importance evolve over time.
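The abstract's central idea, winsorizing individual model forecasts and then pooling them, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the percentile bounds taken from past realized returns, the equal-weight average, and the out-of-sample R² measured against a historical-mean benchmark are all assumptions chosen for illustration.

```python
import numpy as np


def winsorize_forecasts(forecasts, history, lower_pct=1.0, upper_pct=99.0):
    """Clip forecasts to percentile bounds of past realized returns.

    Assumption: bounds come from `history` (returns observed before the
    forecast period), so no future information enters the clipping.
    """
    lower, upper = np.percentile(history, [lower_pct, upper_pct])
    return np.clip(np.asarray(forecasts, dtype=float), lower, upper)


def pool_forecasts(forecasts):
    """Equal-weight average across models (columns) for each period (row)."""
    return np.asarray(forecasts, dtype=float).mean(axis=1)


def oos_r2(realized, forecast, benchmark):
    """Out-of-sample R^2: 1 - SSE(model) / SSE(benchmark)."""
    realized = np.asarray(realized, dtype=float)
    sse_model = np.sum((realized - np.asarray(forecast, dtype=float)) ** 2)
    sse_bench = np.sum((realized - np.asarray(benchmark, dtype=float)) ** 2)
    return 1.0 - sse_model / sse_bench


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    history = rng.normal(0.005, 0.04, size=240)   # hypothetical past returns
    realized = rng.normal(0.005, 0.04, size=12)   # hypothetical test period
    raw = rng.normal(0.005, 0.02, size=(12, 6))   # 12 periods x 6 models
    raw[3, 2] = 0.9                               # one extreme model forecast

    pooled = pool_forecasts(winsorize_forecasts(raw, history))
    benchmark = np.full(12, history.mean())       # historical-mean forecast
    print("OOS R^2 of winsorized pooled forecast:",
          oos_r2(realized, pooled, benchmark))
```

Clipping before averaging keeps a single extreme forecast from dominating the pooled value, which is one plausible reading of why the combination is more stable than any individual model.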


Source journal

Journal of Empirical Finance

CiteScore: 3.40

Self-citation rate: 3.80%

Journal description: The Journal of Empirical Finance is a financial economics journal whose aim is to publish high quality articles in empirical finance. Empirical finance is interpreted broadly to include any type of empirical work in financial economics, financial econometrics, and also theoretical work with clear empirical implications, even when there is no empirical analysis. The Journal welcomes articles in all fields of finance, such as asset pricing, corporate finance, financial econometrics, banking, international finance, microstructure, behavioural finance, etc. The Editorial Team is willing to take risks on innovative research, controversial papers, and unusual approaches. We are also particularly interested in work produced by young scholars. The composition of the editorial board reflects such goals.
