Pooling and winsorizing machine learning forecasts to predict stock returns with high-dimensional data

Journal of Empirical Finance, Volume 79, Article 101538 Pub Date: 2024-09-02 DOI: 10.1016/j.jempfin.2024.101538
Erik Mekelburg, Jack Strauss

Abstract

We evaluate US market return predictability using a novel data set of several hundred aggregated firm-level characteristics. We apply LASSO, Elastic Net, Random Forest, Neural Net, Extreme Gradient Boosting, and Light Gradient Boosting Machine methods and find that these models experience large prediction errors that lead to forecast failures. However, winsorizing and pooling machine learning model forecasts provide consistent out-of-sample predictability. To assess robustness, we apply machine learning methods to high-dimensional data for Canada, China, Germany and the UK as well as the Goyal–Welch data. All machine learning models we consider, except for the ensemble pooled methods, fail to significantly predict returns across our samples, highlighting the importance of pooling, evaluating additional economies, and the fragility of individual machine learning methods. Our results shed light on the sparsity versus density debate as the degree of sparsity and variable importance evolve over time.
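
The abstract does not spell out how the forecasts are winsorized or pooled, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes forecasts are stored as a periods-by-models array, that winsorizing means clipping each model's forecasts at its own 5th and 95th percentiles, and that pooling means an equal-weighted average across models. The function names `winsorize_forecasts` and `pool_forecasts`, the percentile cutoffs, and the sample numbers are all hypothetical.

```python
import numpy as np


def winsorize_forecasts(forecasts, lower_pct=5.0, upper_pct=95.0):
    """Clip each model's forecasts at its own percentile bounds.

    forecasts: array of shape (n_periods, n_models).
    Assumption: bounds are computed over the whole array for brevity.
    A genuine out-of-sample exercise would compute them from past
    forecasts only, to avoid look-ahead bias.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    lower = np.percentile(forecasts, lower_pct, axis=0)  # one bound per model
    upper = np.percentile(forecasts, upper_pct, axis=0)
    return np.clip(forecasts, lower, upper)


def pool_forecasts(forecasts):
    """Equal-weighted average across models for each period."""
    return np.asarray(forecasts, dtype=float).mean(axis=1)


# Made-up forecasts from two hypothetical models over five periods.
# Each model produces one extreme forecast (5.0 and -4.0).
raw = np.array([
    [0.01, 0.02],
    [0.02, 0.01],
    [5.00, 0.03],
    [0.00, -4.00],
    [0.01, 0.02],
])

winsorized = winsorize_forecasts(raw)
pooled = pool_forecasts(winsorized)
print(pooled)
```

Clipping before averaging limits how far a single model's extreme forecast can move the pooled forecast in any one period, which is the intuition behind combining the two steps.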
