去偏随机森林变量选择

Dhruv Sharma
{"title":"去偏随机森林变量选择","authors":"Dhruv Sharma","doi":"10.2139/ssrn.1975801","DOIUrl":null,"url":null,"abstract":"This paper proposes a new way to de-bias random forest variable selection using a clean random forest algorithm. Strobl etal (2007) have shown random forest to be biased towards variables with many levels or categories and scales and correlated variables which might result in some inflated variable importance measures. The proposed algorithm builds random forests without each variable and keeps variables when dropping them degrades the overall random forest performance. The algorithm is simple and straight forward and its complexity and speed is a function of the number of salient variables. It runs more efficiently than the permutation test algorithm and is an alternative method to address known biases. The paper concludes some normative guidance on how to use random forest variable importance.","PeriodicalId":384078,"journal":{"name":"ERN: Other Econometrics: Data Collection & Data Estimation Methodology (Topic)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"De-Biased Random Forest Variable Selection\",\"authors\":\"Dhruv Sharma\",\"doi\":\"10.2139/ssrn.1975801\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper proposes a new way to de-bias random forest variable selection using a clean random forest algorithm. Strobl etal (2007) have shown random forest to be biased towards variables with many levels or categories and scales and correlated variables which might result in some inflated variable importance measures. The proposed algorithm builds random forests without each variable and keeps variables when dropping them degrades the overall random forest performance. The algorithm is simple and straight forward and its complexity and speed is a function of the number of salient variables. It runs more efficiently than the permutation test algorithm and is an alternative method to address known biases. The paper concludes some normative guidance on how to use random forest variable importance.\",\"PeriodicalId\":384078,\"journal\":{\"name\":\"ERN: Other Econometrics: Data Collection & Data Estimation Methodology (Topic)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-12-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ERN: Other Econometrics: Data Collection & Data Estimation Methodology (Topic)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2139/ssrn.1975801\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ERN: Other Econometrics: Data Collection & Data Estimation Methodology (Topic)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2139/ssrn.1975801","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

本文提出了一种利用干净随机森林算法消除随机森林变量选择偏差的新方法。stroble etal(2007)已经表明随机森林偏向于具有许多水平或类别和规模的变量和相关变量,这可能导致一些膨胀的变量重要性度量。该算法构建不包含每个变量的随机森林,并在删除变量时保留变量,从而降低了随机森林的整体性能。该算法简单直接,其复杂度和速度是显著变量数量的函数。它比排列测试算法运行更有效,是解决已知偏差的另一种方法。本文对随机森林变量重要性的使用提出了一些规范的指导意见。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
De-Biased Random Forest Variable Selection
This paper proposes a new way to de-bias random forest variable selection using a clean random forest algorithm. Strobl etal (2007) have shown random forest to be biased towards variables with many levels or categories and scales and correlated variables which might result in some inflated variable importance measures. The proposed algorithm builds random forests without each variable and keeps variables when dropping them degrades the overall random forest performance. The algorithm is simple and straight forward and its complexity and speed is a function of the number of salient variables. It runs more efficiently than the permutation test algorithm and is an alternative method to address known biases. The paper concludes some normative guidance on how to use random forest variable importance.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Structural Estimation Combining Micro and Macro Data Monetary Policy under Data Uncertainty: Interest-Rate Smoothing from a Cross-Country Perspective Nudging Towards Data Equity: The Role of Stewardship and Fiduciaries in the Digital Economy Re-Engineering Key National Economic Indicators Quant Research Ideas to Test for ETF Option and Equity Markets in China and Japan
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1