Model-independent variable selection via the rule-based variable priorit

Min Lu, Hemant Ishwaran
{"title":"Model-independent variable selection via the rule-based variable priorit","authors":"Min Lu, Hemant Ishwaran","doi":"arxiv-2409.09003","DOIUrl":null,"url":null,"abstract":"While achieving high prediction accuracy is a fundamental goal in machine\nlearning, an equally important task is finding a small number of features with\nhigh explanatory power. One popular selection technique is permutation\nimportance, which assesses a variable's impact by measuring the change in\nprediction error after permuting the variable. However, this can be problematic\ndue to the need to create artificial data, a problem shared by other methods as\nwell. Another problem is that variable selection methods can be limited by\nbeing model-specific. We introduce a new model-independent approach, Variable\nPriority (VarPro), which works by utilizing rules without the need to generate\nartificial data or evaluate prediction error. The method is relatively easy to\nuse, requiring only the calculation of sample averages of simple statistics,\nand can be applied to many data settings, including regression, classification,\nand survival. We investigate the asymptotic properties of VarPro and show,\namong other things, that VarPro has a consistent filtering property for noise\nvariables. Empirical studies using synthetic and real-world data show the\nmethod achieves a balanced performance and compares favorably to many\nstate-of-the-art procedures currently used for variable selection.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - STAT - Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.09003","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

While achieving high prediction accuracy is a fundamental goal in machine learning, an equally important task is finding a small number of features with high explanatory power. One popular selection technique is permutation importance, which assesses a variable's impact by measuring the change in prediction error after permuting the variable. However, this can be problematic due to the need to create artificial data, a problem shared by other methods as well. Another problem is that variable selection methods can be limited by being model-specific. We introduce a new model-independent approach, Variable Priority (VarPro), which works by utilizing rules without the need to generate artificial data or evaluate prediction error. The method is relatively easy to use, requiring only the calculation of sample averages of simple statistics, and can be applied to many data settings, including regression, classification, and survival. We investigate the asymptotic properties of VarPro and show, among other things, that VarPro has a consistent filtering property for noise variables. Empirical studies using synthetic and real-world data show the method achieves a balanced performance and compares favorably to many state-of-the-art procedures currently used for variable selection.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
通过基于规则的变量优先级选择与模型无关的变量
虽然实现高预测准确率是机器学习的基本目标,但同样重要的任务是找到少量具有高解释力的特征。一种流行的选择技术是置换重要性(permutationimportance),它通过测量变量置换后预测误差的变化来评估变量的影响。然而,由于需要创建人工数据,这可能会产生问题,这也是其他方法共同面临的问题。另一个问题是,变量选择方法可能会受到特定模型的限制。我们引入了一种独立于模型的新方法 VariablePriority (VarPro),它利用规则进行工作,无需生成人工数据或评估预测误差。这种方法相对简单,只需计算简单统计量的样本平均值,可应用于多种数据设置,包括回归、分类和生存。我们对 VarPro 的渐近特性进行了研究,结果表明 VarPro 对噪声变量具有一致的过滤特性。使用合成数据和真实世界数据进行的实证研究表明,该方法实现了均衡的性能,与目前用于变量选择的许多最先进的程序相比毫不逊色。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Fitting Multilevel Factor Models Cartan moving frames and the data manifolds Symmetry-Based Structured Matrices for Efficient Approximately Equivariant Networks Recurrent Interpolants for Probabilistic Time Series Prediction PieClam: A Universal Graph Autoencoder Based on Overlapping Inclusive and Exclusive Communities
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1