Powershap: A Power-full Shapley Feature Selection Method

Jarne Verhaeghe, Jeroen Van Der Donckt, F. Ongenae, S. Hoecke
{"title":"Powershap: A Power-full Shapley Feature Selection Method","authors":"Jarne Verhaeghe, Jeroen Van Der Donckt, F. Ongenae, S. Hoecke","doi":"10.48550/arXiv.2206.08394","DOIUrl":null,"url":null,"abstract":"Feature selection is a crucial step in developing robust and powerful machine learning models. Feature selection techniques can be divided into two categories: filter and wrapper methods. While wrapper methods commonly result in strong predictive performances, they suffer from a large computational complexity and therefore take a significant amount of time to complete, especially when dealing with high-dimensional feature sets. Alternatively, filter methods are considerably faster, but suffer from several other disadvantages, such as (i) requiring a threshold value, (ii) not taking into account intercorrelation between features, and (iii) ignoring feature interactions with the model. To this end, we present powershap, a novel wrapper feature selection method, which leverages statistical hypothesis testing and power calculations in combination with Shapley values for quick and intuitive feature selection. Powershap is built on the core assumption that an informative feature will have a larger impact on the prediction compared to a known random feature. Benchmarks and simulations show that powershap outperforms other filter methods with predictive performances on par with wrapper methods while being significantly faster, often even reaching half or a third of the execution time. As such, powershap provides a competitive and quick algorithm that can be used by various models in different domains. Furthermore, powershap is implemented as a plug-and-play and open-source sklearn component, enabling easy integration in conventional data science pipelines. User experience is even further enhanced by also providing an automatic mode that automatically tunes the hyper-parameters of the powershap algorithm, allowing to use the algorithm without any configuration needed.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2206.08394","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Feature selection is a crucial step in developing robust and powerful machine learning models. Feature selection techniques can be divided into two categories: filter and wrapper methods. While wrapper methods commonly result in strong predictive performances, they suffer from a large computational complexity and therefore take a significant amount of time to complete, especially when dealing with high-dimensional feature sets. Alternatively, filter methods are considerably faster, but suffer from several other disadvantages, such as (i) requiring a threshold value, (ii) not taking into account intercorrelation between features, and (iii) ignoring feature interactions with the model. To this end, we present powershap, a novel wrapper feature selection method, which leverages statistical hypothesis testing and power calculations in combination with Shapley values for quick and intuitive feature selection. Powershap is built on the core assumption that an informative feature will have a larger impact on the prediction compared to a known random feature. Benchmarks and simulations show that powershap outperforms other filter methods with predictive performances on par with wrapper methods while being significantly faster, often even reaching half or a third of the execution time. As such, powershap provides a competitive and quick algorithm that can be used by various models in different domains. Furthermore, powershap is implemented as a plug-and-play and open-source sklearn component, enabling easy integration in conventional data science pipelines. User experience is further enhanced by an automatic mode that tunes the hyper-parameters of the powershap algorithm, allowing it to be used without any configuration.
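To make the core assumption concrete, the sketch below illustrates the idea behind powershap (it is not the authors' implementation): repeatedly append a known random feature, train the model, compute Shapley values, and keep only the features whose impact is statistically larger than that of the random feature. The synthetic data, the RandomForestRegressor model, and the one-sided Mann-Whitney U test are illustrative choices; the paper itself describes a percentile-based p-value combined with a power calculation that also sizes the number of iterations. The shap package is assumed to be installed.

```python
# Minimal sketch of the powershap idea (not the authors' implementation):
# repeatedly append a known random feature, fit a model, and keep only the
# features whose Shapley impact is statistically larger than the random one's.
import numpy as np
import shap  # assumed installed: pip install shap
from scipy import stats
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X, y = make_regression(n_samples=500, n_features=8, n_informative=3,
                       noise=0.1, random_state=42)

n_iter = 10
feature_impacts = np.zeros((n_iter, X.shape[1]))  # mean |SHAP| per real feature
random_impacts = np.zeros(n_iter)                 # mean |SHAP| of the random feature

for i in range(n_iter):
    # Append a uniformly random, uninformative reference feature.
    X_aug = np.hstack([X, rng.uniform(size=(X.shape[0], 1))])
    X_tr, X_val, y_tr, y_val = train_test_split(X_aug, y, random_state=i)

    model = RandomForestRegressor(n_estimators=100, random_state=i).fit(X_tr, y_tr)
    shap_vals = shap.TreeExplainer(model).shap_values(X_val)  # (n_samples, n_features + 1)
    mean_abs = np.abs(shap_vals).mean(axis=0)

    feature_impacts[i] = mean_abs[:-1]
    random_impacts[i] = mean_abs[-1]

# Simplified stand-in for powershap's test: a one-sided Mann-Whitney U test that a
# feature's impact distribution exceeds the random feature's. The paper instead uses
# a percentile-based p-value plus a power calculation to choose the iteration count.
alpha = 0.01
selected = [f for f in range(X.shape[1])
            if stats.mannwhitneyu(feature_impacts[:, f], random_impacts,
                                  alternative="greater").pvalue < alpha]
print("Selected feature indices:", selected)
```

In practice, the released open-source powershap package wraps this procedure as an sklearn-compatible selector; assuming its API follows the standard fit/transform conventions the abstract describes, usage would look roughly like `selector = PowerShap(); selector.fit(X, y); X_selected = selector.transform(X)` (exact class arguments may differ from this sketch).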