Deep neural network-based feature selection with local false discovery rate estimation

Applied Intelligence (IF 3.4; Zone 2, Computer Science; JCR Q2, Computer Science, Artificial Intelligence) Pub Date: 2024-11-27 DOI: 10.1007/s10489-024-05944-7
Zixuan Cao, Xiaoya Sun, Yan Fu
Citations: 0

Abstract


Feature selection, which aims to identify the most significant subset of features in the original data, plays a prominent role in high-dimensional data processing. To a certain extent, feature selection can mitigate the poor interpretability of deep neural networks (DNNs). Despite recent advances in DNN-based feature selection, most methods overlook error control for the selected features and lack reproducibility. In this paper, we propose a new method called DeepTD to perform error-controlled feature selection for DNNs. Artificial decoy features are constructed and made to compete with the original features according to feature importance scores computed from the trained network, enabling p-value-free estimation of the local false discovery rate (FDR) of selected features. The merits of DeepTD include: a new DNN-derived measure of feature importance that combines the weights and gradients of the network; the first algorithm that estimates the local FDR based on DNN-derived scores; confidence assessment of individual selected features; and better robustness to small numbers of important features and low FDR thresholds than competition-based FDR control methods such as the knockoff filter. On multiple synthetic datasets, DeepTD accurately estimated the local FDR and empirically controlled the FDR with 10% higher power on average than the knockoff filter. At lower FDR thresholds, the power of our method reached two to three times that of other state-of-the-art methods. DeepTD was also applied to real datasets and selected 31%-49% more features than alternatives, demonstrating its validity and utility.
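
The abstract does not spell out DeepTD's estimator, so the following is only a rough sketch of the general decoy-competition idea it describes, not the authors' algorithm. It assumes one decoy per original feature and that decoy importance scores behave like the scores of unimportant original features. The function names `decoy_fdr` and `local_fdr`, the fixed-width score window, and the example scores are all hypothetical choices made for illustration.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR among original features scoring >= threshold.

    Assumption (not from the paper): the number of decoys above the
    threshold approximates the number of false selections among the
    original features above the same threshold.
    """
    n_targets = sum(s >= threshold for s in target_scores)
    n_decoys = sum(s >= threshold for s in decoy_scores)
    # Guard against division by zero and cap the estimate at 1.
    return min(1.0, n_decoys / max(1, n_targets))


def local_fdr(target_scores, decoy_scores, score, half_width):
    """Crude local FDR: decoy/target count ratio in a window around `score`.

    The fixed-width window is an illustrative stand-in for a proper
    density estimate; it is not the estimator used by DeepTD.
    """
    lo, hi = score - half_width, score + half_width
    n_targets = sum(lo <= s <= hi for s in target_scores)
    n_decoys = sum(lo <= s <= hi for s in decoy_scores)
    return min(1.0, n_decoys / max(1, n_targets))


# Hypothetical importance scores for six original features and six decoys.
targets = [9.0, 8.0, 7.5, 3.0, 2.0, 1.0]
decoys = [3.5, 2.5, 1.5, 1.2, 0.5, 0.2]

print(decoy_fdr(targets, decoys, threshold=2.0))
print(local_fdr(targets, decoys, score=8.0, half_width=1.0))
```

In this toy setting the cumulative estimate describes the whole selected set, while the windowed ratio attaches a confidence to an individual feature's score, which is the distinction between FDR control and local FDR estimation that the abstract draws.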
