Online Scalable Streaming Feature Selection via Dynamic Decision

Peng Zhou, Shu Zhao, Yuan-Ting Yan, X. Wu
Journal: ACM Transactions on Knowledge Discovery from Data (TKDD)
Publication date: 2022-03-10
DOI: 10.1145/3502737 (https://doi.org/10.1145/3502737)
Citations: 7

Abstract

Feature selection is a core step in machine learning and strongly affects model performance. In some real-world applications, features arrive one by one over time as a stream, and the total number of features cannot be known before learning. Online streaming feature selection aims to select the best features from the stream on the fly at each timestamp. Lacking global information about the entire feature space, most existing methods select streaming features based on individual feature information or on pairwise comparisons between features. This article proposes a new online scalable streaming feature selection framework from the dynamic decision perspective; dynamic threshold adjustment makes it scalable in running time and in the selected features. Following the philosophy of “Thinking-in-Threes”, we classify each newly arrived feature as selecting, discarding, or delaying, with the aim of minimizing the overall decision risk. As global statistical information is dynamically updated, we add selecting features to the candidate feature subset, ignore discarding features, and cache delaying features in the undetermined feature subset to wait for more information. Meanwhile, we perform redundancy analysis on the candidate features and uncertainty analysis on the undetermined features. Extensive experiments on eleven real-world datasets demonstrate the efficiency and scalability of the new framework compared with state-of-the-art algorithms.
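
The select / discard / delay loop described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the relevance measure, the threshold rule, or the redundancy and uncertainty tests, so everything concrete below is an assumption made for illustration. Relevance is taken to be the absolute Pearson correlation with the label, the two thresholds are the running mean of all relevance scores plus or minus `width` standard deviations, and two features count as redundant when their absolute correlation reaches `redundancy`. The names `ThreeWayStreamSelector`, `abs_corr`, `width`, and `redundancy` are hypothetical.

```python
import math
from statistics import mean, pstdev


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either sequence is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


class ThreeWayStreamSelector:
    """Illustrative three-way (select / discard / delay) streaming selector."""

    def __init__(self, y, width=0.5, redundancy=0.95):
        self.y = y
        self.width = width            # assumed: half-width of the delay band, in std units
        self.redundancy = redundancy  # assumed: correlation level treated as redundant
        self.scores = []              # relevance of every feature seen so far
        self.candidates = {}          # name -> (values, relevance)
        self.undetermined = {}        # cached "delay" features

    def thresholds(self):
        # Assumed dynamic rule: thresholds follow the running statistics.
        mu, sd = mean(self.scores), pstdev(self.scores)
        return mu + self.width * sd, mu - self.width * sd

    def _decide(self, rel):
        alpha, beta = self.thresholds()
        if rel >= alpha:
            return "select"
        if rel < beta:
            return "discard"
        return "delay"

    def _add_candidate(self, name, x, rel):
        # Assumed redundancy analysis: among near-duplicates keep the most relevant.
        dup = [n for n, (ox, _) in self.candidates.items()
               if abs_corr(x, ox) >= self.redundancy]
        if any(self.candidates[n][1] >= rel for n in dup):
            return  # an equally or more relevant near-duplicate already exists
        for n in dup:
            del self.candidates[n]
        self.candidates[name] = (x, rel)

    def observe(self, name, x):
        """Process one arriving feature; returns its three-way decision
        (made before redundancy analysis)."""
        rel = abs_corr(x, self.y)
        self.scores.append(rel)
        decision = self._decide(rel)
        if decision == "select":
            self._add_candidate(name, x, rel)
        elif decision == "delay":
            self.undetermined[name] = (x, rel)
        # Assumed uncertainty analysis: re-check cached features against
        # the thresholds implied by the updated statistics.
        for cached, (cx, crel) in list(self.undetermined.items()):
            d = self._decide(crel)
            if d != "delay":
                del self.undetermined[cached]
                if d == "select":
                    self._add_candidate(cached, cx, crel)
        return decision
```

In this sketch the very first feature is always selected, because the two thresholds coincide when only one score has been seen; a practical implementation would need a warm-up rule. Calling `observe(name, values)` once per arriving feature keeps `candidates` and `undetermined` up to date, and a delayed feature is resolved later only if the thresholds move past its relevance score.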