Online Scalable Streaming Feature Selection via Dynamic Decision

ACM Transactions on Knowledge Discovery from Data (TKDD) | Pub Date: 2022-03-10 | DOI: 10.1145/3502737

Peng Zhou, Shu Zhao, Yuan-Ting Yan, X. Wu


Citations: 7

Abstract

Feature selection is a core task in machine learning and strongly affects model performance. In some real-world applications, features arrive as a stream, one at a time, and the exact number of features cannot be known before learning begins. Online streaming feature selection aims to select the optimal streaming features at each timestamp on the fly. Lacking global information about the entire feature space, most existing methods select streaming features based on individual feature information or on pairwise comparisons of features. This article proposes a new online scalable streaming feature selection framework from the dynamic decision perspective; through dynamic threshold adjustment, it is scalable in both running time and selected features. Following the philosophy of “Thinking-in-Threes”, we classify each newly arrived feature as selecting, discarding, or delaying, with the aim of minimizing the overall decision risk. As the global statistical information is dynamically updated, we add the selecting features to the candidate feature subset, ignore the discarding features, and cache the delaying features in the undetermined feature subset to wait for more information. Meanwhile, we perform redundancy analysis on the candidate features and uncertainty analysis on the undetermined features. Extensive experiments on eleven real-world datasets demonstrate the efficiency and scalability of the new framework compared with state-of-the-art algorithms.
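The select / discard / delay idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the relevance measure, the threshold rule, or the redundancy and uncertainty analyses, so everything below is an assumption made for illustration. Each feature is assumed to arrive with a precomputed relevance score, the thresholds `alpha` and `beta` are assumed to be the running mean of all scores seen so far plus or minus a fixed `margin`, and the names `three_way_decision` and `process_stream` are hypothetical.

```python
from statistics import fmean


def three_way_decision(score, alpha, beta):
    """Classify a relevance score as 'select', 'discard', or 'delay'."""
    if score >= alpha:
        return "select"
    if score <= beta:
        return "discard"
    return "delay"


def process_stream(scored_features, margin=0.15):
    """Process (name, score) pairs one at a time.

    Assumption: thresholds are running mean +/- margin. This rule is a
    stand-in; the paper's dynamic threshold adjustment is not reproduced.
    """
    seen = []          # every score so far: the "global statistics"
    candidates = []    # features decided as 'select'
    undetermined = {}  # cached 'delay' features awaiting more information

    for name, score in scored_features:
        seen.append(score)
        mean = fmean(seen)
        alpha, beta = mean + margin, mean - margin

        # Judge the new feature and re-judge cached ones with the
        # updated thresholds.
        undetermined[name] = score
        for fname, fscore in list(undetermined.items()):
            decision = three_way_decision(fscore, alpha, beta)
            if decision == "select":
                candidates.append(fname)
                del undetermined[fname]
            elif decision == "discard":
                del undetermined[fname]
            # 'delay': leave the feature in the cache

    return candidates, sorted(undetermined)


if __name__ == "__main__":
    stream = [("f1", 0.5), ("f2", 0.1), ("f3", 0.9), ("f4", 0.2), ("f5", 0.45)]
    print(process_stream(stream))
```

In this toy stream, `f1` is first delayed because it sits between the two thresholds, and is only selected once `f2` arrives and shifts the running mean. This is the "wait for more information" behavior in miniature. A fuller version would also run a redundancy check on the candidate subset, which this sketch omits.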


