Unknown Type Streaming Feature Selection via Maximal Information Coefficient

2022 IEEE International Conference on Data Mining Workshops (ICDMW); Pub Date: 2022-11-01; DOI: 10.1109/ICDMW58026.2022.00089

Peng Zhou, Yunyun Zhang, Yuan-Ting Yan, Shu Zhao



Abstract

Feature selection aims to select an optimal minimal feature subset from the original dataset and has become an indispensable preprocessing step before data mining and machine learning, especially in the era of big data. Most feature selection methods implicitly assume that the feature type (categorical, numerical, or mixed) is known before learning, and then design corresponding measures to calculate the correlation between features. In practical applications, however, features may be generated dynamically and arrive one by one over time; we call these streaming features. Most existing streaming feature selection methods assume either that all dynamically generated features are of the same type or that the type of each newly arriving feature can be known on the fly, which is unrealistic. This paper therefore first studies the practical problem of Unknown Type Streaming Feature Selection and proposes a new method, named UT-SFS, to handle it. Extensive experimental results indicate the effectiveness of the new method. UT-SFS is nonparametric and does not need to know the feature type before learning, which aligns with practical application needs.
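The abstract does not spell out the UT-SFS algorithm, so the following is only a minimal sketch of the general idea it describes: features arrive one at a time, and each is scored against the label with a single dependency measure that does not need to know whether the column is categorical or numerical. The scoring function `approx_mic` is a simplified stand-in for the Maximal Information Coefficient (it tries equal-frequency grids only, rather than optimizing the grid placement as the real MIC does), and the function names, the thresholds `relevance` and `redundancy`, and the filtering rule are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def equal_frequency_bins(values, k):
    """Assign each value to one of at most k bins of roughly equal size.

    Tied values always share a bin, so a categorical column with few
    distinct values is never split arbitrarily.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for pos, idx in enumerate(order):
        prev = order[pos - 1]
        if pos > 0 and values[idx] == values[prev]:
            bins[idx] = bins[prev]
        else:
            bins[idx] = pos * k // n
    return bins


def mutual_information(xb, yb):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xb)
    joint, px, py = Counter(zip(xb, yb)), Counter(xb), Counter(yb)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def approx_mic(x, y, alpha=0.6):
    """Simplified MIC-style score in [0, 1] (assumption: equal-frequency grids only)."""
    n = len(x)
    budget = max(4, int(n ** alpha))  # grid size limit a * b <= n ** alpha
    best = 0.0
    for a in range(2, budget // 2 + 1):
        xb = equal_frequency_bins(x, a)
        for b in range(2, budget // a + 1):
            yb = equal_frequency_bins(y, b)
            cells = min(len(set(xb)), len(set(yb)))
            if cells < 2:
                continue  # a constant column carries no information
            best = max(best, mutual_information(xb, yb) / math.log(cells))
    return min(best, 1.0)


def select_streaming(feature_stream, label, relevance=0.3, redundancy=0.8):
    """Hypothetical online filter: keep relevant, non-redundant features."""
    selected = {}  # feature name -> (values, relevance score)
    for name, values in feature_stream:
        score = approx_mic(values, label)
        if score < relevance:
            continue  # weakly related to the label: discard
        redundant = any(
            kept_score >= score and approx_mic(values, kept) >= redundancy
            for kept, kept_score in selected.values()
        )
        if not redundant:
            selected[name] = (values, score)
    return list(selected)


label = [0, 1] * 50
stream = [
    ("color", ["a", "b"] * 50),     # categorical, determines the label
    ("noise", [0] * 50 + [1] * 50), # numerical, independent of the label
    ("copy", [10.0, 20.0] * 50),    # numerical duplicate of "color"
]
print(select_streaming(stream, label))
```

Because the score is computed from ranks and ties only, the same call handles string-valued and numeric columns, which is the property the abstract emphasizes. The sketch assumes the values within one feature are mutually comparable.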


