Feature Grouping and Selection on High-Dimensional Microarray Data

M. García-Torres, Francisco Gómez-Vela, D. Becerra-Alonso, B. Melián-Batista, Marcos Moreno-Vega
Published in: 2015 International Workshop on Data Mining with Industrial Applications (DMIA), 2015-09-01
DOI: 10.1109/DMIA.2015.18 (https://doi.org/10.1109/DMIA.2015.18)
Citations: 4

Abstract

In classification tasks, as the dimensionality increases, the performance of the classifier improves until an optimal number of features is reached. Further increases in dimensionality without a corresponding increase in the number of training samples degrade classifier performance. This effect, known as the curse of dimensionality, has become more relevant with the advent of larger datasets and the demands of Knowledge Discovery from Big Data. In this context, feature grouping has become an effective approach for providing additional information about relationships between features. In this work, we propose a greedy strategy, called GreedyPGG, that groups features based on the concept of Markov blankets. To this end, we introduce the idea of a predominant group of features. We also present an adaptation of Variable Neighborhood Search (VNS) to high-dimensional feature selection that uses GreedyPGG to reduce the search space. We test the effectiveness of GreedyPGG on synthetic datasets and of VNS on microarray datasets, and we compare VNS with popular and competitive strategies. Results show that GreedyPGG groups correlated features efficiently and that VNS is a competitive strategy, capable of finding a small number of features with high predictive power.
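The abstract does not give the details of GreedyPGG, so the following is only a rough sketch of what greedy, Markov-blanket-based feature grouping can look like, not the authors' algorithm. It assumes discrete features, uses symmetrical uncertainty (SU) as the correlation measure, and borrows the approximate Markov blanket test from the FCBF family of methods: a seed feature covers another feature when the seed is at least as relevant to the class and the SU between the two features is at least the covered feature's SU with the class. The function names (`entropy`, `symmetrical_uncertainty`, `greedy_groups`) and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def greedy_groups(features, labels):
    """Greedily group features around the most relevant ungrouped feature.

    features: dict mapping a feature name to a list of discrete values.
    labels:   list of class labels, same length as each feature column.
    Assumption: the grouping rule is the FCBF-style approximate Markov
    blanket test, used here for illustration only.
    """
    relevance = {
        name: symmetrical_uncertainty(col, labels)
        for name, col in features.items()
    }
    # Keep only features with some relevance, most relevant first.
    remaining = sorted(
        (name for name in features if relevance[name] > 0),
        key=lambda name: relevance[name],
        reverse=True,
    )
    groups = []
    while remaining:
        seed = remaining.pop(0)  # most relevant feature not yet grouped
        group, kept = [seed], []
        for name in remaining:
            # The seed is at least as relevant as `name` by sort order, so
            # only the feature-to-feature condition has to be checked.
            su_pair = symmetrical_uncertainty(features[seed], features[name])
            if su_pair >= relevance[name]:
                group.append(name)
            else:
                kept.append(name)
        remaining = kept
        groups.append(group)
    return groups


# Toy data (hypothetical): the class is determined by two independent bits.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
    "b_inv": [1, 1, 0, 0, 1, 1, 0, 0],
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],
}
groups = greedy_groups(features, labels)
print(groups)  # [['a', 'a_copy'], ['b', 'b_inv']]
```

In this toy run the redundant copies land in the same group as their seed and the irrelevant `noise` feature is dropped. A search procedure such as VNS could then pick at most a few features per group instead of exploring all features, which is one way a grouping step can shrink the search space.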