A Sparse-Modeling based approach for Class-Specific feature selection

Davide Nardone, A. Ciaramella, A. Staiano
PeerJ Preprints, e27740. Published 2019-05-17. DOI: https://doi.org/10.7287/peerj.preprints.27740v1
Citations: 4

Abstract

In this work, we propose a novel feature selection framework, the Sparse-Modeling Based Approach for Class-Specific Feature Selection (SMBA-CSFS), which combines sparse modeling with class-specific feature selection. Feature selection plays a key role in several fields (e.g., computational biology): it yields models with fewer variables, which are easier to explain, provide valuable insight into the importance of each variable, and may speed up experimental validation. Unfortunately, as the no free lunch theorems also suggest, no approach in the literature is the best at detecting the optimal feature subset for building a final model, so the problem remains a challenge. The proposed feature selection procedure has two steps: (a) a sparse-modeling-based learning technique is first used to find the best subset of features for each class of the training set; (b) the discovered feature subsets are then fed to a class-specific feature selection scheme in order to assess the effectiveness of the selected features in classification tasks. To this end, an ensemble of classifiers is built in which each classifier is trained on its own feature subset discovered in the previous phase, and a suitable decision rule is adopted to compute the ensemble response. To evaluate the performance of the proposed method, extensive experiments were performed on publicly available datasets, in particular from the computational biology field, where feature selection is indispensable: acute lymphoblastic leukemia and acute myeloid leukemia, human carcinomas, human lung carcinomas, diffuse large B-cell lymphoma, and malignant glioma. SMBA-CSFS is able to identify the most representative features that maximize classification accuracy. With the top 20 and top 80 features, SMBA-CSFS shows promising performance compared with its competitors from the literature on all considered datasets, especially those with a higher number of features. Experiments show that the proposed approach might outperform state-of-the-art methods when the number of features is high. For this reason, the approach is well suited to the selection and classification of data with a large number of features and classes.
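
The two-step procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the sparse-modeling technique, the ensemble members, or the decision rule, so everything concrete below is an assumption made for illustration: step (a) uses an L1-regularized one-vs-rest least-squares fit solved with proximal gradient (ISTA), step (b) uses one nearest-centroid member per class, and the decision rule picks the class whose member is most confident. All function names (`lasso_ista`, `class_specific_subsets`, `fit_ensemble`, `predict`) are hypothetical, and the data are synthetic, not the datasets used in the paper.

```python
import random

def soft_threshold(v, t):
    """Proximal operator of the L1 norm (shrinks v toward zero by t)."""
    return max(v - t, 0.0) - max(-v - t, 0.0)

def lasso_ista(X, t, lam=0.1, n_iter=300):
    """Minimise (1/2n)*||Xw - t||^2 + lam*||w||_1 by proximal gradient (ISTA)."""
    n, d = len(X), len(X[0])
    # trace(X^T X)/n upper-bounds the Lipschitz constant of the smooth part.
    step = n / sum(v * v for row in X for v in row)
    w = [0.0] * d
    for _ in range(n_iter):
        resid = [sum(wj * xj for wj, xj in zip(w, row)) - ti for row, ti in zip(X, t)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(d)]
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
    return w

def class_specific_subsets(X, y, k):
    """Step (a), assumed form: per class, keep the k features with the largest
    absolute weight in a sparse one-vs-rest regression on centred data."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    subsets = {}
    for c in sorted(set(y)):
        t = [1.0 if yi == c else 0.0 for yi in y]
        t_mean = sum(t) / n
        w = lasso_ista(Xc, [ti - t_mean for ti in t])
        top = sorted(range(d), key=lambda j: -abs(w[j]))[:k]
        subsets[c] = sorted(top)
    return subsets

def centroid(rows, feats):
    """Mean of the given rows, restricted to the features in feats."""
    return [sum(r[j] for r in rows) / len(rows) for j in feats]

def fit_ensemble(X, y, subsets):
    """Step (b), assumed form: one one-vs-rest member per class, each trained
    only on that class's own feature subset."""
    members = {}
    for c, feats in subsets.items():
        pos = [x for x, yi in zip(X, y) if yi == c]
        neg = [x for x, yi in zip(X, y) if yi != c]
        members[c] = (feats, centroid(pos, feats), centroid(neg, feats))
    return members

def predict(members, x):
    """Assumed decision rule: return the class whose member is most confident."""
    def score(c):
        feats, mu_pos, mu_neg = members[c]
        d_pos = sum((x[j] - m) ** 2 for j, m in zip(feats, mu_pos))
        d_neg = sum((x[j] - m) ** 2 for j, m in zip(feats, mu_neg))
        return d_neg - d_pos  # positive when x is closer to the class centroid
    return max(members, key=score)

# Synthetic toy data: 3 classes, 6 features; feature c is shifted for class c,
# features 3..5 are pure noise.
rng = random.Random(0)
X, y = [], []
for c in range(3):
    for _ in range(20):
        row = [rng.uniform(-0.5, 0.5) for _ in range(6)]
        row[c] += 3.0
        X.append(row)
        y.append(c)

subsets = class_specific_subsets(X, y, k=1)
members = fit_ensemble(X, y, subsets)
accuracy = sum(predict(members, x) == yi for x, yi in zip(X, y)) / len(y)
print(subsets, accuracy)
```

The point of the sketch is the structure rather than the specific models: each class gets its own feature subset, and each ensemble member sees only the subset selected for its class. Swapping in a different sparse solver or a stronger base classifier would leave that structure unchanged.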