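A Scheme for Feature Selection from Gene Expression Data using Recursive Feature Elimination with Cross Validation and Unsupervised Deep Belief Network Classifier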

Nimrita Koul, S. Manvi
DOI: 10.1109/ICCCT2.2019.8824943 (https://doi.org/10.1109/ICCCT2.2019.8824943)
Published in: 2019 3rd International Conference on Computing and Communications Technologies (ICCCT)
Publication date: 2019-02-01
Citations: 3

Abstract

In cancer treatment, efficacy depends on diagnosing the nature of the tumor correctly and as early as possible. Microarray gene expression data, which contains the expression profiles of the entire genome, can be analyzed to identify cancer biomarkers. Such data has a large number of features and very few samples, so it is beneficial to select a reduced set of genes for tasks such as classification. In this paper, we propose a two-level scheme for feature selection and classification of cancers. First, the genes are ranked using Recursive Feature Elimination, with a Random Forest classifier evaluating the fitness of genes under five-fold cross-validation. The selected genes are then used to pre-train an unsupervised Deep Belief Network classifier, which classifies the samples. We compared the cross-validation metrics of our approach (classification accuracy, precision, and recall) with those of standard feature selector-classifier combinations: Mutual Information with Support Vector Machine, Kernel Principal Component Analysis with Support Vector Machine, Support Vector Machine-Recursive Feature Elimination, and Mutual Information with Random Forest classifier. The results show that our scheme performs on par with standard methods used for feature selection from gene expression data.
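The two-stage scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, a synthetic dataset from `make_classification` in place of real microarray data, and two stacked `BernoulliRBM` layers followed by logistic regression as a stand-in for the Deep Belief Network classifier, since the abstract does not give the network's architecture. The function names and all hyperparameters (tree count, elimination step, layer sizes) are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def select_genes(X, y, seed=0):
    """Stage 1: recursive feature elimination driven by a random forest,
    with the number of kept features chosen by five-fold cross-validation."""
    selector = RFECV(
        estimator=RandomForestClassifier(n_estimators=50, random_state=seed),
        step=0.2,                   # assumption: drop 20% of the features per round
        min_features_to_select=5,   # assumption: never keep fewer than 5 genes
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=seed),
        scoring="accuracy",
    )
    selector.fit(X, y)
    return selector.support_        # boolean mask over the columns (genes)


def make_dbn_like_classifier(seed=0):
    """Stage 2 stand-in (assumption): two RBM layers trained without labels,
    topped by a supervised logistic regression layer."""
    return Pipeline([
        ("scale", MinMaxScaler()),  # BernoulliRBM expects inputs in [0, 1]
        ("rbm1", BernoulliRBM(n_components=32, learning_rate=0.05,
                              n_iter=20, random_state=seed)),
        ("rbm2", BernoulliRBM(n_components=16, learning_rate=0.05,
                              n_iter=20, random_state=seed)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


def evaluate(X, y, seed=0):
    """Select genes, then report mean five-fold accuracy, precision, recall."""
    mask = select_genes(X, y, seed)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_validate(
        make_dbn_like_classifier(seed), X[:, mask], y, cv=cv,
        scoring=["accuracy", "precision", "recall"],
    )
    metrics = {m: float(np.mean(scores[f"test_{m}"]))
               for m in ("accuracy", "precision", "recall")}
    return mask, metrics


# Assumption: synthetic data with many features and few samples,
# mimicking the shape (not the content) of a microarray dataset.
X, y = make_classification(n_samples=60, n_features=100, n_informative=10,
                           n_redundant=0, random_state=0)
mask, metrics = evaluate(X, y)
print("genes kept:", int(mask.sum()))
print(metrics)
```

Note that this sketch selects genes on all samples before cross-validating the classifier, which can inflate the reported scores. Wrapping both stages in a single pipeline and cross-validating that pipeline avoids the leak at additional computational cost.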