A Comparative Performance Evaluation of Random Forest Feature Selection on Classification of Hepatocellular Carcinoma Gene Expression Data

M. Latief, T. Siswantining, A. Bustamam, Devvi Sarwinda
2019 3rd International Conference on Informatics and Computational Sciences (ICICoS), published 2019-10-01
DOI: 10.1109/ICICoS48119.2019.8982435 (https://doi.org/10.1109/ICICoS48119.2019.8982435)
Citations: 3

Abstract

Hepatocellular carcinoma is one of the cancers responsible for deaths worldwide. We use hepatocellular carcinoma gene expression microarray data obtained from the National Center for Biotechnology Information website, consisting of 40 samples and 54675 features. The main purpose of this research is to compare classification performance on the hepatocellular carcinoma data when feature selection is applied to several classification algorithms. The Random Forest feature selection method is paired with several classification algorithms: Support Vector Classification, Neural Network Classification, Random Forest, Logistic Regression, and Naïve Bayes. This study uses 5-fold cross-validation as the evaluation method. The results show that Random Forest, Neural Network, Support Vector Classification, and Naïve Bayes achieve higher classification performance with Random Forest feature selection than without it, while the Logistic Regression model performs better without Random Forest feature selection. Support Vector Classification gives the highest performance among the five algorithms when feature selection is used, whereas Logistic Regression outperforms the other classification algorithms when feature selection is not used.
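The comparison described in the abstract can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are a synthetic stand-in for the microarray matrix (40 samples, but only 500 features to keep it quick), the number of retained features (50), all hyperparameters, and the use of accuracy as the metric are assumptions, since the abstract does not specify them. The selector is placed inside the pipeline so that features are chosen on the training folds only. The abstract does not say whether the study did this.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in (assumption): 40 samples, many more features than samples.
X, y = make_classification(n_samples=40, n_features=500, n_informative=10,
                           n_redundant=0, random_state=0)

# The five classifier families named in the abstract; settings are assumptions.
classifiers = {
    "Support Vector Classification": SVC(kernel="linear"),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                                    random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
}


def rf_selector(n_keep=50):
    """Keep the n_keep features with the highest Random Forest importance."""
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    # threshold=-inf makes max_features the only selection criterion.
    return SelectFromModel(forest, threshold=-np.inf, max_features=n_keep)


# 5-fold cross-validation, as stated in the abstract (stratification is an assumption).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, clf in classifiers.items():
    without_fs = make_pipeline(StandardScaler(), clf)
    with_fs = make_pipeline(StandardScaler(), rf_selector(), clf)
    results[name] = (
        cross_val_score(without_fs, X, y, cv=cv).mean(),  # mean accuracy, no selection
        cross_val_score(with_fs, X, y, cv=cv).mean(),     # mean accuracy, RF selection
    )

for name, (acc_without, acc_with) in results.items():
    print(f"{name}: without FS = {acc_without:.2f}, with RF FS = {acc_with:.2f}")
```

Scores on synthetic data say nothing about the paper's results. The sketch only shows the structure of the with/without comparison.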