Ensemble learning-based classification of microarray cancer data on tree-based features

Cognitive Computation and Systems, vol. 3, no. 1, pp. 48-60. Pub Date: 2021-02-25. DOI: 10.1049/ccs2.12003. IF 1.2; JCR Q4 (Computer Science, Artificial Intelligence).

Guesh Dagnew, B.H. Shekar


Citations: 12

Abstract

Cancer is a group of related diseases with a high mortality rate, characterized by abnormal cell growth that attacks the body's tissues. Microarray cancer data is a prominent research topic across many disciplines, with work focused on addressing problems related to the curse of dimensionality, small sample sizes, noisy data and class imbalance. A method combining random forest (RF) tree-based feature selection with ensemble learning based on hard voting and soft voting is proposed to classify microarray cancer data using six different base classifiers. The features selected by the RF are submitted to the base classifiers as the training set. An ensemble learning method is then applied to the base classifiers, with each base classifier predicting the class label individually. The final prediction on the test set is made using hard and soft voting techniques, which use majority voting and weighted probability, respectively. The proposed ensemble learning method is validated on eight standard microarray cancer datasets, of which three are binary-class and the remaining five are multi-class. Experimental results show that the proposed method achieves a classification accuracy of 1.00 on six of the datasets and 0.96 on the other two.
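The two voting rules mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the six base classifiers or give any voting weights, so the probabilities below are made-up values for a single hypothetical test sample, and the function names `hard_vote` and `soft_vote` are illustrative only. The sketch uses only the Python standard library.

```python
from collections import Counter


def hard_vote(predicted_labels):
    """Majority vote over the class labels predicted by the base classifiers."""
    counts = Counter(predicted_labels)
    # On a tie, most_common() returns the label that was encountered first.
    return counts.most_common(1)[0][0]


def soft_vote(class_probabilities, weights=None):
    """Weighted average of per-class probabilities.

    Returns (index of the winning class, list of averaged probabilities).
    """
    if weights is None:
        weights = [1.0] * len(class_probabilities)  # equal weights by default
    total_weight = sum(weights)
    n_classes = len(class_probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, class_probabilities)) / total_weight
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=averaged.__getitem__)
    return winner, averaged


# Hypothetical outputs of six base classifiers for one sample (two classes).
probs = [
    [0.55, 0.45],
    [0.60, 0.40],
    [0.52, 0.48],
    [0.51, 0.49],
    [0.05, 0.95],
    [0.10, 0.90],
]

# Each classifier's individual label is the class with the highest probability.
labels = [max(range(len(p)), key=p.__getitem__) for p in probs]

hard_label = hard_vote(labels)               # four of six classifiers say class 0
soft_label, averaged = soft_vote(probs)      # averaged probabilities favour class 1

print("individual labels:", labels)
print("hard vote:", hard_label)
print("soft vote:", soft_label, [round(a, 4) for a in averaged])
```

In this made-up case the two rules disagree: four weakly confident classifiers win the majority vote, while two highly confident ones dominate the averaged probabilities. For readers working in scikit-learn, `VotingClassifier` (with `voting="hard"` or `voting="soft"` and an optional `weights` argument) and `SelectFromModel` wrapped around a `RandomForestClassifier` are the closest library counterparts to the steps described; whether the authors used these is not stated in the abstract.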
