Application of multiple support vector machine recursive feature elimination model in cancer feature gene selection
Wenbin Xu, H. Xia, Weiying Zheng
International Journal of Biomedical Engineering (国际生物医学工程杂志), published 2019-02-28
DOI: 10.3760/CMA.J.ISSN.1673-4181.2019.01.006
Abstract
Objective
To analyze cancer gene expression profile data using the multiple support vector machine recursive feature elimination (MSVM-RFE) algorithm and to calculate gene ranking scores in order to obtain the optimal feature gene subset.
Methods
Gene expression profiles of bladder cancer, breast cancer, colon cancer and lung cancer were downloaded from the GEO (Gene Expression Omnibus) database. Differentially expressed genes were identified by differential expression analysis. These genes were then ranked by the MSVM-RFE algorithm, and the average test error of each gene subset was calculated. The optimal gene subset for each of the four cancers was chosen as the one with the minimum average test error. Linear SVM classifiers were constructed on the datasets of the four cancers before and after feature gene screening, and the classification performance of the optimal feature gene subsets was verified.
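The ranking and subset-selection steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract gives no formula or parameters, so the ranking score (mean of the squared, normalized SVM weights across resamples divided by their standard deviation), the number of resamples, the subsample fraction, the one-gene-per-step elimination, the function names `msvm_rfe_rank` and `select_subset`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC


def msvm_rfe_rank(X, y, n_resamples=5, subsample=0.8, seed=0):
    """Rank features from most to least important (assumed MSVM-RFE variant)."""
    rng = np.random.default_rng(seed)
    remaining = list(range(X.shape[1]))
    eliminated = []  # filled from least important to most important
    while remaining:
        sq_weights = []
        for _ in range(n_resamples):
            # Stratified subsample so every class stays represented.
            idx = np.concatenate([
                rng.choice(np.flatnonzero(y == c),
                           size=max(1, int(subsample * np.sum(y == c))),
                           replace=False)
                for c in np.unique(y)
            ])
            svm = LinearSVC(C=1.0, dual=False).fit(X[idx][:, remaining], y[idx])
            w = svm.coef_.ravel()
            w = w / (np.linalg.norm(w) + 1e-12)  # normalize each weight vector
            sq_weights.append(w ** 2)
        sq_weights = np.array(sq_weights)
        # Assumed ranking score: large and stable squared weights score high.
        score = sq_weights.mean(axis=0) / (sq_weights.std(axis=0) + 1e-12)
        # Drop the lowest-scoring feature, then retrain on the rest.
        eliminated.append(remaining.pop(int(np.argmin(score))))
    return eliminated[::-1]


def select_subset(X, y, ranking, cv):
    """Pick the top-k subset with the lowest mean cross-validated test error."""
    errors = []
    for k in range(1, len(ranking) + 1):
        acc = cross_val_score(LinearSVC(dual=False), X[:, ranking[:k]], y, cv=cv)
        errors.append(1.0 - acc.mean())
    best_k = int(np.argmin(errors)) + 1
    return ranking[:best_k], errors


# Synthetic stand-in for a differentially expressed gene matrix
# (samples x genes); not real GEO data.
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 12
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))
X[:, :3] += 2.0 * y[:, None]  # only the first three "genes" carry signal

ranking = msvm_rfe_rank(X, y)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
subset, errors = select_subset(X, y, ranking, cv)
print("ranking:", ranking)
print("selected subset:", subset, "mean test error:", min(errors))
```

For brevity the sketch ranks genes on all samples before cross-validating. A stricter evaluation would repeat the ranking inside each training fold, so that the reported test errors are not influenced by samples used for selection.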
Results
Using the optimal feature gene subset obtained by the MSVM-RFE algorithm, classification accuracy improved from (96.77±1.28)% to (99.85±0.46)% for the bladder cancer data, from (83.77±4.93)% to (88.30±3.85)% for the breast cancer data, and from (72.69±2.41)% to (90.21±3.31)% for the lung cancer data. In addition, the optimal feature gene subset kept the classification accuracy of the colon cancer classifier at a high level (>99.5%).
Conclusions
Feature gene extraction based on the MSVM-RFE algorithm can improve the efficiency of cancer classification.
Key words:
Gene expression profile; Recursive feature elimination; Support vector machine; Feature gene