Relapsing-remitting multiple sclerosis classification using elastic net logistic regression on gene expression data
Cheng Zhao, A. Deshwar, Q. Morris
Systems biomedicine (Austin, Tex.), vol. 1, pp. 247-253. Published 2013-09-19.
DOI: https://doi.org/10.4161/sysb.26131
Citations: 8
Abstract
The aim of the MS Diagnostic sub-challenge, part of the first Industrial Methodology for Process Verification in Research Challenge, was to identify a robust diagnostic signature for relapsing-remitting multiple sclerosis (RRMS) from gene expression data. To this end, we built a classifier that assigns samples to one of two phenotype groups, RRMS or control, using the transcriptome of peripheral blood mononuclear cells. The classifier is a logistic regression model with elastic net regularization, as implemented in the glmnet package in R. We selected the values of the regularization hyper-parameters using three criteria: cross-validation performance on the provided training data, the number of non-zero parameters in the model, and the distribution of the classifier's output values on the test data inputs. We evaluated the classifier under two feature extraction strategies: using genes only, or adding constructed features derived from gene pathway data. The two strategies differed little in performance, both in 10-fold cross-validation on the training data and in prediction on the test data. Our final submission for the sub-challenge used only genes as features and identified a diagnostic signature of 58 genes; it ranked second out of 39 submissions.
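
For readers unfamiliar with the method, the objective that an elastic net logistic regression minimizes can be written as below. This is the general form documented for glmnet's binomial family, not a formula quoted from the paper. The notation (N, x_i, y_i, β_0, β) is assumed here for illustration, and λ and α stand for the regularization hyper-parameters the abstract refers to.

```latex
% Assumed notation (not from the abstract): N training samples,
% x_i = gene expression vector of sample i, y_i in {0,1} (1 = RRMS, 0 = control),
% beta_0 = intercept, beta = vector of per-feature coefficients.
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}\Bigl[\,y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr)\Bigr]
  + \lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
  + \alpha\,\lVert\beta\rVert_1\Bigr]
```

Here λ ≥ 0 sets the overall penalty strength and α ∈ [0, 1] mixes the ridge penalty (α = 0) with the lasso penalty (α = 1). The lasso term can drive coefficients to exactly zero, which is why the number of non-zero parameters is a meaningful selection criterion. The features with non-zero coefficients would then presumably make up the reported gene signature.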