A two-stage machine learning approach for pathway analysis
Wei Zhang, S. Emrich, Erliang Zeng
2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2010-12-01
DOI: https://doi.org/10.1109/BIBM.2010.5706576
Citations: 8
Abstract
Analysis of gene expression data has emerged as an important approach for discovering active pathways related to biological phenotypes. Previous pathway analysis methods use all genes in a pathway to link it to a particular phenotype. Using only a subset of informative genes, however, could classify samples better. Here, we propose a two-stage machine learning approach for pathway analysis. In the first stage, informative genes that can represent a pathway are selected using feature selection methods. These “representative genes” are mostly associated with the phenotype of interest. In the second stage, pathways are ranked based on their “representative genes” using classification methods. We applied our two-stage approach to three gene expression datasets. The results indicate that our method outperforms methods that consider every gene in a pathway.
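
The two stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the feature selection or classification methods, so the sketch substitutes an absolute Welch t-statistic for stage one and a leave-one-out nearest-centroid classifier for stage two. The function names (`select_genes`, `pathway_accuracy`, `rank_pathways`), the parameter `k`, the gene and pathway names, and the expression values are all hypothetical toy choices.

```python
import math
from statistics import mean, variance


def t_score(a, b):
    """Absolute Welch t-statistic between two groups of expression values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def select_genes(X, y, genes, k):
    """Stage 1 (assumed method): keep the k genes that best separate the classes."""
    def score(g):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        return t_score(a, b)
    return sorted(genes, key=score, reverse=True)[:k]


def predict(X, y, genes, sample):
    """Nearest-centroid prediction using only the given genes."""
    def sq_dist(c):
        rows = [row for row, label in zip(X, y) if label == c]
        return sum((sample[g] - mean(r[g] for r in rows)) ** 2 for g in genes)
    return min((0, 1), key=sq_dist)


def pathway_accuracy(X, y, genes, k):
    """Stage 2 (assumed method): leave-one-out accuracy for one pathway.

    Genes are re-selected inside every fold so the held-out sample
    never influences which genes represent the pathway.
    """
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        chosen = select_genes(X_train, y_train, genes, k)
        correct += int(predict(X_train, y_train, chosen, X[i]) == y[i])
    return correct / len(X)


def rank_pathways(X, y, pathways, k=1):
    """Rank pathways by how well their representative genes classify samples."""
    scores = {name: pathway_accuracy(X, y, genes, k)
              for name, genes in pathways.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 4 samples per class; only g1 tracks the phenotype.
expr = {
    "g1": [1.0, 1.2, 0.9, 1.1, 3.0, 3.1, 2.9, 3.2],
    "g2": [2.0, 2.5, 1.8, 2.2, 2.1, 2.4, 1.9, 2.3],
    "g3": [5.0, 4.0, 5.5, 4.5, 4.8, 4.2, 5.3, 4.6],
    "g4": [0.5, 0.7, 0.6, 0.4, 0.6, 0.5, 0.7, 0.4],
    "g5": [3.0, 3.3, 2.9, 3.1, 3.1, 3.2, 3.0, 2.9],
}
y = [0, 0, 0, 0, 1, 1, 1, 1]
X = [{g: values[i] for g, values in expr.items()} for i in range(len(y))]
pathways = {"pathway_A": ["g1", "g2", "g3"], "pathway_B": ["g4", "g5"]}

ranking = rank_pathways(X, y, pathways, k=1)
for name, accuracy in ranking:
    print(name, accuracy)
```

In this toy setup, `pathway_A` contains one informative gene among two uninformative ones, so reducing it to a single representative gene lets the classifier separate the samples cleanly and places it at the top of the ranking. Real use would need a proper expression matrix, a curated pathway database, and a principled choice of `k`.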