An Improved Probabilistic Model for Finding Differential Gene Expression
Li Zhang, Xuejun Liu
2009 2nd International Conference on Biomedical Engineering and Informatics, pp. 1-6. Published 2009-10-30.
DOI: 10.1109/BMEI.2009.5302665 (https://doi.org/10.1109/BMEI.2009.5302665)
Citations: 4
Abstract
Finding differentially expressed genes is a fundamental objective of a microarray experiment. A recently proposed method, PPLR, accounts for probe-level measurement error and improves accuracy in finding differential gene expression. However, PPLR uses importance sampling in the E-step of its variational EM algorithm, which reduces computational efficiency. We modified the original PPLR to obtain an improved model for finding differential gene expression. The new model, IPPLR, adds hidden variables to represent the true gene expression levels and eliminates the importance sampling of the original PPLR. We apply IPPLR to a spike-in data set and a mouse embryo data set. Results show that IPPLR improves both accuracy and computational efficiency in finding differential gene expression.

I. INTRODUCTION

Microarrays (1) (2) are widely used to obtain large-scale measurements of gene expression. Finding differentially expressed (DE) genes is the most basic objective of a microarray experiment. Because microarray data are notoriously noisy, experiments usually include replicates to deal with data variability. Moreover, some microarrays (such as Affymetrix GeneChips) contain multiple probes to interrogate each gene expression profile. These probes provide rich information for estimating the technical measurement error associated with each gene expression measurement. This error information is especially valuable for weakly expressed genes, which are often associated with high variability.

Probabilistic methods provide a principled way to handle noisy data. Most probabilistic methods, including the widely used Cyber-T (3) and SAM (4), are based on single point estimates of gene expression values and ignore the associated probe-level measurement error, which wastes much of the information in the data. In recent years, the measurement error of data points has received increasing attention in noisy data analysis (5) (6) (7) (8). PPLR (5) accounts for probe-level measurement error in finding differential gene expression, and it has been shown to be more accurate than alternative methods (5) (9). However, PPLR uses importance sampling in the E-step of its variational EM algorithm, which lowers both accuracy and computational efficiency. In particular, when an experiment involves a large number of chips, PPLR is extremely time-consuming, which makes it difficult to apply in practice.

In this contribution, we improve PPLR by adding hidden variables that represent the true gene expression. This eliminates the inefficient importance sampling of the original PPLR. Results on a spike-in data set and a mouse embryo data set show that the improved PPLR, IPPLR, improves accuracy and computational efficiency in finding DE genes.
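The trade-off between importance sampling and an explicit hidden variable can be illustrated with a toy example. The sketch below is not the PPLR or IPPLR model. It assumes a simple Gaussian setting in which an observed expression value `y` has known measurement standard deviation `s` and the hidden true expression `x` has a Gaussian prior with mean `mu` and precision `lam`. All names and numeric values are hypothetical. Under these assumptions the posterior of `x` is available in closed form, whereas a self-normalized importance sampler only approximates the same posterior mean and needs many samples to do so.

```python
import numpy as np


def closed_form_posterior(y, s, mu, lam):
    """Posterior mean and variance of x, given y ~ N(x, s^2) and x ~ N(mu, 1/lam)."""
    precision = lam + 1.0 / s**2
    mean = (lam * mu + y / s**2) / precision
    return mean, 1.0 / precision


def importance_sampling_posterior_mean(y, s, mu, lam, n_samples, rng):
    """Self-normalized importance sampling estimate of E[x | y].

    The prior N(mu, 1/lam) is used as the proposal, so each weight is
    proportional to the likelihood N(y | x, s^2).
    """
    x = rng.normal(mu, 1.0 / np.sqrt(lam), size=n_samples)
    log_w = -0.5 * ((y - x) / s) ** 2      # log-likelihood up to a constant
    w = np.exp(log_w - log_w.max())        # subtract the max for numerical stability
    return np.sum(w * x) / np.sum(w)


# Hypothetical values for one gene measurement (illustration only).
rng = np.random.default_rng(0)
y, s, mu, lam = 7.2, 0.8, 6.0, 1.5

exact_mean, exact_var = closed_form_posterior(y, s, mu, lam)
is_mean = importance_sampling_posterior_mean(y, s, mu, lam, 200_000, rng)

print("closed-form posterior mean:", exact_mean)
print("importance sampling estimate:", is_mean)
```

The closed-form update costs a handful of arithmetic operations per measurement, while the sampler draws a fresh set of samples each time it is called. This is only meant to convey why removing a sampling step from an iterative algorithm can save time when many chips are involved.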