An Improved Probabilistic Model for Finding Differential Gene Expression

2009 2nd International Conference on Biomedical Engineering and Informatics. Pub Date: 2009-10-30. DOI: 10.1109/BMEI.2009.5302665

Li Zhang, Xuejun Liu


Citations: 4

Abstract


An Improved Probabilistic Model for Finding Differential Gene Expression

Finding differentially expressed genes is a fundamental objective of a microarray experiment. A recently proposed method, PPLR, considers the probe-level measurement error and improves accuracy in finding differential gene expression. However, PPLR uses an importance sampling procedure in the E-step of the variational EM algorithm, which reduces computational efficiency. We modified the original PPLR to obtain an improved model for finding differential gene expression. The new model, IPPLR, adds hidden variables to represent the true gene expression and eliminates the importance sampling of the original PPLR. We apply IPPLR to a spike-in data set and a mouse embryo data set. Results show that IPPLR improves both accuracy and computational efficiency in finding differential gene expression.

I. INTRODUCTION

Microarrays (1) (2) are currently widely used to obtain large-scale measurements of gene expression. Finding differentially expressed (DE) genes is the most basic objective of a microarray experiment. Because microarray data are notoriously noisy, replicates are usually used in experiments to deal with data variability. Moreover, some microarrays (such as Affymetrix GeneChips) contain multiple probes to interrogate gene expression profiles. These probes provide rich information for estimating the technical measurement error associated with each gene expression measurement. This error information is especially significant for weakly expressed genes, as these genes are often associated with high variability.

Probabilistic methods provide a principled way to handle noisy data. Most probabilistic methods, including the widely used Cyber-T (3) and SAM (4), are based on single point estimates of gene expression values and ignore the associated probe-level measurement error, which wastes rich information in the data. Measurement error of data points has received increasing attention in noisy data analysis (5) (6) (7) (8) in recent years. PPLR (5) considers the probe-level measurement error in finding differential gene expression. This method has been shown to be more accurate than other alternatives (5) (9). However, PPLR uses an importance sampling procedure in the E-step of the variational EM algorithm. This reduces both accuracy and computational efficiency. In particular, when the experiment involves a large number of chips, PPLR is extremely time-consuming, which makes it difficult to apply in practice.

In this contribution, we improve PPLR by adding hidden variables to represent the true gene expression. This eliminates the inefficient importance sampling of the original PPLR. Results on a spike-in data set and a mouse embryo data set show that the improved PPLR, IPPLR, improves accuracy and computational efficiency in finding DE genes.
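The effect of adding hidden variables for the true expression can be illustrated with a deliberately simplified Gaussian sketch. This is not the IPPLR model itself, whose details are not given here. It assumes a single gene in a single condition, where each observed value y_i has a known probe-level variance s2_i, and the hidden true expression x_i is drawn from N(mu, sigma2). Under these assumptions the posterior of each x_i is Gaussian, so the E-step is available in closed form and no sampling is needed. The function name `fit_hidden_expression_em` and the demo values are hypothetical.

```python
import numpy as np


def fit_hidden_expression_em(y, s2, n_iter=200):
    """EM for the toy model y_i ~ N(x_i, s2_i), x_i ~ N(mu, sigma2).

    y  : observed expression values (one per replicate)
    s2 : known measurement variance of each observed value
    Returns (mu, sigma2, m, v), where m and v are the posterior means and
    variances of the hidden true expression values x_i.
    """
    y = np.asarray(y, dtype=float)
    s2 = np.asarray(s2, dtype=float)

    # Initialise from the point estimates, ignoring measurement error.
    mu = y.mean()
    sigma2 = max(y.var(), 1e-6)

    for _ in range(n_iter):
        # E-step: closed-form Gaussian posterior of each hidden x_i.
        lam = 1.0 / sigma2                 # precision of the true expression
        v = 1.0 / (lam + 1.0 / s2)         # posterior variance of x_i
        m = v * (lam * mu + y / s2)        # posterior mean of x_i

        # M-step: maximise the expected complete-data log-likelihood.
        mu = m.mean()
        sigma2 = max(np.mean((m - mu) ** 2 + v), 1e-12)  # floor avoids 1/0

    return mu, sigma2, m, v


# Hypothetical demo: six replicates on a log scale, one of them very noisy.
y_demo = np.array([7.9, 8.4, 8.1, 6.5, 8.2, 8.0])
s2_demo = np.array([0.05, 0.04, 0.06, 1.50, 0.05, 0.04])
mu_hat, sigma2_hat, m_hat, v_hat = fit_hidden_expression_em(y_demo, s2_demo)
print(mu_hat, sigma2_hat)
```

In this toy setting, a replicate with a large measurement variance is pulled strongly toward the shared mean, while precise replicates stay close to their observed values. A method based only on point estimates would weight all six values equally.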
