Increasing the discovery power of -omics studies

Systems biomedicine (Austin, Tex.). Pub Date: 2013-04-11. DOI: 10.4161/sysb.25774

Djork-Arné Clevert, A. Mayr, Andreas Mitterecker, G. Klambauer, A. Valsesia, K. Forner, M. Tuefferd, W. Talloen, J. Wojcik, Hinrich W. H. Göhlmann, S. Hochreiter

Volume: 1 1; pages: 84-93; publication type: Journal Article; URL: https://doi.org/10.4161/sysb.25774

Citations: 3

Abstract


Increasing the discovery power of -omics studies

Motivation: Current clinical and biological studies apply different biotechnologies and subsequently combine the resulting -omics data to test biological hypotheses. The plethora of -omics data and their combination generates a large number of hypotheses and seemingly increases the study power. Contrary to these expectations, the wealth of -omics data may even reduce the statistical power of a study because of a large correction factor for multiple testing. Typically, this loss of power in analyzing -omics data is caused by an increased false detection rate (FDR) in measurements, such as falsely detected DNA copy number changes or falsely identified differentially expressed genes. The false detections are random and, therefore, not related to the tested conditions. Thus, a high FDR considerably decreases the discovery power of studies, especially if different -omics data are involved.

Results: On a HapMap data set, where known CNVs have to be re-detected, I/NI call filtering was much more efficient than variance-based filtering. In particular, the I/NI call filter outperforms variance-based filters on data with rare events like the CNVs in the HapMap data set. We assessed the efficiency of the I/NI call filter in reducing the FDR on two different cancer cell lines, where it reduced the FDR 18- to 22-fold.

Materials and Methods: A mitigation strategy for excessively high FDRs is to filter out putative false detections. We suggest using probabilistic latent variable models to identify putative false detections, which such models may reveal through high estimated noise or through model-based measurement inconsistencies across samples. To select such a model, a Bayesian approach starts with the maximum a priori model, which assumes no detection, and selects the maximum a posteriori model. Hence, a detection corresponds to a deviation of the maximum a posteriori model from the maximum a priori model, measured by the information gain obtained from the data. If this information gain exceeds a threshold, the selected model obtains an Informative/Non-Informative (I/NI) call that indicates a detection. I/NI call filtering has been successfully applied in different projects, but it has so far not been shown that correction for multiple testing after I/NI call filtering still controls the type-I error rate. We prove this important property of the I/NI call and show that it is independent of commonly used test statistics for null hypotheses. We apply the I/NI call to transcriptomics (gene expression), where the prior model corresponds to a constant gene expression level across compared samples, and to genomics, analyzing copy number variation (CNV) data, where the prior model corresponds to a constant DNA copy number of 2 across compared samples.
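The filter-then-correct idea in the Materials and Methods paragraph can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, purely for illustration, that prior and posterior are univariate Gaussians, so that the information gain can be computed as a Kullback-Leibler divergence, and it uses a Bonferroni correction as a stand-in for whichever multiple-testing correction a study applies. The function names, the threshold, and the example values are all hypothetical.

```python
import math


def information_gain(post_mean, post_var, prior_mean=0.0, prior_var=1.0):
    """KL(posterior || prior) for two univariate Gaussians, in nats.

    Assumption: Gaussian prior and posterior; the abstract does not
    specify the model family.
    """
    return 0.5 * (
        post_var / prior_var
        + (post_mean - prior_mean) ** 2 / prior_var
        - 1.0
        + math.log(prior_var / post_var)
    )


def ini_call(post_mean, post_var, threshold):
    """Hypothetical I/NI call: informative if the gain exceeds the threshold."""
    return information_gain(post_mean, post_var) > threshold


def bonferroni(pvals, alpha=0.05):
    """Indices of hypotheses rejected at level alpha after Bonferroni correction."""
    m = len(pvals)
    return [i for i, p in enumerate(pvals) if p <= alpha / m]


def filter_then_test(pvals, informative, alpha=0.05):
    """Correct only over features with an informative call.

    Returns indices into the original, unfiltered list.
    """
    kept = [i for i, keep in enumerate(informative) if keep]
    rejected = bonferroni([pvals[i] for i in kept], alpha)
    return [kept[j] for j in rejected]


# Made-up example: eight features, of which only the first three are informative.
pvals = [0.004, 0.012, 0.2, 0.5, 0.7, 0.9, 0.6, 0.8]
informative = [True, True, True, False, False, False, False, False]

without_filter = bonferroni(pvals)                 # corrects over 8 tests
with_filter = filter_then_test(pvals, informative)  # corrects over 3 tests
```

In this toy example the second feature (p = 0.012) survives the correction only after filtering, because the correction factor shrinks from 8 to 3. Filtering in this way is only valid if the filter does not depend on the test statistic under the null hypothesis, which is the property the abstract states the authors prove for the I/NI call.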
