Djork-Arné Clevert, A. Mayr, Andreas Mitterecker, G. Klambauer, A. Valsesia, K. Forner, M. Tuefferd, W. Talloen, J. Wojcik, Hinrich W. H. Göhlmann, S. Hochreiter
Title: Increasing the discovery power of -omics studies
Journal: Systems biomedicine (Austin, Tex.)
Volume/issue: 1 1, pages 84-93
Published: 2013-04-11
DOI: 10.4161/sysb.25774 (https://doi.org/10.4161/sysb.25774)
Citations: 3
Abstract
Motivation: Current clinical and biological studies apply different biotechnologies and subsequently combine the resulting -omics data to test biological hypotheses. The plethora of -omics data and their combination generates a large number of hypotheses and seemingly increases the study power. Contrary to this expectation, the wealth of -omics data may even reduce the statistical power of a study because of a large correction factor for multiple testing. Typically, this loss of power in analyzing -omics data is caused by an increased false detection rate (FDR) in measurements, such as falsely detected DNA copy number changes or falsely identified differentially expressed genes. The false detections are random and therefore not related to the tested conditions. Thus, a high FDR considerably decreases the discovery power of studies, especially if different -omics data are involved.

Results: On a HapMap data set, where known copy number variations (CNVs) have to be re-detected, I/NI call filtering was much more efficient than variance-based filtering. In particular, the I/NI call filter outperforms variance-based filters on data with rare events such as the CNVs in the HapMap data set. We assessed the efficiency of the I/NI call filter in reducing the FDR on two different cancer cell lines, where it reduced the FDR 18- to 22-fold.

Materials and Methods: A mitigation strategy for excessively high FDRs is to filter out putative false detections. We suggest using probabilistic latent variable models to identify putative false detections, which such models may reveal through high estimated noise or through model-based measurement inconsistencies across samples. To select such a model, a Bayesian approach starts with the maximum a priori model, which assumes no detection, and selects the maximum a posteriori model. Hence, a detection results in a deviation of the maximum a posteriori model from the maximum a priori model, measured by the information gain obtained from the data. If this information gain exceeds a threshold, the selected model obtains an Informative/Non-Informative (I/NI) call that indicates a detection. I/NI call filtering has been successfully applied in different projects, but it has so far not been shown that correction for multiple testing after I/NI call filtering still controls the type-I error rate. We prove this important property of the I/NI call and show that it is independent of commonly used test statistics for null hypotheses. We apply the I/NI call to transcriptomics (gene expression), where the prior model corresponds to a constant gene expression level across compared samples, and to genomics, analyzing CNV data, where the prior model corresponds to a constant DNA copy number of 2 across compared samples.
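The effect of filtering on the multiple-testing correction factor can be illustrated with a small sketch. It is not the authors' implementation. It assumes the Benjamini-Hochberg procedure as the correction (the abstract does not name one), uses made-up p-values, and replaces the I/NI call with a hypothetical boolean mask `keep`. As the abstract notes, filtering before correction is only valid if the filter is independent of the test statistic under the null hypothesis, which is the property the authors state they prove for the I/NI call.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the BH step-up procedure rejects."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    last_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            last_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:last_rank]:
        rejected[idx] = True
    return rejected


# Toy p-values (invented for illustration): 10 real effects among 1000 features.
p_signal = [2e-3] * 10
p_null = [0.06 + 0.94 * i / 989 for i in range(990)]
p_all = p_signal + p_null

# Hypothetical filter outcome standing in for an I/NI call:
# it keeps the 10 real effects plus 90 null features.
keep = [i < 100 for i in range(len(p_all))]
p_kept = [p for p, k in zip(p_all, keep) if k]

n_unfiltered = sum(benjamini_hochberg(p_all))
n_filtered = sum(benjamini_hochberg(p_kept))
print(n_unfiltered, n_filtered)  # prints: 0 10
```

With all 1000 hypotheses, the threshold at rank 10 is 0.05 * 10 / 1000 = 0.0005, so the ten p-values of 0.002 are not rejected. After filtering down to 100 hypotheses, the same rank has threshold 0.005 and all ten are detected.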