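Inference and Sample Size Calculations Based on Statistical Tests in a Negative Binomial Distribution for Differential Gene Expression in RNA-seq Data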

Journal of biometrics & biostatistics, Pub Date: 2017-01-30, DOI: 10.4172/2155-6180.1000332

Xiaohong Li, N. Cooper, Y. Shyr, Dongfeng Wu, E. Rouchka, R. Gill, T. O’Toole, G. Brock, S. Rai


Citations: 3


Inference and Sample Size Calculations Based on Statistical Tests in a Negative Binomial Distribution for Differential Gene Expression in RNA-seq Data

High-throughput RNA sequencing (RNA-seq) has become the method of choice for transcriptomics and for detecting differentially expressed genes. Sample size calculation is an important consideration when designing RNA-seq experiments in biological research and clinical trials. Existing sample size formulas have been derived from the Wald and likelihood ratio tests under a Poisson model for RNA-seq data. However, because the mean read count in real RNA-seq data does not equal the variance, Li et al. (2013) proposed an extended method that calculates sample sizes under a negative binomial distribution using an exact test statistic. In this study, we instead derive five sample size calculation methods based on the negative binomial distribution, using the Wald test, the log-transformed Wald test, and the log-likelihood ratio test statistics. We compared our five methods with the existing method by calculating sample sizes and simulated power in different scenarios. We first calculated the sample sizes for testing a single gene with all six methods at a nominal significance level α of 0.05 and 80% power. We then calculated the sample sizes for testing multiple genes at false discovery rates (FDR) of 0.05 and 0.10. The empirical power and the number of true prognostic genes in differential gene expression analysis, at the sample sizes estimated by the six methods, were also estimated through simulation studies. With the sample size formulas derived from the log-transformed and Wald-based tests, we observed smaller sample sizes than with the other methods, while the power stayed close to or above the nominal 80% in all settings. Moreover, the Wald-test-based sample size calculation is easier and faster to compute when designing an RNA-seq experiment.
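To make the idea of a Wald-based negative binomial sample size concrete, here is a minimal Python sketch. It is not one of the five formulas derived in the paper, which this abstract does not state. It uses the generic delta-method approximation for a two-sided Wald test on the log ratio of two negative binomial means, assuming equal group sizes, a shared dispersion φ, and variance μ + φμ². The function names, parameter names, and example values are all hypothetical. The second function applies a commonly used approximation for converting a target FDR into a per-gene significance level, also an assumption rather than something taken from the paper.

```python
import math
from statistics import NormalDist


def nb_wald_sample_size(mu0, mu1, dispersion, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided Wald test of log(mu1/mu0) = 0.

    Assumes negative binomial counts with variance mu + dispersion * mu**2,
    equal group sizes, and the delta-method variance
    Var(log(mean)) ~= (1/mu + dispersion) / n in each group.
    This is a generic textbook-style approximation, not the paper's formula.
    """
    if mu0 <= 0 or mu1 <= 0:
        raise ValueError("means must be positive")
    if mu0 == mu1:
        raise ValueError("means must differ for a finite sample size")
    if dispersion < 0:
        raise ValueError("dispersion must be non-negative")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power

    # Sum of the per-observation variances of the two log means.
    variance_term = 1 / mu0 + 1 / mu1 + 2 * dispersion
    effect = math.log(mu1 / mu0)                   # log fold change

    n = (z_alpha + z_beta) ** 2 * variance_term / effect ** 2
    return math.ceil(n)


def fdr_adjusted_alpha(n_genes, n_true, fdr, power=0.80):
    """Per-gene significance level that targets a given FDR (approximation).

    Uses alpha* = r1 * f / (m0 * (1 - f)), where r1 = n_true * power is the
    expected number of true discoveries and m0 = n_genes - n_true is the
    number of non-differential genes.
    """
    expected_true_discoveries = n_true * power
    n_null = n_genes - n_true
    return expected_true_discoveries * fdr / (n_null * (1 - fdr))


if __name__ == "__main__":
    # Single gene: mean count 5 vs 10, dispersion 0.1 (illustrative values).
    print(nb_wald_sample_size(5.0, 10.0, 0.1))

    # Multiple genes: 10,000 tested, 100 truly differential, FDR 0.05.
    alpha_star = fdr_adjusted_alpha(10_000, 100, 0.05)
    print(nb_wald_sample_size(5.0, 10.0, 0.1, alpha=alpha_star))
```

Setting `dispersion=0` reduces the variance term to the Poisson case, so the sketch also shows why overdispersed data call for more samples than a Poisson-based formula suggests.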


Source journal

Journal of biometrics & biostatistics

Self-citation rate

0.00%

Number of articles published