Combining assumptions and graphical network into gene expression data analysis

Journal of Statistical Distributions and Applications (Q2, Mathematics) | Pub Date: 2021-07-08 | DOI: 10.1186/s40488-021-00126-z

Demba Fofana, E. O. George, Dale Bowman


Citations: 0

Abstract

Rigorous analysis of gene expression data requires taking the assumptions behind each test into consideration, and it also relies on information about the network relations that exist among genes. Combining these elements can not only improve statistical power but also provide a better framework for analyzing gene expression. We propose a novel statistical model that brings both assumptions and gene network information into the analysis. Assumptions matter because every test statistic is valid only when its required assumptions hold. We therefore propose hybrid p-values, which take assumptions into consideration, and show that under the null hypothesis of primary interest these p-values are uniformly distributed. We incorporate gene network information because neighboring genes share biological functions; this correlation is taken into account by assigning similar prior probabilities to neighboring genes. We compare our approach with other approaches in a series of simulations. Areas under the ROC curve (AUCs) are computed to compare the methodologies, and the AUC of our methodology is larger than the others. For regression analysis, the area under the ROC curve of our proposed method contains the areas for the Spearman test and the Pearson test. True negative rates (TNRs), also known as specificities, are also higher with our approach than with the other approaches. For two-group comparison with a sample size of n=10, for instance, the specificity of our proposed methodology is 0.716146, while the specificities of the t-test and the rank-sum test are 0.689223 and 0.69797, respectively. The method that combines assumptions and network information is thus shown to be more powerful. The proposed procedures are introduced as a general class of methods that can incorporate procedure selection, account for multiple testing, and incorporate graphical network information into the analysis. We obtain very good performance in simulations and in real data analysis.
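The evaluation quantities named in the abstract (uniformity of p-values under the null, specificity, and AUC) can be illustrated with a small simulation. The sketch below is not the authors' hybrid p-value or network model, which the abstract does not specify. It covers only the two baseline tests the abstract mentions, the t-test and the rank-sum test, in a two-group comparison with n=10. The normal data model, the effect size, the number of simulated genes, the significance level, and all function names are assumptions made for illustration, so the numbers it prints are not expected to match the specificities reported above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice
# Assumed settings: n per group, genes per scenario, mean shift, test level.
N, N_GENES, EFFECT, ALPHA = 10, 2000, 1.5, 0.05


def two_group_pvalues(effect):
    """Simulate N_GENES two-group comparisons; return (t-test, rank-sum) p-values."""
    x = rng.normal(0.0, 1.0, size=(N_GENES, N))
    y = rng.normal(effect, 1.0, size=(N_GENES, N))
    p_t = stats.ttest_ind(x, y, axis=1).pvalue
    p_w = np.array([stats.ranksums(a, b).pvalue for a, b in zip(x, y)])
    return p_t, p_w


def specificity(p_null, alpha=ALPHA):
    """True negative rate: share of null genes that are not rejected."""
    return float(np.mean(p_null > alpha))


def auc_from_pvalues(p_null, p_alt):
    """AUC via the rank-sum identity, scoring each gene by -p."""
    scores = -np.concatenate([p_alt, p_null])  # smaller p = stronger evidence
    ranks = stats.rankdata(scores)
    n_alt, n_null = len(p_alt), len(p_null)
    u = ranks[:n_alt].sum() - n_alt * (n_alt + 1) / 2
    return float(u / (n_alt * n_null))


null_t, null_w = two_group_pvalues(0.0)     # genes with no true difference
alt_t, alt_w = two_group_pvalues(EFFECT)    # genes with a true mean shift

# Kolmogorov-Smirnov check of null t-test p-values against Uniform(0, 1).
ks_t = stats.kstest(null_t, "uniform").pvalue

spec_t, spec_w = specificity(null_t), specificity(null_w)
auc_t = auc_from_pvalues(null_t, alt_t)
auc_w = auc_from_pvalues(null_w, alt_w)

print(f"KS p-value for uniformity of null t-test p-values: {ks_t:.3f}")
print(f"specificity  t-test: {spec_t:.3f}  rank-sum: {spec_w:.3f}")
print(f"AUC          t-test: {auc_t:.3f}  rank-sum: {auc_w:.3f}")
```

Because the simulated data are normal, the t-test's assumptions hold here by construction. Replacing `rng.normal` with a heavy-tailed or skewed generator is one way to see why the validity of a test's assumptions affects this kind of comparison.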


Source journal

Journal of Statistical Distributions and Applications (Decision Sciences: Statistics, Probability and Uncertainty)

Self-citation rate

0.00%

Review time

13 weeks