结合假设和图形网络进行基因表达数据分析

Demba Fofana, E. O. George, Dale Bowman
{"title":"结合假设和图形网络进行基因表达数据分析","authors":"Demba Fofana, E. O. George, Dale Bowman","doi":"10.1186/s40488-021-00126-z","DOIUrl":null,"url":null,"abstract":"Analyzing gene expression data rigorously requires taking assumptions into consideration but also relies on using information about network relations that exist among genes. Combining these different elements cannot only improve statistical power, but also provide a better framework through which gene expression can be properly analyzed. We propose a novel statistical model that combines assumptions and gene network information into the analysis. Assumptions are important since every test statistic is valid only when required assumptions hold. So, we propose hybrid p-values and show that, under the null hypothesis of primary interest, these p-values are uniformly distributed. These proposed hybrid p-values take assumptions into consideration. We incorporate gene network information into the analysis because neighboring genes share biological functions. This correlation factor is taken into account via similar prior probabilities for neighboring genes. With a series of simulations our approach is compared with other approaches. Area Under the ROC Curves (AUCs) are constructed to compare the different methodologies; the AUC based on our methodology is larger than others. For regression analysis, AUC from our proposed method contains AUCs of Spearman test and of Pearson test. In addition, true negative rates (TNRs) also known as specificities are higher with our approach than with the other approaches. For two group comparison analysis, for instance, with a sample size of n=10, specificity corresponding to our proposed methodology is 0.716146 and specificities for t-test and rank sum are 0.689223 and 0.69797, respectively. Our method that combines assumptions and network information into the analysis is shown to be more powerful. These proposed procedures are introduced as a general class of methods that can incorporate procedure-selection, account for multiple-testing, and incorporate graphical network information into the analysis. We obtain very good performance in simulations, and in real data analysis.","PeriodicalId":52216,"journal":{"name":"Journal of Statistical Distributions and Applications","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Combining assumptions and graphical network into gene expression data analysis\",\"authors\":\"Demba Fofana, E. O. George, Dale Bowman\",\"doi\":\"10.1186/s40488-021-00126-z\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Analyzing gene expression data rigorously requires taking assumptions into consideration but also relies on using information about network relations that exist among genes. Combining these different elements cannot only improve statistical power, but also provide a better framework through which gene expression can be properly analyzed. We propose a novel statistical model that combines assumptions and gene network information into the analysis. Assumptions are important since every test statistic is valid only when required assumptions hold. So, we propose hybrid p-values and show that, under the null hypothesis of primary interest, these p-values are uniformly distributed. These proposed hybrid p-values take assumptions into consideration. We incorporate gene network information into the analysis because neighboring genes share biological functions. This correlation factor is taken into account via similar prior probabilities for neighboring genes. With a series of simulations our approach is compared with other approaches. Area Under the ROC Curves (AUCs) are constructed to compare the different methodologies; the AUC based on our methodology is larger than others. For regression analysis, AUC from our proposed method contains AUCs of Spearman test and of Pearson test. In addition, true negative rates (TNRs) also known as specificities are higher with our approach than with the other approaches. For two group comparison analysis, for instance, with a sample size of n=10, specificity corresponding to our proposed methodology is 0.716146 and specificities for t-test and rank sum are 0.689223 and 0.69797, respectively. Our method that combines assumptions and network information into the analysis is shown to be more powerful. These proposed procedures are introduced as a general class of methods that can incorporate procedure-selection, account for multiple-testing, and incorporate graphical network information into the analysis. We obtain very good performance in simulations, and in real data analysis.\",\"PeriodicalId\":52216,\"journal\":{\"name\":\"Journal of Statistical Distributions and Applications\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Statistical Distributions and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1186/s40488-021-00126-z\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"Mathematics\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Statistical Distributions and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s40488-021-00126-z","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Mathematics","Score":null,"Total":0}
引用次数: 0

摘要

严格分析基因表达数据需要考虑假设,但也依赖于使用有关基因之间存在的网络关系的信息。结合这些不同的元素不仅可以提高统计能力,而且还提供了一个更好的框架,通过这个框架可以正确地分析基因表达。我们提出了一个新的统计模型,将假设和基因网络信息结合到分析中。假设是重要的,因为每个检验统计量只有在必要的假设成立时才有效。因此,我们提出混合p值,并证明,在原假设下,这些p值是均匀分布的。这些建议的混合p值考虑了假设。我们将基因网络信息纳入分析,因为邻近基因共享生物学功能。这种相关因素是通过邻近基因的相似先验概率来考虑的。通过一系列的仿真,将该方法与其他方法进行了比较。构建ROC曲线下面积(auc)来比较不同的方法;基于我们方法的AUC比其他方法的AUC大。对于回归分析,我们提出的方法的AUC包含Spearman检验和Pearson检验的AUC。此外,与其他方法相比,我们的方法的真阴性率(tnr)也被称为特异性更高。以两组比较分析为例,当样本量为n=10时,我们提出的方法对应的特异性为0.716146,t检验特异性为0.689223,秩和特异性为0.69797。我们将假设和网络信息结合到分析中的方法被证明是更强大的。这些建议的程序被介绍为一般类型的方法,可以结合程序选择,考虑多重测试,并将图形网络信息纳入分析。我们在仿真和实际数据分析中都取得了很好的效果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Combining assumptions and graphical network into gene expression data analysis
Analyzing gene expression data rigorously requires taking assumptions into consideration but also relies on using information about network relations that exist among genes. Combining these different elements cannot only improve statistical power, but also provide a better framework through which gene expression can be properly analyzed. We propose a novel statistical model that combines assumptions and gene network information into the analysis. Assumptions are important since every test statistic is valid only when required assumptions hold. So, we propose hybrid p-values and show that, under the null hypothesis of primary interest, these p-values are uniformly distributed. These proposed hybrid p-values take assumptions into consideration. We incorporate gene network information into the analysis because neighboring genes share biological functions. This correlation factor is taken into account via similar prior probabilities for neighboring genes. With a series of simulations our approach is compared with other approaches. Area Under the ROC Curves (AUCs) are constructed to compare the different methodologies; the AUC based on our methodology is larger than others. For regression analysis, AUC from our proposed method contains AUCs of Spearman test and of Pearson test. In addition, true negative rates (TNRs) also known as specificities are higher with our approach than with the other approaches. For two group comparison analysis, for instance, with a sample size of n=10, specificity corresponding to our proposed methodology is 0.716146 and specificities for t-test and rank sum are 0.689223 and 0.69797, respectively. Our method that combines assumptions and network information into the analysis is shown to be more powerful. These proposed procedures are introduced as a general class of methods that can incorporate procedure-selection, account for multiple-testing, and incorporate graphical network information into the analysis. We obtain very good performance in simulations, and in real data analysis.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Statistical Distributions and Applications
Journal of Statistical Distributions and Applications Decision Sciences-Statistics, Probability and Uncertainty
自引率
0.00%
发文量
0
审稿时长
13 weeks
期刊最新文献
A generalization to the log-inverse Weibull distribution and its applications in cancer research Approximations of conditional probability density functions in Lebesgue spaces via mixture of experts models Structural properties of generalised Planck distributions New class of Lindley distributions: properties and applications Tolerance intervals in statistical software and robustness under model misspecification
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1