Enhanced adaptive permutation test with negative binomial distribution in genome-wide omics datasets.

IF 1.6 | Zone 4, Biology | Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY | Genes & genomics | Pub Date: 2024-11-06 | DOI: 10.1007/s13258-024-01584-w
Iksoo Huh, Taesung Park
Citations: 0

Abstract

Background: The permutation test has been widely used to provide the p-values of statistical tests when the standard test statistics do not follow parametric null distributions. However, the permutation test may require huge numbers of iterations, especially when the detection of very small p-values is required for multiple testing adjustments in the analysis of datasets with a large number of features.
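
To make this burden concrete, here is a short worked bound. It assumes the common fixed-size estimator $\hat{p} = (b+1)/(B+1)$, where $b$ counts the permuted statistics at least as extreme as the observed one among $B$ permutations, and a Bonferroni threshold $\alpha/m$ over $m$ features. Both are standard choices, not details stated in this abstract.

\[
\hat{p}_{\min} = \frac{1}{B+1} \le \frac{\alpha}{m}
\quad\Longrightarrow\quad
B \ge \frac{m}{\alpha} - 1 .
\]

For illustration, with $m = 327{,}872$ features (the size of the SNP dataset reported in the Results) and $\alpha = 0.05$, the bound gives $B \ge 6{,}557{,}439$ permutations per feature just to make the threshold reachable, before any requirement on the precision of the estimate.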

Objective: To overcome this computational burden, we propose a novel enhanced adaptive permutation test that estimates p-values using the negative binomial (NB) distribution. With this method, the number of permutations is determined separately for each feature according to its potential significance.

Methods: In detail, the permutation procedure stops when the test statistic from the permuted dataset has exceeded the observed statistic from the original dataset a predefined number of times. We showed that this procedure reduced the number of permutations, especially when there were many insignificant features. For significant features, we further reduced the number of permutations by splitting the dataset and combining the results with Stouffer's method.
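
The stopping rule and the combination step can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the difference-in-means statistic, the `max_perm` cap, the fallback estimator, and the clamping constant are all assumptions made for the example. The abstract also does not say how the dataset is split, so `stouffer_combine` only shows the generic combination of independent p-values.

```python
import math
import random
from statistics import NormalDist


def adaptive_perm_pvalue(x, y, r=10, max_perm=100_000, rng=None):
    """Adaptive permutation p-value for an absolute difference in means.

    Permutes group labels until the permuted statistic has matched or
    exceeded the observed one `r` times, or until `max_perm` (a
    hypothetical safety cap) permutations have been drawn.
    Returns (p_estimate, n_permutations, n_exceedances).
    """
    rng = rng or random.Random()
    pooled = list(x) + list(y)
    n_x, n_y = len(x), len(y)
    observed = abs(sum(x) / n_x - sum(y) / n_y)

    hits = 0
    n = 0
    while hits < r and n < max_perm:
        rng.shuffle(pooled)
        stat = abs(sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / n_y)
        n += 1
        if stat >= observed:
            hits += 1

    if hits >= r:
        # Stopped by the exceedance rule: the permutation count n is a
        # negative binomial draw, and hits / n is the maximum-likelihood
        # estimate of the p-value.
        p = hits / n
    else:
        # Cap reached first: fall back to the usual fixed-size estimator.
        p = (hits + 1) / (n + 1)
    return p, n, hits


def stouffer_combine(p_values, eps=1e-300):
    """Combine independent p-values with Stouffer's (unweighted) Z method."""
    nd = NormalDist()
    # Clamp so inv_cdf never sees exactly 0 or 1.
    clamped = [min(max(p, eps), 1.0 - 1e-12) for p in p_values]
    # z_i = Phi^{-1}(1 - p_i), written as -Phi^{-1}(p_i) for tail accuracy.
    z = sum(-nd.inv_cdf(p) for p in clamped) / math.sqrt(len(clamped))
    # Upper-tail probability of the combined z-score.
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

A feature with no signal stops after only a handful of permutations, because exceedances accumulate quickly, while a strongly associated feature keeps permuting until the cap. One possible use of the second function, again as an assumption, is to run `adaptive_perm_pvalue` on disjoint subsets of the samples and pass the resulting p-values to `stouffer_combine`.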

Results: From the simulation study, we found that the enhanced adaptive permutation test dramatically reduced the number of permutations while keeping the precision of the permutation p-value within a small range, when compared to the ordinary permutation test. In real data analysis, we applied the enhanced adaptive permutation test to a genome-wide single nucleotide polymorphism (SNP) dataset of 327,872 features.
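
A short derivation suggests why a stopping rule of this kind can save permutations without losing precision. It uses textbook negative binomial properties and is not taken from the paper; the symbols $r$ (the predefined number of exceedances), $N$ (the number of permutations drawn before stopping), and $p$ (the true permutation p-value) are introduced here for illustration.

\[
\mathbb{E}[N] = \frac{r}{p},
\qquad
\hat{p} = \frac{r}{N},
\qquad
\operatorname{Var}(\hat{p}) \approx \frac{p^{2}(1-p)}{r},
\qquad
\frac{\operatorname{SE}(\hat{p})}{p} \approx \sqrt{\frac{1-p}{r}} .
\]

The expected number of permutations shrinks as $p$ grows, so insignificant features stop early: with $r = 10$ and $p = 0.5$, about 20 permutations are expected. The relative standard error, in contrast, depends almost only on $r$ and is close to $1/\sqrt{r}$ for small $p$.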

Conclusion: We found that analysis with the enhanced adaptive permutation test ran in a feasible time for genome-wide omics datasets and successfully identified features with highly significant p-values, with reasonable confidence intervals.

Source journal: Genes & genomics (Biology: Biochemistry & Molecular Biology)
CiteScore: 3.70
Self-citation rate: 4.80%
Articles published: 131
Review time: 6-12 weeks
About the journal: Genes & Genomics is an official journal of the Korean Genetics Society (http://kgenetics.or.kr/). Membership of the Society is not required for contributors. It is a peer-reviewed international journal published in print (ISSN 1976-9571) and online (E-ISSN 2092-9293). It covers all disciplines of genetics and genomics, from prokaryotes to eukaryotes and from fundamental heredity to molecular aspects. Articles can be reviews, research articles, or short communications.
Latest articles from this journal
Complete chloroplast genomes of three Polygala species and indel marker development for identification of authentic polygalae radix (Polygala tenuifolia).
Granzyme mRNA-miRNA interaction and its implication to functional impact.
A novel ATP2A2 mutation in Darier and genotype phenotype: correlation analysis.
Analysis of key pathways and genes in nodal structure on rat skin surface using gene ontology and KEGG pathway.
Bacterial profile-based body fluid identification using a machine learning approach.