Multiple testing in genome-wide association studies via hierarchical hidden Markov models
Authors: Pengfei Wang, Zhaofeng Tian
Journal: Journal of Statistical Planning and Inference, Volume 232, Article 106161
Publication date: 2024-02-29
DOI: 10.1016/j.jspi.2024.106161
URL: https://www.sciencedirect.com/science/article/pii/S0378375824000181
Citations: 0
Abstract
Large-scale multiple testing problems arise frequently in modern scientific research. Conventional multiple testing procedures usually suffer a considerable loss of efficiency when correlations among tests are ignored. Appropriate use of correlation information not only enhances the efficacy of the testing procedure but also improves the interpretability of the results. Because disease- or trait-related single nucleotide polymorphisms (SNPs) tend to cluster and exhibit serial correlations, hidden Markov model (HMM) based multiple testing procedures have been applied successfully in genome-wide association studies (GWAS). However, modeling an entire chromosome with a single HMM is somewhat crude. To overcome this issue, this paper employs the hierarchical hidden Markov model (HHMM) to describe local correlations among tests, and develops a multiple testing procedure that automatically distinguishes different classes of chromosome regions while taking local correlations among tests into account. We first propose an oracle procedure that is shown theoretically to be valid and, in a certain sense, optimal. We then develop a data-driven procedure to mimic the oracle version. Extensive simulations and a real data example show that the new multiple testing procedure outperforms its competitors.
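To make the single-HMM baseline mentioned in the abstract concrete, the following is a minimal sketch of a generic HMM-based testing rule. It is not the authors' HHMM procedure. It assumes a two-state chain (0 = null, 1 = non-null), Gaussian emissions, and fully known parameters (an oracle-style setting). The function names, the transition matrix, the alternative mean, and the level `alpha` are all hypothetical choices made for illustration. The rule ranks tests by their posterior null probability given the whole sequence and rejects the largest set whose average posterior null probability stays at or below `alpha`.

```python
import numpy as np


def normal_pdf(z, mean, sd):
    """Density of N(mean, sd^2) evaluated at z."""
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def posterior_null_prob(z, trans, init, alt_mean, alt_sd):
    """P(state_i = 0 | z_1..z_m) for a two-state HMM via scaled forward-backward.

    Assumption: null emissions are N(0, 1), non-null are N(alt_mean, alt_sd^2).
    """
    m = len(z)
    # Column 0: null density; column 1: non-null density.
    emis = np.column_stack([normal_pdf(z, 0.0, 1.0),
                            normal_pdf(z, alt_mean, alt_sd)])
    fwd = np.zeros((m, 2))
    bwd = np.ones((m, 2))
    fwd[0] = init * emis[0]
    fwd[0] /= fwd[0].sum()
    for i in range(1, m):
        fwd[i] = (fwd[i - 1] @ trans) * emis[i]
        fwd[i] /= fwd[i].sum()  # rescale each step to avoid underflow
    for i in range(m - 2, -1, -1):
        bwd[i] = trans @ (emis[i + 1] * bwd[i + 1])
        bwd[i] /= bwd[i].sum()
    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 0]


def step_up_reject(null_prob, alpha):
    """Reject the k smallest posterior null probabilities, where k is the
    largest count whose running mean is at most alpha."""
    order = np.argsort(null_prob)
    running_mean = np.cumsum(null_prob[order]) / np.arange(1, len(null_prob) + 1)
    passing = np.nonzero(running_mean <= alpha)[0]
    reject = np.zeros(len(null_prob), dtype=bool)
    if passing.size > 0:
        reject[order[: passing[-1] + 1]] = True
    return reject


# Simulated example with hypothetical parameter values.
rng = np.random.default_rng(0)
trans = np.array([[0.95, 0.05],   # null -> null / non-null
                  [0.20, 0.80]])  # non-null -> null / non-null
init = np.array([0.8, 0.2])       # stationary distribution of `trans`
m = 2000
states = np.zeros(m, dtype=int)
states[0] = rng.random() < init[1]
for i in range(1, m):
    states[i] = rng.random() < trans[states[i - 1], 1]
z = rng.normal(np.where(states == 1, 2.0, 0.0), 1.0)

null_prob = posterior_null_prob(z, trans, init, alt_mean=2.0, alt_sd=1.0)
reject = step_up_reject(null_prob, alpha=0.10)
print(int(reject.sum()), "rejections out of", m)
```

Because the posterior probabilities borrow strength from neighboring tests, a moderately large statistic inside a cluster of signals is ranked ahead of an isolated one of the same size. A hierarchical model of the kind the abstract describes would additionally let the chain's parameters vary by region class; that extension is not attempted here.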
About the journal:
The Journal of Statistical Planning and Inference offers itself as a multifaceted and all-inclusive bridge between classical aspects of statistics and probability, and the emerging interdisciplinary aspects that have a potential of revolutionizing the subject. While we maintain our traditional strength in statistical inference, design, classical probability, and large sample methods, we also have a far more inclusive and broadened scope to keep up with the new problems that confront us as statisticians, mathematicians, and scientists.
We publish high-quality articles in all branches of statistics, probability, discrete mathematics, machine learning, and bioinformatics. We also especially welcome well-written and up-to-date review articles on fundamental themes of statistics, probability, machine learning, and general biostatistics. Thoughtful letters to the editors, interesting problems in need of a solution, and short notes carrying an element of elegance or beauty are equally welcome.