Bayesian StairwayPlot for Inferring Single Population Demographic Histories From Site Frequency Spectra
Sebastian Höhna, Ana Catalán
Molecular Ecology Resources, e14087. Published 2025-02-26. DOI: 10.1111/1755-0998.14087 (https://doi.org/10.1111/1755-0998.14087)
Citations: 0
Abstract
The StairwayPlot approach provides an elegant, flexible and powerful method to estimate complex demographic histories of single populations from site frequency spectrum data. It uses expected coalescent times to compute the expected site frequency spectrum within a multinomial likelihood function. Population sizes are allowed to vary freely between coalescent events but are constant within each interval. Here, we implement the StairwayPlot approach in the Bayesian software package RevBayes. We use approaches developed for Bayesian Skyline Plots, which include independent and identically distributed (i.i.d.) population sizes, Gaussian Markov random fields and Horseshoe Markov random fields as prior distributions on population sizes. Furthermore, we implement a recently developed approach for computing the leave-one-out cross-validation probability for efficient model selection. We compare inference from our Bayesian implementation to the original Maximum Likelihood implementation, StairwayPlot2. Our results show that our Bayesian implementation in RevBayes performs comparably to StairwayPlot2 in terms of parameter accuracy, which is expected given that both use the same underlying likelihood function. From our set of prior models, the Gaussian Markov random field prior performed best for smoothly varying demographic histories, while the Horseshoe Markov random field prior performed best for abruptly changing demographic histories. We conclude the study by exploring several choices often faced in empirical studies, including the estimate of the total sequence length, the assumed mutation rate, and biases introduced by mis-calling ancestral alleles. Using our empirical example, we show that as few as 10 diploid individuals are sufficient to infer complex demographic histories, but at least 500,000 single nucleotide polymorphisms (SNPs) are required.
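The likelihood described in the abstract can be illustrated with a minimal Python sketch. It is not the RevBayes or StairwayPlot2 code. It assumes the standard coalescent result that a branch present while k lineages remain subtends i of the n samples with probability C(n-i-1, k-2) / C(n-1, k-1), and that the expected duration of that interval is theta_k / (k (k-1)) in mutation-scaled units. The function names `expected_sfs` and `multinomial_log_likelihood` are hypothetical, and the multinomial here conditions on segregating sites only, which is a simplification made for this sketch.

```python
from math import comb, lgamma, log


def expected_sfs(theta):
    """Expected unfolded SFS for n = len(theta) + 1 sampled sequences.

    theta[k - 2] is the assumed mutation-scaled population size during the
    interval in which k lineages remain (k = 2..n); it is constant within
    that interval and free to differ between intervals.
    """
    n = len(theta) + 1
    sfs = []
    for i in range(1, n):  # derived-allele count i = 1..n-1
        total = 0.0
        for k in range(2, n - i + 2):
            # Probability that a branch at level k has exactly i descendants.
            p = comb(n - i - 1, k - 2) / comb(n - 1, k - 1)
            # k branches, each lasting theta_k / (k (k - 1)) in expectation.
            total += k * p * theta[k - 2] / (k * (k - 1))
        sfs.append(total)
    return sfs


def multinomial_log_likelihood(observed, theta):
    """Multinomial log-likelihood of observed SFS counts given theta."""
    expected = expected_sfs(theta)
    norm = sum(expected)
    log_lik = lgamma(sum(observed) + 1)  # log of the multinomial coefficient
    for count, e in zip(observed, expected):
        log_lik += count * log(e / norm) - lgamma(count + 1)
    return log_lik


if __name__ == "__main__":
    # A constant population size recovers the classic theta / i spectrum.
    print(expected_sfs([2.0] * 4))
    # Illustrative (made-up) counts for n = 5 sequences.
    print(multinomial_log_likelihood([40, 20, 13, 10], [2.0] * 4))
```

Because this sketch normalises the expected spectrum over polymorphic classes, only the relative shape of the theta values affects the likelihood. Converting them to absolute population sizes would additionally need the mutation rate and total sequence length, two of the choices the abstract identifies as influential.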
About the journal:
Molecular Ecology Resources promotes the creation of comprehensive resources for the scientific community, encompassing computer programs, statistical and molecular advancements, and a diverse array of molecular tools. Serving as a conduit for disseminating these resources, the journal targets a broad audience of researchers in the fields of evolution, ecology, and conservation. Articles in Molecular Ecology Resources are crafted to support investigations tackling significant questions within these disciplines.
In addition to original resource articles, Molecular Ecology Resources features Reviews, Opinions, and Comments relevant to the field. The journal also periodically releases Special Issues focusing on resource development within specific areas.