Allele ages provide limited information about the strength of negative selection.

IF 3.3 3区 生物学 Q2 GENETICS & HEREDITY Genetics Pub Date : 2024-12-19 DOI:10.1093/genetics/iyae211
Vivaswat Shastry, Jeremy J Berg
{"title":"Allele ages provide limited information about the strength of negative selection.","authors":"Vivaswat Shastry, Jeremy J Berg","doi":"10.1093/genetics/iyae211","DOIUrl":null,"url":null,"abstract":"<p><p>For many problems in population genetics, it is useful to characterize the distribution of fitness effects (DFE) of de novo mutations among a certain class of sites. A DFE is typically estimated by fitting an observed site frequency spectrum (SFS) to an expected SFS given a hypothesized distribution of selection coefficients and demographic history. The development of tools to infer gene trees from haplotype alignments, along with ancient DNA resources, provides us with additional information about the frequency trajectories of segregating mutations. Here, we ask how useful this additional information is for learning about the DFE, using the joint distribution on allele frequency and age to summarize information about the trajectory. To this end, we introduce an accurate and efficient numerical method for computing the density on the age of a segregating variant found at a given sample frequency, given the strength of selection and an arbitrarily complex population size history. We then use this framework to show that the unconditional age distribution of negatively selected alleles is very closely approximated by re-weighting the neutral age distribution in terms of the negatively selected SFS, suggesting that allele ages provide little information about the DFE beyond that already contained in the present day frequency. To confirm this prediction, we extended the standard Poisson Random Field (PRF) method to incorporate the joint distribution of frequency and age in estimating selection coefficients, and test its performance using simulations. We find that when the full SFS is observed and the true allele ages are known, including ages in the estimation provides only small increases in the accuracy of estimated selection coefficients. However, if only sites with frequencies above a certain threshold are observed, then the true ages can provide substantial information about the selection coefficients, especially when the selection coefficient is large. When ages are estimated from haplotype data using state-of-the-art tools, uncertainty about the age abrogates most of the additional information in the fully observed SFS case, while the neutral prior assumed in these tools when estimating ages induces a downward bias in the case of the thresholded SFS.</p>","PeriodicalId":48925,"journal":{"name":"Genetics","volume":" ","pages":""},"PeriodicalIF":3.3000,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genetics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/genetics/iyae211","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"GENETICS & HEREDITY","Score":null,"Total":0}
引用次数: 0

Abstract

For many problems in population genetics, it is useful to characterize the distribution of fitness effects (DFE) of de novo mutations among a certain class of sites. A DFE is typically estimated by fitting an observed site frequency spectrum (SFS) to an expected SFS given a hypothesized distribution of selection coefficients and demographic history. The development of tools to infer gene trees from haplotype alignments, along with ancient DNA resources, provides us with additional information about the frequency trajectories of segregating mutations. Here, we ask how useful this additional information is for learning about the DFE, using the joint distribution on allele frequency and age to summarize information about the trajectory. To this end, we introduce an accurate and efficient numerical method for computing the density on the age of a segregating variant found at a given sample frequency, given the strength of selection and an arbitrarily complex population size history. We then use this framework to show that the unconditional age distribution of negatively selected alleles is very closely approximated by re-weighting the neutral age distribution in terms of the negatively selected SFS, suggesting that allele ages provide little information about the DFE beyond that already contained in the present day frequency. To confirm this prediction, we extended the standard Poisson Random Field (PRF) method to incorporate the joint distribution of frequency and age in estimating selection coefficients, and test its performance using simulations. We find that when the full SFS is observed and the true allele ages are known, including ages in the estimation provides only small increases in the accuracy of estimated selection coefficients. However, if only sites with frequencies above a certain threshold are observed, then the true ages can provide substantial information about the selection coefficients, especially when the selection coefficient is large. When ages are estimated from haplotype data using state-of-the-art tools, uncertainty about the age abrogates most of the additional information in the fully observed SFS case, while the neutral prior assumed in these tools when estimating ages induces a downward bias in the case of the thresholded SFS.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
等位基因年龄提供的负选择强度信息有限。
对于群体遗传学中的许多问题,描述新生突变的适应度效应(DFE)在某一类位点上的分布是有用的。DFE通常是通过将观测到的站点频谱(SFS)拟合到给定选择系数和人口历史假设分布的预期SFS来估计的。从单倍型比对中推断基因树的工具的发展,以及古代DNA资源,为我们提供了关于分离突变频率轨迹的额外信息。在这里,我们询问这些附加信息对学习DFE有多大用处,使用等位基因频率和年龄的联合分布来总结有关轨迹的信息。为此,在给定选择强度和任意复杂的种群规模历史的情况下,我们引入了一种精确而有效的数值方法来计算在给定样本频率下发现的分离变异年龄的密度。然后,我们使用这个框架来表明,负选择等位基因的无条件年龄分布非常接近于负选择SFS的中性年龄分布,这表明等位基因年龄提供的关于DFE的信息很少,超出了当前频率中已经包含的信息。为了验证这一预测,我们扩展了标准泊松随机场(PRF)方法,将频率和年龄的联合分布纳入选择系数的估计中,并通过仿真测试了其性能。我们发现,当观察到完整的SFS和已知的真实等位基因年龄时,在估计中包括年龄只提供了估计选择系数的准确性的小幅增加。然而,如果只观察频率高于某一阈值的站点,那么真实年龄可以提供关于选择系数的大量信息,特别是当选择系数很大时。当使用最先进的工具从单倍型数据估计年龄时,在完全观察到的SFS情况下,年龄的不确定性消除了大多数附加信息,而这些工具在估计年龄时假设的中性先验在阈值SFS情况下会导致向下偏差。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Genetics
Genetics GENETICS & HEREDITY-
CiteScore
6.90
自引率
6.10%
发文量
177
审稿时长
1.5 months
期刊介绍: GENETICS is published by the Genetics Society of America, a scholarly society that seeks to deepen our understanding of the living world by advancing our understanding of genetics. Since 1916, GENETICS has published high-quality, original research presenting novel findings bearing on genetics and genomics. The journal publishes empirical studies of organisms ranging from microbes to humans, as well as theoretical work. While it has an illustrious history, GENETICS has changed along with the communities it serves: it is not your mentor''s journal. The editors make decisions quickly – in around 30 days – without sacrificing the excellence and scholarship for which the journal has long been known. GENETICS is a peer reviewed, peer-edited journal, with an international reach and increasing visibility and impact. All editorial decisions are made through collaboration of at least two editors who are practicing scientists. GENETICS is constantly innovating: expanded types of content include Reviews, Commentary (current issues of interest to geneticists), Perspectives (historical), Primers (to introduce primary literature into the classroom), Toolbox Reviews, plus YeastBook, FlyBook, and WormBook (coming spring 2016). For particularly time-sensitive results, we publish Communications. As part of our mission to serve our communities, we''ve published thematic collections, including Genomic Selection, Multiparental Populations, Mouse Collaborative Cross, and the Genetics of Sex.
期刊最新文献
Balancing selfing and outcrossing: the genetics and cell biology of nematodes with three sexual morphs. Acentric chromosome congression and alignment on the metaphase plate via kinetochore-independent forces. Genomic prediction of heterosis, inbreeding control, and mate allocation in outbred diploid and tetraploid populations. Global genotype by environment prediction competition reveals that diverse modeling strategies can deliver satisfactory maize yield estimates. MalKinID: A classification model for identifying malaria parasite genealogical relationships using identity-by-descent.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1