Theoretical analysis of principal components in an umbrella model of intraspecific evolution

Theoretical Population Biology (IF 1.2; Zone 4, Biology; JCR Q4, Ecology). Pub Date: 2022-12-01. DOI: 10.1016/j.tpb.2022.08.002
Maxime Estavoyer, Olivier François
Volume 148, pages 11-21.

Abstract

Principal component analysis (PCA) is one of the most frequently used approaches to describe population structure from multilocus genotype data. Regarding geographic range expansions of modern humans, interpretations of PCA have, however, been questioned, as there is uncertainty about the wave-like patterns that have been observed in principal components. It has indeed been argued that wave-like patterns are mathematical artifacts that arise generally when PCA is applied to data in which genetic differentiation increases with geographic distance. Here, we present an alternative theory for the observation of wave-like patterns in PCA. We study a coalescent model, the umbrella model, for the diffusion of genetic variants. The model is based on genetic drift without any particular geographical structure. In the umbrella model, splits from an ancestral population occur almost continuously in time, giving birth to small daughter populations at a regular pace. Our results provide detailed mathematical descriptions of eigenvalues and eigenvectors for the PCA of sampled genomic sequences under the model. When variants uniquely represented in the sample (singletons) are removed, the PCA eigenvectors are defined as cosine functions of increasing periodicity, reproducing wave-like patterns observed in equilibrium isolation-by-distance models. When singleton variants are included in the analysis, the eigenvectors corresponding to the largest eigenvalues exhibit complex wave shapes. The accuracy of our predictions is further investigated with coalescent simulations. Our analysis supports the hypothesis that highly structured wave-like patterns could arise from genetic drift only, and may not always be artificial outcomes of spatially structured data. Genomic data related to the peopling of the Americas are reanalyzed in the light of our new theory.
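
The claim that cosine-shaped eigenvectors can come from drift alone can be illustrated with a small numerical sketch. It is not the authors' derivation: it assumes a toy covariance in which samples i and j, ordered by the time their populations split off, share an amount of drift proportional to min(i, j). The function names (`centered_min_kernel`, `cosine_vector`, `top_components`) and the sample size are hypothetical choices made for this illustration. Under that assumption, the leading eigenvectors of the centered covariance matrix coincide with cosines of increasing frequency.

```python
import numpy as np

def centered_min_kernel(n):
    """Double-centered toy covariance K[i, j] = min(i, j), i, j = 1..n.

    Assumption for illustration only: shared drift between two samples
    grows linearly with the earlier of their two split times.
    """
    idx = np.arange(1, n + 1)
    K = np.minimum.outer(idx, idx).astype(float)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix used by PCA
    return H @ K @ H

def cosine_vector(n, k):
    """Unit-norm cosine with k half-periods across the n ordered samples."""
    i = np.arange(1, n + 1)
    v = np.cos(k * np.pi * (i - 0.5) / n)
    return v / np.linalg.norm(v)

def top_components(n, m):
    """Return the m largest eigenvalues and their eigenvectors."""
    eigvals, eigvecs = np.linalg.eigh(centered_min_kernel(n))
    order = np.argsort(eigvals)[::-1][:m]  # sort from largest to smallest
    return eigvals[order], eigvecs[:, order]

n = 50
vals, vecs = top_components(n, 3)
for k in range(1, 4):
    # Eigenvectors are defined up to sign, so compare absolute overlap.
    match = abs(vecs[:, k - 1] @ cosine_vector(n, k))
    print(f"PC{k}: |overlap with cosine| = {match:.6f}")
```

Each printed overlap should be close to 1, meaning that the k-th principal component of this toy matrix is a cosine with k half-periods. No geographic structure enters the construction; only the ordering of split times does.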

Source journal: Theoretical Population Biology (Biology: Evolutionary Biology)
CiteScore: 2.50
Self-citation rate: 14.30%
Articles published: 43
Review time: 6-12 weeks
About the journal: An interdisciplinary journal, Theoretical Population Biology presents articles on theoretical aspects of the biology of populations, particularly in the areas of demography, ecology, epidemiology, evolution, and genetics. Emphasis is on the development of mathematical theory and models that enhance the understanding of biological phenomena. Articles highlight the motivation and significance of the work for advancing progress in biology, relying on a substantial mathematical effort to obtain biological insight. The journal also presents empirical results and computational and statistical methods directly impinging on theoretical problems in population biology.