{"title":"Theoretical analysis of principal components in an umbrella model of intraspecific evolution","authors":"Maxime Estavoyer , Olivier François","doi":"10.1016/j.tpb.2022.08.002","DOIUrl":null,"url":null,"abstract":"<div><p>Principal component analysis (PCA) is one of the most frequently-used approach to describe population structure from multilocus genotype data. Regarding geographic range expansions of modern humans, interpretations of PCA have, however, been questioned, as there is uncertainty about the wave-like patterns that have been observed in principal components. It has indeed been argued that wave-like patterns are mathematical artifacts that arise generally when PCA is applied to data in which genetic differentiation increases with geographic distance. Here, we present an alternative theory for the observation of wave-like patterns in PCA. We study a coalescent model – the umbrella model – for the diffusion of genetic variants. The model is based on genetic drift without any particular geographical structure. In the umbrella model, splits from an ancestral population occur almost continuously in time, giving birth to small daughter populations at a regular pace. Our results provide detailed mathematical descriptions of eigenvalues and eigenvectors for the PCA of sampled genomic sequences under the model. When variants uniquely represented in the sample are removed, the PCA eigenvectors are defined as cosine functions of increasing periodicity, reproducing wave-like patterns observed in equilibrium isolation-by-distance models. Including singleton variants in the analysis, the eigenvectors corresponding to the largest eigenvalues exhibit complex wave shapes. The accuracy of our predictions is further investigated with coalescent simulations. Our analysis supports the hypothesis that highly structured wave-like patterns could arise from genetic drift only, and may not always be artificial outcomes of spatially structured data. Genomic data related to the peopling of the Americas are reanalyzed in the light of our new theory.</p></div>","PeriodicalId":49437,"journal":{"name":"Theoretical Population Biology","volume":"148 ","pages":"Pages 11-21"},"PeriodicalIF":1.2000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0040580922000521/pdfft?md5=e289fcb0a12b991033f6945f5b6b7d2e&pid=1-s2.0-S0040580922000521-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Theoretical Population Biology","FirstCategoryId":"99","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0040580922000521","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"ECOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Principal component analysis (PCA) is one of the most frequently-used approach to describe population structure from multilocus genotype data. Regarding geographic range expansions of modern humans, interpretations of PCA have, however, been questioned, as there is uncertainty about the wave-like patterns that have been observed in principal components. It has indeed been argued that wave-like patterns are mathematical artifacts that arise generally when PCA is applied to data in which genetic differentiation increases with geographic distance. Here, we present an alternative theory for the observation of wave-like patterns in PCA. We study a coalescent model – the umbrella model – for the diffusion of genetic variants. The model is based on genetic drift without any particular geographical structure. In the umbrella model, splits from an ancestral population occur almost continuously in time, giving birth to small daughter populations at a regular pace. Our results provide detailed mathematical descriptions of eigenvalues and eigenvectors for the PCA of sampled genomic sequences under the model. When variants uniquely represented in the sample are removed, the PCA eigenvectors are defined as cosine functions of increasing periodicity, reproducing wave-like patterns observed in equilibrium isolation-by-distance models. Including singleton variants in the analysis, the eigenvectors corresponding to the largest eigenvalues exhibit complex wave shapes. The accuracy of our predictions is further investigated with coalescent simulations. Our analysis supports the hypothesis that highly structured wave-like patterns could arise from genetic drift only, and may not always be artificial outcomes of spatially structured data. Genomic data related to the peopling of the Americas are reanalyzed in the light of our new theory.
期刊介绍:
An interdisciplinary journal, Theoretical Population Biology presents articles on theoretical aspects of the biology of populations, particularly in the areas of demography, ecology, epidemiology, evolution, and genetics. Emphasis is on the development of mathematical theory and models that enhance the understanding of biological phenomena.
Articles highlight the motivation and significance of the work for advancing progress in biology, relying on a substantial mathematical effort to obtain biological insight. The journal also presents empirical results and computational and statistical methods directly impinging on theoretical problems in population biology.