Unifying approaches from statistical genetics and phylogenetics for mapping phenotypes in structured populations.

PLoS Biology; Impact factor: 7.2; Zone 1 (Biology); JCR Q1 (Agricultural and Biological Sciences); Pub Date: 2024-10-09; eCollection Date: 2024-10-01; DOI: 10.1371/journal.pbio.3002847

Joshua G Schraiber, Michael D Edge, Matt Pennell

PLoS Biology 22(10): e3002847. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11493298/pdf/


Abstract

In both statistical genetics and phylogenetics, a major goal is to identify correlations between a focal trait and genetic loci or other aspects of the phenotype or environment. The two fields have developed sophisticated but disparate statistical traditions for these tasks. The disconnect between their approaches is becoming untenable as questions in medicine, conservation biology, and evolutionary biology increasingly rely on integrating data from within and among species, and as once-clear conceptual divisions become blurred. To help bridge this divide, we lay out a general model describing the covariance between the genetic contributions to the quantitative phenotypes of different individuals. This approach shows that standard models in both statistical genetics (e.g., genome-wide association studies; GWAS) and phylogenetic comparative biology (e.g., phylogenetic regression) can be interpreted as special cases of this more general quantitative-genetic model. Because these models share the same core architecture, we can build a unified understanding of the strengths and limitations of different methods for controlling for genetic structure when testing for associations. We develop intuition, both analytically and through population-genetic and phylogenetic simulations of quantitative traits, for why and when spurious correlations may occur. The structural similarity of problems in statistical genetics and phylogenetics enables us to take methodological advances from one field and apply them in the other. We demonstrate this by showing that a standard GWAS technique can mitigate spurious correlations in phylogenetic analyses: including in a regression model both the genetic relatedness matrix (GRM) and its leading eigenvectors, which correspond to the principal components of the genotype matrix. As a case study, we re-examine an analysis testing for coevolution of expression levels between genes across a fungal phylogeny and show that including eigenvectors of the covariance matrix as covariates decreases the false positive rate while simultaneously increasing the true positive rate. More generally, this work provides a foundation for more integrative approaches to understanding the genetic architecture of phenotypes and how evolutionary processes shape it.
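The correction described in the abstract, a regression that uses the relatedness (covariance) matrix for the error structure and also includes its leading eigenvectors as covariates, can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `grm_from_genotypes` and `gls_with_eigenvectors`, the ridge term, the simulated two-population data, and the choice of two eigenvectors are all hypothetical choices made for the example.

```python
import numpy as np


def grm_from_genotypes(G, ridge=0.1):
    """GRM-style covariance from an (individuals x loci) genotype matrix.

    The ridge term is an assumption of this sketch; it keeps the matrix
    positive definite so that a Cholesky factor exists.
    """
    G = G[:, G.std(axis=0) > 0]               # drop monomorphic loci
    Z = (G - G.mean(axis=0)) / G.std(axis=0)  # standardize each locus
    n, m = Z.shape
    return Z @ Z.T / m + ridge * np.eye(n)


def gls_with_eigenvectors(y, x, C, n_eig=2):
    """Generalized least squares of y on x with error covariance C,
    adding the top n_eig eigenvectors of C as fixed-effect covariates.

    Returns (coefficients, standard errors), ordered as
    [intercept, slope on x, eigenvector 1, ..., eigenvector n_eig].
    """
    n = len(y)
    evals, evecs = np.linalg.eigh(C)          # eigenvalues in ascending order
    top = evecs[:, ::-1][:, :n_eig]           # leading eigenvectors
    X = np.column_stack([np.ones(n), x, top])

    # Whiten with the Cholesky factor (C = L L^T), then run ordinary least squares.
    L = np.linalg.cholesky(C)
    Xw = np.linalg.solve(L, X)
    yw = np.linalg.solve(L, y)
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

    resid = yw - Xw @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xw.T @ Xw)))
    return beta, se


# Hypothetical simulation: two subpopulations with diverged allele frequencies.
rng = np.random.default_rng(0)
n_per_pop, n_loci = 30, 300
p1 = rng.uniform(0.1, 0.9, n_loci)
p2 = np.clip(p1 + rng.normal(0, 0.15, n_loci), 0.05, 0.95)
G = np.vstack([
    rng.binomial(2, p1, (n_per_pop, n_loci)),
    rng.binomial(2, p2, (n_per_pop, n_loci)),
]).astype(float)

pop = np.repeat([0.0, 1.0], n_per_pop)
x = pop + rng.normal(0, 1, 2 * n_per_pop)        # predictor confounded with structure
y = 1.5 * pop + rng.normal(0, 1, 2 * n_per_pop)  # trait with no direct effect of x

C = grm_from_genotypes(G)
beta, se = gls_with_eigenvectors(y, x, C, n_eig=2)
print("slope on x:", beta[1], "standard error:", se[1])
```

In a phylogenetic setting, `C` would instead be the covariance matrix implied by the tree, and the number of eigenvectors to include is a modelling choice that this sketch does not address.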




Source journal

PLoS Biology (BIOCHEMISTRY & MOLECULAR BIOLOGY-BIOLOGY)

CiteScore: 15.40

Self-citation rate: 2.00%

Articles published: 359

Review time: 3-8 weeks

About the journal: PLOS Biology is the flagship journal of the Public Library of Science (PLOS) and focuses on publishing groundbreaking and relevant research in all areas of biological science. The journal features works at various scales, ranging from molecules to ecosystems, and also encourages interdisciplinary studies. PLOS Biology publishes articles that demonstrate exceptional significance, originality, and relevance, with a high standard of scientific rigor in methodology, reporting, and conclusions. The journal aims to advance science and serve the research community by transforming research communication to align with the research process. It offers evolving article types and policies that empower authors to share the complete story behind their scientific findings with a diverse global audience of researchers, educators, policymakers, patient advocacy groups, and the general public. PLOS Biology, along with other PLOS journals, is widely indexed by major services such as Crossref, Dimensions, DOAJ, Google Scholar, PubMed, PubMed Central, Scopus, and Web of Science. Additionally, PLOS Biology is indexed by various other services including AGRICOLA, Biological Abstracts, BIOSIS Previews, CABI CAB Abstracts, CABI Global Health, CAPES, CAS, CNKI, Embase, Journal Guide, MEDLINE, and Zoological Record, ensuring that the research content is easily accessible and discoverable by a wide range of audiences.