Hector Banos, Thomas K F Wong, Justin Daneau, Edward Susko, Bui Quang Minh, Robert Lanfear, Matthew W Brown, Laura Eme, Andrew J Roger
{"title":"GTRpmix: A Linked General Time-Reversible Model for Profile Mixture Models.","authors":"Hector Banos, Thomas K F Wong, Justin Daneau, Edward Susko, Bui Quang Minh, Robert Lanfear, Matthew W Brown, Laura Eme, Andrew J Roger","doi":"10.1093/molbev/msae174","DOIUrl":null,"url":null,"abstract":"<p><p>Profile mixture models capture distinct biochemical constraints on the amino acid substitution process at different sites in proteins. These models feature a mixture of time-reversible models with a common matrix of exchangeabilities and distinct sets of equilibrium amino acid frequencies known as profiles. Combining the exchangeability matrix with each profile generates the matrix of instantaneous rates of amino acid exchange for that profile. Currently, empirically estimated exchangeability matrices (e.g. the LG matrix) are widely used for phylogenetic inference under profile mixture models. However, these were estimated using a single profile and are unlikely optimal for profile mixture models. Here, we describe the GTRpmix model that allows maximum likelihood estimation of a common exchangeability matrix under any profile mixture model. We show that exchangeability matrices estimated under profile mixture models differ from the LG matrix, dramatically improving model fit and topological estimation accuracy for empirical test cases. Because the GTRpmix model is computationally expensive, we provide two exchangeability matrices estimated from large concatenated phylogenomic-supermatrices to be used for phylogenetic analyses. One, called Eukaryotic Linked Mixture (ELM), is designed for phylogenetic analysis of proteins encoded by nuclear genomes of eukaryotes, and the other, Eukaryotic and Archaeal Linked mixture (EAL), for reconstructing relationships between eukaryotes and Archaea. These matrices, combined with profile mixture models, fit data better and have improved topology estimation relative to the LG matrix combined with the same mixture models. Starting with version 2.3.1, IQ-TREE2 allows users to estimate linked exchangeabilities (i.e. amino acid exchange rates) under profile mixture models.</p>","PeriodicalId":18730,"journal":{"name":"Molecular biology and evolution","volume":null,"pages":null},"PeriodicalIF":11.0000,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371462/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Molecular biology and evolution","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/molbev/msae174","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Profile mixture models capture distinct biochemical constraints on the amino acid substitution process at different sites in proteins. These models feature a mixture of time-reversible models with a common matrix of exchangeabilities and distinct sets of equilibrium amino acid frequencies known as profiles. Combining the exchangeability matrix with each profile generates the matrix of instantaneous rates of amino acid exchange for that profile. Currently, empirically estimated exchangeability matrices (e.g. the LG matrix) are widely used for phylogenetic inference under profile mixture models. However, these were estimated using a single profile and are unlikely optimal for profile mixture models. Here, we describe the GTRpmix model that allows maximum likelihood estimation of a common exchangeability matrix under any profile mixture model. We show that exchangeability matrices estimated under profile mixture models differ from the LG matrix, dramatically improving model fit and topological estimation accuracy for empirical test cases. Because the GTRpmix model is computationally expensive, we provide two exchangeability matrices estimated from large concatenated phylogenomic-supermatrices to be used for phylogenetic analyses. One, called Eukaryotic Linked Mixture (ELM), is designed for phylogenetic analysis of proteins encoded by nuclear genomes of eukaryotes, and the other, Eukaryotic and Archaeal Linked mixture (EAL), for reconstructing relationships between eukaryotes and Archaea. These matrices, combined with profile mixture models, fit data better and have improved topology estimation relative to the LG matrix combined with the same mixture models. Starting with version 2.3.1, IQ-TREE2 allows users to estimate linked exchangeabilities (i.e. amino acid exchange rates) under profile mixture models.
轮廓混合模型捕捉了蛋白质中不同位点氨基酸替代过程的不同生化约束。这些模型的特点是时间可逆模型的混合,具有共同的交换率矩阵和不同的氨基酸平衡频率集(称为轮廓)。将可交换性矩阵与每个轮廓相结合,就会产生该轮廓的氨基酸瞬时交换率矩阵。目前,根据经验估算的交换率矩阵(如 LG 矩阵)被广泛用于特征混合模型下的系统发育推断。然而,这些矩阵是使用单一剖面估算的,不太可能是剖面混合模型的最佳矩阵。在这里,我们描述了 GTRpmix 模型,该模型允许在任何剖面混合模型下最大似然估计共同的可交换性矩阵。我们的研究表明,在剖面混合模型下估算出的可交换性矩阵与 LG 矩阵不同,这大大提高了模型拟合度和经验测试案例的拓扑估算精度。由于 GTRpmix 模型的计算成本很高,我们提供了两个从大型连接系统发生组-上表矩阵中估算出的可交换性矩阵,用于系统发生学分析。其中一个称为真核生物关联混合物(ELM),用于对真核生物核基因组编码的蛋白质进行系统发育分析;另一个称为真核生物与古菌关联混合物(EAL),用于重建真核生物与古菌之间的关系。这些矩阵与轮廓混合物模型相结合,与 LG 矩阵和相同的混合物模型相结合相比,能更好地拟合数据并改进拓扑估计。从 2.3.1 版开始,IQ-TREE2 允许用户在剖面混合模型下估算关联交换率(即氨基酸交换率)。
期刊介绍:
Molecular Biology and Evolution
Journal Overview:
Publishes research at the interface of molecular (including genomics) and evolutionary biology
Considers manuscripts containing patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic
Interested in fundamental discoveries, new and improved methods, resources, technologies, and theories advancing evolutionary research
Publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives suggesting future directions in molecular evolution applications.