The two most popular tree models used in phylogenetics are the birth-death process (BD) and the Kingman coalescent (KC). These two models differ in several respects, notably: (i) the curve of the population size through time is a stochastic process in the BD, versus a parametrized curve in the KC, (ii) the BD makes assumptions about the way samples are collected, while the KC conditions on the number of samples and the collection times, thus bypassing the need to describe the sampling procedure. These two models have been applied to different contexts: the BD in macroevolutionary studies of clades of species, and the KC for populations. The exception is the field of phylogenetic epidemiology, which uses both models. This raises the question of how such different models can be used in the same context. In this paper, we study large-population limits of the BD, in a search for a mathematical link between the BD and the KC. We show that the KC is the large-population limit of a BD conditioned on a given population trajectory, and we provide the formula for the parameter θ of the limiting KC. This formula appears in earlier studies, but the present article is the first to show formally how the correspondence arises as a large-population limit, and that the BD needs to be conditioned for the KC to arise. Besides these fundamentally mathematical results, we demonstrate how our findings can be used practically in phylogenetic inference. In particular, we propose a new method for phylogenetic epidemiology, called CalicoBird, ensuing from our results. We conjecture that this new method, used in conjunction with auxiliary data (e.g. prevalence or incidence data), should allow estimating important epidemiological parameters (e.g. the prevalence and the effective reproduction number), in a way that is robust to the data-generating model and the sampling procedure. Future studies will be needed to put our claims to the test.
Link between the Birth-Death Process and the Kingman Coalescent-Applications to Phylogenetic Epidemiology. Josselin Cornuault, Fabio Pardi, Celine Scornavacca. Systematic Biology, pp. 622-638. doi: 10.1093/sysbio/syaf024. Published 2025-11-22.
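As a small illustration of the Kingman coalescent referred to in the abstract above, the following Python sketch draws the waiting times between successive coalescence events. It is only a textbook-style sketch under stated assumptions, not the article's derivation and not the CalicoBird method: θ is taken to be constant (the abstract describes a parametrized curve), time is scaled so that each pair of lineages coalesces at rate 1/θ, all samples are collected at the same time, and the function names are hypothetical.

```python
import random
from math import comb


def simulate_coalescent_waits(n_samples, theta, rng):
    """Draw waiting times between successive coalescences.

    Assumptions (not taken from the article): theta is constant, every pair
    of lineages coalesces at rate 1/theta, and all samples are contemporaneous.
    """
    waits = []
    for k in range(n_samples, 1, -1):
        # With k lineages there are C(k, 2) pairs, so the total rate is C(k, 2) / theta.
        rate = comb(k, 2) / theta
        waits.append(rng.expovariate(rate))
    return waits


def expected_tmrca(n_samples, theta):
    """Expected time to the most recent common ancestor under the same assumptions."""
    # Sum of the mean waiting times theta / C(k, 2) for k = 2..n.
    return sum(theta / comb(k, 2) for k in range(2, n_samples + 1))


if __name__ == "__main__":
    rng = random.Random(1)
    waits = simulate_coalescent_waits(10, theta=1.0, rng=rng)
    print("simulated tree height:", sum(waits))
    print("expected tree height:", expected_tmrca(10, theta=1.0))
```

Under these assumptions the expected tree height simplifies to 2θ(1 − 1/n), so it approaches 2θ as the number of samples grows; a time-varying θ would require rescaling time, which this sketch does not attempt.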
Phylodynamics and diversification studies using complex evolutionary models can be challenging, especially with traditional likelihood-based approaches. As an alternative, likelihood-free simulation-based approaches have been proposed due to their ability to incorporate complex models and scenarios. Here, we propose a new simulation-based deep learning (DL) method capable of selecting birth-death models and accurately estimating their parameters in both phylodynamics and diversification studies. We use a convolutional approach, where trees are encoded using the neighborhood of all nodes and leaves of the input phylogeny. We also developed a dedicated neural network architecture called PhyloCNN. Using simulations, we compared the accuracy of PhyloCNN when using a variable number of neighbors to describe the local context of nodes and leaves. The number of neighbors had a greater impact when considering smaller training sets, with a broader context showing higher accuracy, especially for complex evolutionary models. Compared to other recently developed DL approaches, PhyloCNN showed higher or similar accuracies for all parameters when used with training sets one or two orders of magnitude smaller (10,000 to 100,000 simulated training trees, instead of millions). PhyloCNN also compared favorably with state-of-the-art likelihood-based methods. We applied PhyloCNN with compelling results to two real-world phylodynamics and diversification datasets, related to HIV superspreaders in Zurich and to primates and their ecological role as seed dispersers. The high accuracy and computational efficiency of PhyloCNN open new possibilities for phylodynamics and diversification studies that need to account for idiosyncratic phylogenetic histories with specific parameter spaces and sampling scenarios.
PhyloCNN: Improving tree representation and neural network architecture for deep learning from trees in phylodynamics and diversification studies. Manolo Fernandez Perez, Olivier Gascuel. Systematic Biology. doi: 10.1093/sysbio/syaf082. Published 2025-11-22.
Sarah H D Santos, Henrique V Figueiró, Tomas Flouri, Emiliano Ramalho, Laury Cullen, Ziheng Yang, William J Murphy, Eduardo Eizirik
Phylogenomic analyses of closely related species allow important glimpses into their evolutionary history. Although recent studies have demonstrated that inter-species hybridization has occurred in several groups, incorporating this process in phylogenetic reconstruction remains challenging. Specifically, the most predominant topology across the genome is often assumed to reflect the speciation tree, but rampant hybridization might overwhelm the genomes, causing that assumption to be violated. The notoriously challenging phylogeny of the 5 extant Panthera species (specifically jaguar [P. onca], lion [P. leo], and leopard [P. pardus]) is an interesting system to address this problem. Here we employed a Panthera-wide whole-genome-sequence data set incorporating 3 jaguar genomes and 2 representatives of lions and leopards to dissect the relationships among these 3 species. Maximum-likelihood trees reconstructed from non-overlapping genomic fragments of 4 different sizes strongly supported the monophyly of all 3 species. The most frequent topology (76-95%) united lion + leopard as sister species (topology 1), followed by lion + jaguar (topology 2: 4-8%) and leopard + jaguar (topology 3: 0-6%). Topology 1 was dominant across the genome, especially in high-recombination regions. Topologies 2 and 3 were enriched in low-recombination segments, likely reflecting the species tree in the face of hybridization. Divergence times between sister species of each topology, corrected for local-recombination-rate effects, indicated that the lion-leopard divergence was significantly younger than the alternatives, likely driven by post-speciation admixture. Introgression analyses detected pervasive hybridization between lions and leopards, regardless of the assumed species tree. This inference was strongly supported by multispecies-coalescence-with-introgression analyses, which rejected topology 1 (lion+leopard) or any model without introgression. 
Interestingly, topologies 2 (lion+jaguar) and 3 (jaguar+leopard) with extensive lion-leopard introgression were unidentifiable, highlighting the complexity of this phylogenetic problem. Our results suggest that the dominant genome-wide tree topology is not the true species tree but rather a consequence of overwhelming post-speciation admixture between lion and leopard.
Massive Inter-species Introgression Overwhelms Phylogenomic Relationships Among Jaguar, Lion, and Leopard. Systematic Biology, pp. 583-599. doi: 10.1093/sysbio/syaf021. Published 2025-11-22.
Ethan F Gyllenhaal, Serina S Brady, Lucas H DeCicco, Alivereti Naikatini, Paul M Hime, Joseph D Manthey, John Kelly, Robert G Moyle, Michael J Andersen
Secondary contact between previously allopatric lineages offers a test of reproductive isolating mechanisms that may have accrued in isolation. Such instances of contact can produce stable hybrid zones, where reproductive isolation can further develop via reinforcement or phenotypic displacement, or result in the lineages merging. Ongoing secondary contact is most visible in continental systems, where steady input from parental taxa can occur readily. In oceanic island systems, however, secondary contact between closely related species of birds is relatively rare. When observed on sufficiently small islands, relative to population size, secondary contact likely represents a recent phenomenon. Here, we examine the dynamics of a group of birds whose apparent widespread hybridization influenced Ernst Mayr's foundational work on allopatric speciation: the whistlers of Fiji (Aves: Pachycephala). We demonstrate 2 clear instances of secondary contact within the Fijian archipelago, one resulting in a hybrid zone on a larger island, and the other resulting in a wholly admixed population on a smaller island. We leveraged low genome-wide divergence in the hybrid zone to pinpoint a single genomic region associated with observed phenotypic differences. We use genomic data to present a new hypothesis that emphasizes rapid plumage evolution and post-divergence gene flow.
Waves of Colonization and Gene Flow in a Great Speciator. Systematic Biology, pp. 513-525. doi: 10.1093/sysbio/syaf023. Published 2025-11-22.
Correction to: A Phylogenetic Approach to Delimitate Species in a Probabilistic Way. Systematic Biology. doi: 10.1093/sysbio/syaf035. Published 2025-11-18.
Ethan F Gyllenhaal, Lukas B Klicka, Lucas H DeCicco, Brian C Weeks, Robert G Moyle, Michael J Andersen
Allopatric divergence is a fundamental component of most traditional models of biogeography and community assembly. Gene flow between allopatric populations should be influenced by the nature of geographic barriers and can have a profound impact on adaptation, the speciation process, and phylogenetic inference. Superspecies (monophyletic groups of taxa with species-level differences in phenotype or genotype that are found exclusively in allopatry or parapatry) present an opportunity to characterize the effects of gene flow on the divergence process. Here we investigate patterns of gene flow, population structure, and inferred phylogenetic relationships for members of an avian superspecies, the Solomons Monarchs (Aves: Symposiachrus barbatus complex) occupying the Solomon Islands. We found that gene flow among allopatric species matches predictions based on geography, but phylogenetic relationships were not concordant with the most likely colonization history based on a stepping-stone colonization model. Notably, the most isolated island, Makira, has a species that was inferred to be sister to the taxa on all other islands in concatenated phylogenetic analyses, despite Makira being farthest from the presumed original source of immigrants. We use population genetic simulations to demonstrate that such a result could be driven by bias resulting from low levels of gene flow, reflecting a challenge in phylogeographic inference that results when one population is differentially isolated. These simulated findings demonstrate a distinguishability issue in phylogeographic inference, where gene flow and colonization history can be difficult to disentangle.
Gene flow complicates phylogenetic inference in an archipelago radiation. Systematic Biology. doi: 10.1093/sysbio/syaf081. Published 2025-11-12.
Brenen M Wynd, Basanta Khakurel, Christian F Kammerer, Peter J Wagner, April M Wright
Continuous characters have received comparatively little attention in Bayesian phylogenetic estimation. This is predominantly because they cannot be modeled by a standard phylogenetic Q-matrix approach due to their non-discrete nature. In this paper, we explore the use of continuous traits under two Brownian motion models to estimate a phylogenetic tree for Dicynodontia, a well-studied group of early synapsids (stem mammals) in which both discrete and continuous characters have been extensively used in parsimony-based tree reconstruction. We examine the differences in phylogenetic signal between a continuous trait partition, a discrete trait partition, and a joint analysis with both types of characters. We find that continuous and discrete traits contribute substantially different signal to the analysis, even when other parts of the model (clock and tree) are held constant. Tree topologies resulting from the new analyses differ strongly from the established phylogeny for dicynodonts, highlighting continued difficulty in incorporating truly continuous data in a Bayesian phylogenetic framework.
Incorporating continuous characters in joint estimation of dicynodont phylogeny. Systematic Biology. doi: 10.1093/sysbio/syaf078. Published 2025-11-11.
Min Zhao, Gregory Thom, Brant C Faircloth, Michael J Andersen, F Keith Barker, Brett W Benz, Michael J Braun, Gustavo A Bravo, Robb T Brumfield, R Terry Chesser, Elizabeth P Derryberry, Travis C Glenn, Michael G Harvey, Peter A Hosner, Tyler S Imfeld, Leo Joseph, Joseph D Manthey, John E McCormack, Jenna M McCullough, Robert G Moyle, Carl H Oliveros, Noor D White Carreiro, Kevin Winker, Daniel J Field, Daniel T Ksepka, Edward L Braun, Rebecca T Kimball, Brian Tilston Smith
The exponential growth of molecular sequence data over the past decade has enabled the construction of numerous clade-specific phylogenies encompassing hundreds or thousands of taxa. These independent studies often include overlapping data, presenting a unique opportunity to build macrophylogenies (phylogenies sampling > 1,000 taxa) for entire classes across the Tree of Life. However, the inference of large trees remains constrained by logistical, computational, and methodological challenges. The Avian Tree of Life provides an ideal model for evaluating strategies to robustly infer macrophylogenies from intersecting datasets derived from smaller studies. In this study, we leveraged a comprehensive resource of sequence capture datasets to evaluate the phylogenetic accuracy and computational costs of four methodological approaches: (1) supermatrix approaches using concatenation, including the “fast” maximum likelihood (ML) methods, (2) filtering datasets to reduce heterogeneity, (3) supertree estimation based on published phylogenomic trees, and (4) a “divide-and-conquer” strategy, wherein smaller ML trees were estimated and subsequently combined using a supertree approach. Additionally, we examined the impact of these methods on divergence time estimation using a dataset that includes newly vetted fossil calibrations for the Avian Tree of Life. Our findings highlight the advantages of recently developed fast tree search approaches initiated with parsimony starting trees, which offer a reasonable compromise between computational efficiency and phylogenetic accuracy, facilitating inference of macrophylogenies.
Efficient Inference of Macrophylogenies: Insights from the Avian Tree of Life. Systematic Biology. doi: 10.1093/sysbio/syaf080. Published 2025-11-08.
{"title":"Seventy-five Years of Systematic Biology: Looking Back, Moving Forward","authors":"Michael J Landis, Michael J Donoghue","doi":"10.1093/sysbio/syaf079","DOIUrl":"https://doi.org/10.1093/sysbio/syaf079","url":null,"abstract":"What does “systematic biology” mean today, where has it been in the past, and where is it going? We explore these questions by considering five elements – collaboration, integration, discourse, infrastructure, and society – that we think have allowed systematic biology to adapt to change and sustain growth without losing its unique identity. In the spirit of celebrating the 75th anniversary of our flagship journal, we generated a comprehensive dataset for all Systematic Biology and Systematic Zoology articles that we could locate (N = 5,150) and used bibliometric and textual analyses to illustrate ways in which our field has transformed over time. We offer our humble opinions on how our community can ensure that systematic biologists inherit an enlightening, dynamic, and enduring future.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":"53 1","pages":""},"PeriodicalIF":6.5,"publicationDate":"2025-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145472805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Whole-genomes illuminate the drivers of gene tree discordance and the tempo of tinamou diversification (Aves: Tinamidae).","authors":"Lukas J Musher, Therese A Catanach, Thomas Valqui, Robb T Brumfield, Alexandre Aleixo, Kevin P Johnson, Jason D Weckstein","doi":"10.1093/sysbio/syaf077","DOIUrl":"https://doi.org/10.1093/sysbio/syaf077","url":null,"abstract":"As an old group that has diversified in South America over millions of years, the tinamous (Palaeognathae: Tinamidae) are of high interest for understanding the evolution of birds and the assembly of the Neotropical biota. However, there are currently no complete species-level phylogenies of this group. Most prior work has been based on either morphological data or a small number of molecular markers, each of which has limited capability for reconstructing the tinamou phylogeny. Therefore, the interrelationships of most tinamou species are uncertain. We analyzed 80 whole genomes from a mix of historical study skins and frozen tissues, including all 46 recognized species of tinamous, to (1) reconstruct their interrelationships, (2) estimate the timeframe of tinamou evolution, and (3) examine the effects of incomplete lineage sorting (ILS) and ancestral introgression on genome evolution. We compared results for coding (BUSCO) and ultraconserved element (UCE) loci, as well as sex-linked and autosomal markers, and used fossil-calibrated tip-dating to estimate divergence times. Tinamous diverged from their sister group, the extinct moas, 50-60 mya, and their crown divergence occurred roughly 30-40 mya, followed by constant diversification rates until the present. Phylogenetic reconstructions were largely robust across methods and datasets. Only one clade in the genus Crypturellus displayed substantial species-tree discordance across the different datasets. 
To investigate the impacts of introgression on this discordance, we quantified introgression for 100 kb non-overlapping windows across the genome, and identified pervasive genome-wide introgression. The distribution of this introgression across the genome depended on the phylogeny assumed in the f-branch model. When assuming one of these topologies in the f-branch model, patterns of introgression matched theoretical predictions about genome architecture. Overall, we present the most complete phylogeny for tinamous to date, identify an unrecognized species, and provide a case study for species-level phylogenomic analysis using whole genomes.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":"37 1","pages":""},"PeriodicalIF":6.5,"publicationDate":"2025-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145440773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}