Daniel J MacGuigan, Adam Taylor, Ava Ghezelayagh, Julia E Wood, Jeffrey W Simmons, Jon M Mollish, Thomas J Near
Biologists have relied on morphological characteristics to identify, define, and formally describe species for the past 250 years. The advent of phylogenetic species concepts and the introduction of molecular data have spawned new species delimitation methods applicable to a wide range of eukaryotic lineages. However, these approaches heavily emphasize genomic data, often overlooking phenotypic traits. We present and implement a species delimitation approach that utilizes genome-wide markers from ddRAD-seq and meristic morphological traits, which have long been used to identify and delineate fish species. Our methodology employs unsupervised machine learning to analyze morphological data without a priori species assignments, allowing phenotypic patterns to emerge independently from genomic-based species delimitation. We apply our combined genomic and phenotypic methodology to the freshwater systems of Southeastern North America, a biodiversity hotspot where conservation efforts are hampered by an incomplete knowledge of species diversity. Our investigation focuses on the darter clade Allohistium, a threatened lineage comprising two described species. Through phylogenomic, population genetic, and phenotypic model comparisons, we provide evidence supporting the delimitation of a third species of Allohistium, which we formally describe. Our approach shows how unsupervised machine learning can reveal cryptic morphological diversity that might otherwise be obscured by taxonomic preconceptions. This study demonstrates that model testing using diverse lines of evidence yields a more comprehensive, data-driven hypothesis of species diversity.
{"title":"Genomic and phenotypic delimitation of species in a temperate aquatic biodiversity hotspot","authors":"Daniel J MacGuigan, Adam Taylor, Ava Ghezelayagh, Julia E Wood, Jeffrey W Simmons, Jon M Mollish, Thomas J Near","doi":"10.1093/sysbio/syaf083","DOIUrl":"https://doi.org/10.1093/sysbio/syaf083","journal":{"name":"Systematic Biology"},"publicationDate":"2025-11-24","publicationTypes":"Journal Article"}
Obtaining a timescale for bacterial evolution is crucial for understanding early life evolution but is difficult owing to the scarcity of bacterial fossils. Here, we introduce multiple new time constraints to calibrate bacterial evolution based on ancient symbiosis. This idea is implemented using a bacterial tree constructed with genes found in the mitochondrial lineages phylogenetically embedded within Proteobacteria. The expanded mitochondria-bacterial tree allows the node age constraints of eukaryotes established by their abundant fossils to be propagated to ancient co-evolving bacterial symbionts and across the bacterial tree of life. Importantly, we formulate a new probabilistic framework that considers uncertainty in inference of the ancestral lifestyle of modern symbionts to apply 19 relative time constraints, each informed by a host-symbiont association, that constrain bacterial symbionts to be no older than their eukaryotic hosts. Moreover, we develop an approach to incorporating substitution mixture models that better accommodate substitutional saturation and compositional heterogeneity for dating deep phylogenies. Our analysis estimates that the last bacterial common ancestor occurred approximately 4.0-3.5 billion years ago (Ga), followed by rapid divergence of major bacterial clades. This estimate is generally robust to alternative root ages, root positions, tree topologies, fossil ages, ancestral lifestyle reconstructions, and gene sets, among other factors. The obtained timetree serves as a foundation for testing hypotheses regarding bacterial diversification and its correlation with geobiological events across different timescales.
{"title":"Dating the Bacterial Tree of Life Based on Ancient Symbiosis.","authors":"Sishuo Wang, Haiwei Luo","doi":"10.1093/sysbio/syae071","journal":{"name":"Systematic Biology","pages":"639-655"},"publicationDate":"2025-11-22","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12640082/pdf/"}
The two most popular tree models used in phylogenetics are the birth-death process (BD) and the Kingman coalescent (KC). These two models differ in several respects, notably: (i) the curve of the population size through time is a stochastic process in the BD, versus a parametrized curve in the KC, (ii) the BD makes assumptions about the way samples are collected, while the KC conditions on the number of samples and the collection times, thus bypassing the need to describe the sampling procedure. These two models have been applied to different contexts: the BD in macroevolutionary studies of clades of species, and the KC for populations. The exception is the field of phylogenetic epidemiology, which uses both models. This raises the question of how such different models can be used in the same context. In this paper, we study large-population limits of the BD, in a search for a mathematical link between the BD and the KC. We show that the KC is the large-population limit of a BD conditioned on a given population trajectory, and we provide the formula for the parameter θ of the limiting KC. This formula appears in earlier studies, but the present article is the first to show formally how the correspondence arises as a large-population limit, and that the BD needs to be conditioned for the KC to arise. Besides these fundamentally mathematical results, we demonstrate how our findings can be used practically in phylogenetic inference. In particular, we propose a new method for phylogenetic epidemiology, called CalicoBird, ensuing from our results. We conjecture that this new method, used in conjunction with auxiliary data (e.g. prevalence or incidence data), should allow estimating important epidemiological parameters (e.g. the prevalence and the effective reproduction number), in a way that is robust to the data-generating model and the sampling procedure. Future studies will be needed to put our claims to the test.
{"title":"Link between the Birth-Death Process and the Kingman Coalescent-Applications to Phylogenetic Epidemiology.","authors":"Josselin Cornuault, Fabio Pardi, Celine Scornavacca","doi":"10.1093/sysbio/syaf024","journal":{"name":"Systematic Biology","pages":"622-638"},"publicationDate":"2025-11-22","publicationTypes":"Journal Article"}
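For readers unfamiliar with the Kingman coalescent named in the abstract above, the following is a minimal sketch of the textbook constant-size case only. It does not implement the conditioned birth-death limit, the θ formula, or CalicoBird from the paper. Time is measured in coalescent units in which each pair of lineages merges at rate 1 (an assumption of this sketch), and the function names are hypothetical.

```python
import random


def expected_tmrca(n):
    """Expected time to the most recent common ancestor of n samples.

    With k lineages the waiting time to the next merger is exponential
    with rate k*(k-1)/2, so the expectation is the sum of 2/(k*(k-1))
    for k = 2..n, which telescopes to 2*(1 - 1/n).
    """
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))


def simulate_tmrca(n, rng):
    """Draw one TMRCA by summing the exponential waiting times."""
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2.0)
    return t


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [simulate_tmrca(10, rng) for _ in range(20000)]
    print(expected_tmrca(10), sum(draws) / len(draws))
```

The simulated mean should sit close to the closed-form value, which gives a quick sanity check on the waiting-time rates.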
Phylodynamics and diversification studies using complex evolutionary models can be challenging, especially with traditional likelihood-based approaches. As an alternative, likelihood-free simulation-based approaches have been proposed due to their ability to incorporate complex models and scenarios. Here, we propose a new simulation-based deep learning (DL) method capable of selecting birth-death models and accurately estimating their parameters in both phylodynamics and diversification studies. We use a convolutional approach, where trees are encoded using the neighborhood of all nodes and leaves of the input phylogeny. We also developed a dedicated neural network architecture called PhyloCNN. Using simulations, we compared the accuracy of PhyloCNN when using a variable number of neighbors to describe the local context of nodes and leaves. The number of neighbors had a greater impact when considering smaller training sets, with a broader context showing higher accuracy, especially for complex evolutionary models. Compared to other recently developed DL approaches, PhyloCNN showed higher or similar accuracies for all parameters when used with training sets one or two orders of magnitude smaller (10,000 to 100,000 simulated training trees, instead of millions). PhyloCNN also compared favorably with state-of-the-art likelihood-based methods. We applied PhyloCNN with compelling results to two real-world phylodynamics and diversification datasets, related to HIV superspreaders in Zurich and to primates and their ecological role as seed dispersers. The high accuracy and computational efficiency of PhyloCNN open new possibilities for phylodynamics and diversification studies that need to account for idiosyncratic phylogenetic histories with specific parameter spaces and sampling scenarios.
{"title":"PhyloCNN: Improving tree representation and neural network architecture for deep learning from trees in phylodynamics and diversification studies","authors":"Manolo Fernandez Perez, Olivier Gascuel","doi":"10.1093/sysbio/syaf082","DOIUrl":"https://doi.org/10.1093/sysbio/syaf082","journal":{"name":"Systematic Biology"},"publicationDate":"2025-11-22","publicationTypes":"Journal Article"}
Sarah H D Santos, Henrique V Figueiró, Tomas Flouri, Emiliano Ramalho, Laury Cullen, Ziheng Yang, William J Murphy, Eduardo Eizirik
Phylogenomic analyses of closely related species allow important glimpses into their evolutionary history. Although recent studies have demonstrated that inter-species hybridization has occurred in several groups, incorporating this process in phylogenetic reconstruction remains challenging. Specifically, the most predominant topology across the genome is often assumed to reflect the speciation tree, but rampant hybridization might overwhelm the genomes, causing that assumption to be violated. The notoriously challenging phylogeny of the 5 extant Panthera species (specifically jaguar [P. onca], lion [P. leo], and leopard [P. pardus]) is an interesting system to address this problem. Here we employed a Panthera-wide whole-genome-sequence data set incorporating 3 jaguar genomes and 2 representatives of lions and leopards to dissect the relationships among these 3 species. Maximum-likelihood trees reconstructed from non-overlapping genomic fragments of 4 different sizes strongly supported the monophyly of all 3 species. The most frequent topology (76-95%) united lion + leopard as sister species (topology 1), followed by lion + jaguar (topology 2: 4-8%) and leopard + jaguar (topology 3: 0-6%). Topology 1 was dominant across the genome, especially in high-recombination regions. Topologies 2 and 3 were enriched in low-recombination segments, likely reflecting the species tree in the face of hybridization. Divergence times between sister species of each topology, corrected for local-recombination-rate effects, indicated that the lion-leopard divergence was significantly younger than the alternatives, likely driven by post-speciation admixture. Introgression analyses detected pervasive hybridization between lions and leopards, regardless of the assumed species tree. This inference was strongly supported by multispecies-coalescence-with-introgression analyses, which rejected topology 1 (lion+leopard) or any model without introgression. Interestingly, topologies 2 (lion+jaguar) and 3 (jaguar+leopard) with extensive lion-leopard introgression were unidentifiable, highlighting the complexity of this phylogenetic problem. Our results suggest that the dominant genome-wide tree topology is not the true species tree but rather a consequence of overwhelming post-speciation admixture between lion and leopard.
{"title":"Massive Inter-species Introgression Overwhelms Phylogenomic Relationships Among Jaguar, Lion, and Leopard.","authors":"Sarah H D Santos, Henrique V Figueiró, Tomas Flouri, Emiliano Ramalho, Laury Cullen, Ziheng Yang, William J Murphy, Eduardo Eizirik","doi":"10.1093/sysbio/syaf021","journal":{"name":"Systematic Biology","pages":"583-599"},"publicationDate":"2025-11-22","publicationTypes":"Journal Article"}
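The topology percentages quoted in the Panthera abstract above are tallies of which two species come out as sisters in each genomic window. The sketch below shows that bookkeeping only, under stated assumptions: the input list is a made-up toy, not data from the study, and the function name is hypothetical. Tree inference itself is not covered.

```python
from collections import Counter


def topology_frequencies(sister_pairs):
    """Return the fraction of windows supporting each sister pairing.

    sister_pairs holds one 2-tuple of species names per genomic window.
    Pairs are treated as unordered, so ("lion", "leopard") and
    ("leopard", "lion") count as the same topology.
    """
    counts = Counter(frozenset(pair) for pair in sister_pairs)
    total = sum(counts.values())
    return {tuple(sorted(pair)): n / total for pair, n in counts.items()}


if __name__ == "__main__":
    # Toy input: 10 hypothetical windows, not values from the paper.
    windows = (
        [("lion", "leopard")] * 8
        + [("jaguar", "lion")]
        + [("leopard", "jaguar")]
    )
    for pair, freq in sorted(topology_frequencies(windows).items()):
        print(pair, freq)
```

Keys are alphabetically sorted tuples, so a lookup does not depend on the order in which a pair was recorded.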
Ethan F Gyllenhaal, Serina S Brady, Lucas H DeCicco, Alivereti Naikatini, Paul M Hime, Joseph D Manthey, John Kelly, Robert G Moyle, Michael J Andersen
Secondary contact between previously allopatric lineages offers a test of reproductive isolating mechanisms that may have accrued in isolation. Such instances of contact can produce stable hybrid zones (where reproductive isolation can further develop via reinforcement or phenotypic displacement) or result in the lineages merging. Ongoing secondary contact is most visible in continental systems, where steady input from parental taxa can occur readily. In oceanic island systems, however, secondary contact between closely related species of birds is relatively rare. When observed on sufficiently small islands, relative to population size, secondary contact likely represents a recent phenomenon. Here, we examine the dynamics of a group of birds whose apparent widespread hybridization influenced Ernst Mayr's foundational work on allopatric speciation: the whistlers of Fiji (Aves: Pachycephala). We demonstrate 2 clear instances of secondary contact within the Fijian archipelago, one resulting in a hybrid zone on a larger island, and the other resulting in a wholly admixed population on a smaller island. We leveraged low genome-wide divergence in the hybrid zone to pinpoint a single genomic region associated with observed phenotypic differences. We use genomic data to present a new hypothesis that emphasizes rapid plumage evolution and post-divergence gene flow.
{"title":"Waves of Colonization and Gene Flow in a Great Speciator.","authors":"Ethan F Gyllenhaal, Serina S Brady, Lucas H DeCicco, Alivereti Naikatini, Paul M Hime, Joseph D Manthey, John Kelly, Robert G Moyle, Michael J Andersen","doi":"10.1093/sysbio/syaf023","journal":{"name":"Systematic Biology","pages":"513-525"},"publicationDate":"2025-11-22","publicationTypes":"Journal Article"}
{"title":"Correction to: A Phylogenetic Approach to Delimitate Species in a Probabilistic Way.","authors":"","doi":"10.1093/sysbio/syaf035","journal":{"name":"Systematic Biology"},"publicationDate":"2025-11-18","publicationTypes":"Journal Article"}
Ethan F Gyllenhaal, Lukas B Klicka, Lucas H DeCicco, Brian C Weeks, Robert G Moyle, Michael J Andersen
Allopatric divergence is a fundamental component of most traditional models of biogeography and community assembly. Gene flow between allopatric populations should be influenced by the nature of geographic barriers and can have a profound impact on adaptation, the speciation process, and phylogenetic inference. Superspecies (monophyletic groups of taxa with species-level differences in phenotype or genotype that are found exclusively in allopatry or parapatry) present an opportunity to characterize the effects of gene flow on the divergence process. Here we investigate patterns of gene flow, population structure, and inferred phylogenetic relationships for members of an avian superspecies, the Solomons Monarchs (Aves: Symposiachrus barbatus complex) occupying the Solomon Islands. We found that gene flow among allopatric species matches predictions based on geography, but phylogenetic relationships were not concordant with the most likely colonization history based on a stepping-stone colonization model. Notably, the most isolated island, Makira, has a species that was inferred to be sister to the taxa on all other islands in concatenated phylogenetic analyses, despite Makira being farthest from the presumed original source of immigrants. We use population genetic simulations to demonstrate that such a result could be driven by bias resulting from low levels of gene flow, reflecting a challenge in phylogeographic inference that results when one population is differentially isolated. These simulated findings demonstrate a distinguishability issue in phylogeographic inference, where gene flow and colonization history can be difficult to disentangle.
{"title":"Gene flow complicates phylogenetic inference in an archipelago radiation.","authors":"Ethan F Gyllenhaal, Lukas B Klicka, Lucas H DeCicco, Brian C Weeks, Robert G Moyle, Michael J Andersen","doi":"10.1093/sysbio/syaf081","DOIUrl":"https://doi.org/10.1093/sysbio/syaf081","journal":{"name":"Systematic Biology"},"publicationDate":"2025-11-12","publicationTypes":"Journal Article"}
Brenen M Wynd, Basanta Khakurel, Christian F Kammerer, Peter J Wagner, April M Wright
Continuous characters have received comparatively little attention in Bayesian phylogenetic estimation. This is predominantly because they cannot be modeled by a standard phylogenetic Q-matrix approach due to their non-discrete nature. In this paper, we explore the use of continuous traits under two Brownian motion models to estimate a phylogenetic tree for Dicynodontia, a well-studied group of early synapsids (stem mammals) in which both discrete and continuous characters have been extensively used in parsimony-based tree reconstruction. We examine the differences in phylogenetic signal between a continuous trait partition, a discrete trait partition, and a joint analysis with both types of characters. We find that continuous and discrete traits contribute substantially different signal to the analysis, even when other parts of the model (clock and tree) are held constant. Tree topologies resulting from the new analyses differ strongly from the established phylogeny for dicynodonts, highlighting continued difficulty in incorporating truly continuous data in a Bayesian phylogenetic framework.
{"title":"Incorporating continuous characters in joint estimation of dicynodont phylogeny","authors":"Brenen M Wynd, Basanta Khakurel, Christian F Kammerer, Peter J Wagner, April M Wright","doi":"10.1093/sysbio/syaf078","DOIUrl":"https://doi.org/10.1093/sysbio/syaf078","url":null,"PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":"22 1","pages":""},"PeriodicalIF":6.5,"publicationDate":"2025-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145484760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Min Zhao, Gregory Thom, Brant C Faircloth, Michael J Andersen, F Keith Barker, Brett W Benz, Michael J Braun, Gustavo A Bravo, Robb T Brumfield, R Terry Chesser, Elizabeth P Derryberry, Travis C Glenn, Michael G Harvey, Peter A Hosner, Tyler S Imfeld, Leo Joseph, Joseph D Manthey, John E McCormack, Jenna M McCullough, Robert G Moyle, Carl H Oliveros, Noor D White Carreiro, Kevin Winker, Daniel J Field, Daniel T Ksepka, Edward L Braun, Rebecca T Kimball, Brian Tilston Smith
The exponential growth of molecular sequence data over the past decade has enabled the construction of numerous clade-specific phylogenies encompassing hundreds or thousands of taxa. These independent studies often include overlapping data, presenting a unique opportunity to build macrophylogenies (phylogenies sampling > 1,000 taxa) for entire classes across the Tree of Life. However, the inference of large trees remains constrained by logistical, computational, and methodological challenges. The Avian Tree of Life provides an ideal model for evaluating strategies to robustly infer macrophylogenies from intersecting datasets derived from smaller studies. In this study, we leveraged a comprehensive resource of sequence capture datasets to evaluate the phylogenetic accuracy and computational costs of four methodological approaches: (1) supermatrix approaches using concatenation, including the “fast” maximum likelihood (ML) methods, (2) filtering datasets to reduce heterogeneity, (3) supertree estimation based on published phylogenomic trees, and (4) a “divide-and-conquer” strategy, wherein smaller ML trees were estimated and subsequently combined using a supertree approach. Additionally, we examined the impact of these methods on divergence time estimation using a dataset that includes newly vetted fossil calibrations for the Avian Tree of Life. Our findings highlight the advantages of recently developed fast tree search approaches initiated with parsimony starting trees, which offer a reasonable compromise between computational efficiency and phylogenetic accuracy, facilitating inference of macrophylogenies.
{"title":"Efficient Inference of Macrophylogenies: Insights from the Avian Tree of Life","authors":"Min Zhao, Gregory Thom, Brant C Faircloth, Michael J Andersen, F Keith Barker, Brett W Benz, Michael J Braun, Gustavo A Bravo, Robb T Brumfield, R Terry Chesser, Elizabeth P Derryberry, Travis C Glenn, Michael G Harvey, Peter A Hosner, Tyler S Imfeld, Leo Joseph, Joseph D Manthey, John E McCormack, Jenna M McCullough, Robert G Moyle, Carl H Oliveros, Noor D White Carreiro, Kevin Winker, Daniel J Field, Daniel T Ksepka, Edward L Braun, Rebecca T Kimball, Brian Tilston Smith","doi":"10.1093/sysbio/syaf080","DOIUrl":"https://doi.org/10.1093/sysbio/syaf080","url":null,"PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":"216 1","pages":""},"PeriodicalIF":6.5,"publicationDate":"2025-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145472804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}