African antelope diversity is a globally unique vestige of a much richer world-wide Pleistocene megafauna. Despite this, the evolutionary processes leading to the prolific radiation of African antelopes are not well understood. Here, we sequenced 145 whole genomes from both subspecies of the waterbuck (Kobus ellipsiprymnus), an African antelope believed to be in the process of speciation. We investigated genetic structure and population divergence and found evidence of a mid-Pleistocene separation on either side of the eastern Great Rift Valley, consistent with vicariance caused by a rain shadow along the so-called "Kingdon's Line." However, we also found pervasive evidence of both recent and widespread historical gene flow across the Rift Valley barrier. By inferring the genome-wide landscape of variation among subspecies, we found 14 genomic regions of elevated differentiation, including a locus that may be related to each subspecies' distinctive coat pigmentation pattern. We investigated these regions as candidate speciation islands. However, we observed no significant reduction in gene flow in these regions, nor any indications of selection against hybrids. Altogether, these results suggest a pattern whereby climatically driven vicariance is the most important process driving the African antelope radiation and suggest that reproductive isolation may not set in until very late in the divergence process. This has a significant impact on taxonomic inference, as many taxa will be in a gray area of ambiguous systematic status, possibly explaining why it has been hard to achieve consensus regarding the species status of many African antelopes. Our analyses demonstrate how population genetics based on low-depth whole genome sequencing can provide new insights that can help resolve how far lineages have gone along the path to speciation.
Phylogenomics has the power to uncover complex phylogenetic scenarios across the genome. In most cases, no single topology is reflected across the entire genome as the phylogenetic signal differs among genomic regions due to processes, such as introgression and incomplete lineage sorting. Baleen whales are among the largest vertebrates on Earth with a high dispersal potential in a relatively unrestricted habitat, the oceans. The fin whale (Balaenoptera physalus) is one of the most enigmatic baleen whale species, currently divided into four subspecies. It has been a matter of debate whether phylogeographic patterns explain taxonomic variation in fin whales. Here we present a chromosome-level whole genome analysis of the phylogenetic relationships among fin whales from multiple ocean basins. First, we estimated concatenated and consensus phylogenies for both the mitochondrial and nuclear genomes. The consensus phylogenies based upon the autosomal genome uncovered monophyletic clades associated with each ocean basin, aligning with the current understanding of subspecies division. Nevertheless, discordances were detected in the phylogenies based on the Y chromosome, mitochondrial genome, autosomal genome and X chromosome. Furthermore, we detected signs of introgression and pervasive phylogenetic discordance across the autosomal genome. This complex phylogenetic scenario could be explained by a puzzle of introgressive events, not yet documented in fin whales. Similarly, incomplete lineage sorting and low phylogenetic signal could lead to such phylogenetic discordances. Our study reinforces the pitfalls of relying on concatenated or single locus phylogenies to determine taxonomic relationships below the species level by illustrating the underlying nuances that some phylogenetic approaches may fail to capture. We emphasize the significance of accurate taxonomic delineation in fin whales by exploring crucial information revealed through genome-wide assessments.
Relationships among species in the tree of life can complicate comparative methods and testing adaptive hypotheses. Models based on the Ornstein-Uhlenbeck process permit hypotheses about adaptation to be tested by allowing traits to either evolve toward fixed adaptive optima (e.g., regimes or niches) or track continuously changing optima that can be influenced by other traits. These models allow estimation of the effects of both adaptation and phylogenetic inertia (resistance to adaptation due to any source) on trait evolution, an approach known as the "adaptation-inertia" framework. However, previous applications of this framework, and most approaches suggested to deal with the issue of species non-independence, are based on a maximum likelihood approach, and thus it is difficult to include information based on prior biological knowledge in the analysis, which can affect resulting inferences. Here, I present Blouch (Bayesian Linear Ornstein-Uhlenbeck Models for Comparative Hypotheses), which fits allometric and adaptive models of continuous trait evolution in a Bayesian framework based on fixed or continuous predictors and incorporates measurement error. I first briefly discuss the models implemented in Blouch, and then the new applications for these models provided by a Bayesian framework. These include the advantages of assigning biologically meaningful priors when compared to non-Bayesian approaches, allowing for varying effects (intercepts and slopes), and multilevel modeling. Validations on simulated data show good performance in recovering the true evolutionary parameters for all models. To demonstrate the workflow of Blouch on an empirical dataset, I test the hypothesis that the relatively larger antlers of larger-bodied deer are the result of more intense sexual selection that comes along with their tendency to live in larger breeding groups.
While results show that larger-bodied deer that live in larger breeding groups have relatively larger antlers, deer living in the smallest groups appear to have a different and steeper scaling pattern of antler size to body size than other groups. These results are contrary to previous findings and may argue that a different type of sexual selection or other selective pressures govern optimum antler size in the smallest breeding groups.
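As a rough illustration of the Ornstein-Uhlenbeck process underlying these models, the sketch below simulates a single trait evolving toward a fixed optimum along one lineage. It is not Blouch's implementation or interface: the function names, the parameter names (`alpha` for the rate of adaptation, `theta` for the optimum, `sigma` for the diffusion scale), and the choice of exact-transition sampling are all assumptions made for illustration. The half-life `ln(2) / alpha` is the usual way to express how quickly the expected trait value closes the gap to the optimum.

```python
import math
import random


def expected_trait(y0, theta, alpha, t):
    """Expected OU trait value after time t, starting from y0 with optimum theta."""
    return theta + (y0 - theta) * math.exp(-alpha * t)


def phylogenetic_half_life(alpha):
    """Time needed to move halfway from the ancestral state to the optimum."""
    return math.log(2.0) / alpha


def simulate_ou(y0, theta, alpha, sigma, t_total, n_steps, rng):
    """Simulate one OU trajectory using the exact Gaussian transition per step.

    All argument names are hypothetical; rng is a random.Random instance.
    Returns the list of trait values, including the starting value.
    """
    dt = t_total / n_steps
    decay = math.exp(-alpha * dt)
    # Standard deviation of a single exact OU step of length dt.
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * alpha))
    y = y0
    path = [y]
    for _ in range(n_steps):
        mean = theta + (y - theta) * decay
        y = rng.gauss(mean, step_sd)
        path.append(y)
    return path


if __name__ == "__main__":
    rng = random.Random(42)
    alpha = math.log(2.0)  # gives a half-life of exactly one time unit
    path = simulate_ou(y0=0.0, theta=10.0, alpha=alpha, sigma=1.0,
                       t_total=5.0, n_steps=500, rng=rng)
    print("half-life:", phylogenetic_half_life(alpha))
    print("final simulated value:", path[-1])
    print("expected value at t=5:", expected_trait(0.0, 10.0, alpha, 5.0))
```

A long half-life relative to the depth of the tree corresponds to strong phylogenetic inertia, while a short one means the trait tracks its optimum closely.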
Reconstructing the tree of life and understanding the relationships of taxa are core questions in evolutionary and systematic biology. The main advances in this field in the last decades were derived from molecular phylogenetics; however, for most species, molecular data are not available. Here, we explore the applicability of 2 deep learning methods (supervised classification approaches and unsupervised similarity learning) to infer organism relationships from specimen images. As a basis, we assembled an image data set covering 4144 bivalve species belonging to 74 families across all orders and subclasses of the extant Bivalvia, with molecular phylogenetic data being available for all families and a complete taxonomic hierarchy for all species. The suitability of this data set for deep learning experiments was evidenced by an ablation study resulting in almost 80% accuracy for identifications on the species level. Three sets of experiments were performed using our data set. First, we included taxonomic hierarchy and genetic distances in a supervised learning approach to obtain predictions on several taxonomic levels simultaneously. Here, we stimulated the model to consider features shared between closely related taxa to be more critical for their classification than features shared with distantly related taxa, imprinting phylogenetic and taxonomic affinities into the architecture and training procedure. Second, we used transfer learning and similarity learning approaches for zero-shot experiments to identify the higher-level taxonomic affinities of test species that the models had not been trained on. The models assigned the unknown species to their respective genera with approximately 48% and 67% accuracy. Lastly, we used unsupervised similarity learning to infer the relatedness of the images without prior knowledge of their taxonomic or phylogenetic affinities.
The results clearly showed similarities between visual appearance and genetic relationships at the higher taxonomic levels. The correlation was 0.6 for the most species-rich subclass (Imparidentia), ranging from 0.5 to 0.7 for the orders with the most images. Overall, the correlation between visual similarity and genetic distances at the family level was 0.78. However, fine-grained reconstructions based on these observed correlations, such as sister-taxa relationships, require further work. Overall, our results broaden the applicability of automated taxon identification systems and provide a new avenue for estimating phylogenetic relationships from specimen images.
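One common way to quantify agreement between two pairwise distance matrices, such as visual distances between image embeddings and genetic distances between the same taxa, is to correlate their off-diagonal entries. The sketch below does this with a plain Pearson correlation over the upper triangles. The abstract does not state how its correlations were computed, so this is an assumed procedure, and the function names and toy matrices are hypothetical.

```python
import math


def upper_triangle(matrix):
    """Return the entries above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def matrix_correlation(dist_a, dist_b):
    """Correlate two symmetric distance matrices over their upper triangles."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


if __name__ == "__main__":
    # Toy, made-up distances among four taxa (not data from the study).
    visual = [[0.0, 0.2, 0.7, 0.9],
              [0.2, 0.0, 0.6, 0.8],
              [0.7, 0.6, 0.0, 0.3],
              [0.9, 0.8, 0.3, 0.0]]
    genetic = [[0.00, 0.05, 0.30, 0.35],
               [0.05, 0.00, 0.28, 0.33],
               [0.30, 0.28, 0.00, 0.10],
               [0.35, 0.33, 0.10, 0.00]]
    print("matrix correlation:", matrix_correlation(visual, genetic))
```

Because entries of a distance matrix are not independent, a significance test for such a correlation would normally use permutations of the taxon labels rather than the standard parametric test.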
Gene flow between diverging lineages challenges the resolution of species boundaries and the understanding of evolutionary history in recent radiations. Here, we integrate phylogenetic and coalescent tools to resolve reticulate patterns of diversification and use a perspective focused on evolutionary mechanisms to distinguish interspecific and intraspecific taxonomic variation. We use this approach to resolve the systematics for one of the most intensively studied but difficult to understand groups of reptiles: the spotted whiptail lizards of the genus Aspidoscelis (A. gularis complex). Whiptails contain the largest number of unisexual species known within any vertebrate group and the spotted whiptail complex has played a key role in the generation of this diversity through hybrid speciation. Understanding lineage boundaries and the evolutionary history of divergence and reticulation within this group is therefore key to understanding the generation of unisexual diversity in whiptails. Despite this importance, long-standing confusion about their systematics has impeded understanding of which gonochoristic species have contributed to the formation of unisexual lineages. Using reduced representation genomic data, we resolve patterns of divergence and gene flow within the spotted whiptails and clarify patterns of hybrid speciation. We find evidence that biogeographically structured ecological and environmental variation has been important in morphological and genetic diversification, as well as the maintenance of species boundaries in this system. Our study elucidates how gene flow among lineages and the continuous nature of speciation can bias the practice of species delimitation and lead taxonomists operating under different frameworks to different conclusions (here we propose that a 2-species arrangement best reflects our current understanding).
In doing so, this study provides conceptual and methodological insights into approaches to resolving diversification patterns and species boundaries in rapid radiations with complex histories, as well as long-standing taxonomic challenges in the field of systematic biology.
We introduce PhyloJunction, a computational framework designed to facilitate the prototyping, testing, and characterization of evolutionary models. PhyloJunction is distributed as an open-source Python library that can be used to implement a variety of models, thanks to its flexible graphical modeling architecture and dedicated model specification language. Model design and use are exposed to users via command-line and graphical interfaces, which integrate the steps of simulating, summarizing, and visualizing data. This article describes the features of PhyloJunction (which include, but are not limited to, a general implementation of a popular family of phylogenetic diversification models) and, moving forward, how it may be expanded to not only include new models, but to also become a platform for conducting and teaching statistical learning.
To model distribution ranges, the most popular methods of phylogenetic biogeography divide Earth into a handful of predefined areas. Other methods use explicit geographic ranges, but unfortunately, these methods assume a static Earth, ignoring the effects of plate tectonics and the changes in the landscape. To address this limitation, I propose a method that uses explicit geographic ranges and incorporates a plate motion model and a paleolandscape model directly derived from the models used by geologists in their tectonic and paleogeographic reconstructions. The underlying geographic model is a high-resolution pixelation of a spherical Earth. Biogeographic inference is based on diffusion, approximates the effects of the landscape, uses a time-stratified model to take into account the geographic changes, and directly integrates over all probable histories. By using a simplified stochastic mapping algorithm, it is possible to infer the ancestral locations as well as the distance traveled by the ancestral lineages. For illustration, I applied the method to an empirical phylogeny of the Sapindaceae plants. This example shows that methods based on explicit geographic data, coupled with high-resolution paleogeographic models, can provide detailed reconstructions of the ancestral areas but also include inferences about the probable dispersal paths and diffusion speed across the taxon history. The method is implemented in the program PhyGeo.
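To make the notion of "distance traveled" on a spherical Earth concrete, the sketch below computes great-circle distances with the haversine formula and sums them along a sequence of locations. It is not taken from PhyGeo: the function names, the mean Earth radius of 6371 km, and the representation of a lineage history as a list of (latitude, longitude) pairs are assumptions for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def great_circle_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2.0) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2.0 * radius * math.asin(min(1.0, math.sqrt(a)))


def path_length_km(points):
    """Total distance along consecutive (lat, lon) points of a hypothetical path."""
    return sum(great_circle_km(lat1, lon1, lat2, lon2)
               for (lat1, lon1), (lat2, lon2) in zip(points, points[1:]))


if __name__ == "__main__":
    # Made-up sequence of locations for one lineage, oldest first.
    history = [(0.0, 0.0), (10.0, 20.0), (15.0, 45.0)]
    print("distance traveled (km):", path_length_km(history))
```

Dividing such a path length by the time spanned by the branch gives an average speed, which is one simple way to summarize movement along a lineage.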
Chloroplast capture, a phenomenon that can occur through interspecific hybridization and introgression, is frequently invoked to explain cytonuclear discordance in plants. However, relatively few studies have documented the mechanisms of cytonuclear coevolution and its potential for driving species differentiation and possible functional differences in the context of chloroplast capture. To address this crucial question, we chose the Aquilegia genus, which is known for having minimal sterility among species, and inferred that A. amurensis captured the plastome of A. parviflora based on cytonuclear discordance and gene flow between the 2 species. We focused on the introgression region and its differentiation from corresponding regions in closely related species, especially its composition in a chloroplast capture scenario. We found that nuclear genes encoding cytonuclear enzyme complexes (CECs; i.e., organelle-targeted genes) of chloroplast donor species were selectively retained and displaced the original CEC genes in chloroplast-receiving species due to cytonuclear interactions during introgression. Notably, CEC introgression was intrinsically correlated with a greater degree of evolutionary distance for these CECs between A. amurensis and A. parviflora. Terpene synthase activity genes (GO:0010333) were overrepresented among the introgressed genes, and more than 30% of these genes were CEC genes. These findings support our observation that the floral terpene release pattern is similar between A. amurensis and A. parviflora compared with A. japonica. Our study clarifies the mechanisms of cytonuclear coevolution, species differentiation, and functional differences in the context of chloroplast capture and highlights the potential role of chloroplast capture in adaptation.