Rates of nucleotide substitution vary substantially across the Tree of Life, with potentially confounding effects on phylogenetic and evolutionary analyses. A large acceleration in mitochondrial substitution rate occurs in the cockroach family Nocticolidae, which predominantly inhabit subterranean environments. To evaluate the impacts of this among-lineage rate heterogeneity on estimates of phylogenetic relationships and evolutionary timescales, we analyzed nuclear ultraconserved elements (UCEs) and mitochondrial genomes from nocticolids and other cockroaches. Substitution rates were substantially elevated in nocticolid lineages compared with other cockroaches, especially in mitochondrial protein-coding genes. This disparity in evolutionary rates is likely to have led to different evolutionary relationships being supported by phylogenetic analyses of mitochondrial genomes and UCE loci. Furthermore, Bayesian dating analyses using relaxed-clock models inferred much deeper divergence times compared with a flexible local clock. Our phylogenetic analysis of UCEs, which is the first genome-scale study to include all 13 major cockroach families, unites Corydiidae and Nocticolidae and places Anaplectidae as the sister lineage to the rest of Blattoidea. We uncover an extraordinary level of genetic divergence in Nocticolidae, including two highly distinct clades that separated ~115 million years ago despite both containing representatives of the genus Nocticola. The results of our study highlight the potential impacts of high among-lineage rate variation on estimates of phylogenetic relationships and evolutionary timescales.
The molluskan order Neogastropoda encompasses over 15,000 almost exclusively marine species playing important roles in benthic communities and in the economies of coastal countries. Neogastropoda underwent intensive cladogenesis in the early stages of diversification, generating a "bush" at the base of their evolutionary tree, which has been hard to resolve even with high throughput molecular data. To resolve this bush, we use a variety of phylogenetic inference methods and a comprehensive exon capture dataset of 1817 loci (79.6% data occupancy) comprising 112 taxa of 48 out of 60 Neogastropoda families. Our results show consistent topologies and high support in all analyses at (super)family level, supporting monophyly of Muricoidea, Mitroidea, Conoidea, and, with some reservations, Olivoidea and Buccinoidea. Volutoidea and Turbinelloidea as currently circumscribed are clearly paraphyletic. Although our analyses consistently resolve most backbone nodes, 3 prove problematic. First, the uncertain placement of Cancellariidae, as the sister group to either a Ficoidea-Tonnoidea clade or to the rest of Neogastropoda, leaves monophyly of Neogastropoda unresolved. Second, relationships are contradictory at the base of the major "core Neogastropoda" grouping. Third, coalescence-based analyses reject monophyly of the Buccinoidea in relation to Vasidae. We analyzed phylogenetic signal of targeted loci in relation to potential biases, and we propose the most probable resolutions in the latter 2 recalcitrant nodes. The uncertain placement of Cancellariidae may be explained by orthology violations due to differential paralog loss shortly after the whole genome duplication, which should be resolved with a curated set of longer loci.
Asymmetrical rates of cladogenesis and extinction abound in the tree of life, resulting in numerous minute clades that are dwarfed by larger sister groups. Such taxa are commonly regarded as phylogenetic relicts or "living fossils" when they exhibit an ancient first appearance in the fossil record and prolonged external morphological stasis, particularly in comparison to their more diversified sister groups. Due to their special status, various phylogenetic relicts tend to be well-studied and prioritized for conservation. A notable exception to this trend is found within Amblypygi ("whip spiders"), a visually striking order of functionally hexapodous arachnids that are notable for their antenniform first walking leg pair (the eponymous "whips"). Paleoamblypygi, the putative sister group to the remaining Amblypygi, is known from Late Carboniferous and Eocene deposits but is survived by a single living species, Paracharon caecus Hansen (1921), that was last collected in 1899. Due to the absence of genomic sequence-grade tissue for this vital taxon, there is no global molecular phylogeny for Amblypygi to date, nor a fossil-calibrated estimation of divergences within the group. Here, we report a previously unknown species of Paleoamblypygi from a cave site in Colombia. Capitalizing upon this discovery, we generated the first molecular phylogeny of Amblypygi, integrating ultraconserved element sequencing with legacy Sanger datasets and including described extant genera. To quantify the impact of sampling Paleoamblypygi on divergence time estimation, we performed in silico experiments with pruning of Paracharon. We demonstrate that the omission of relicts has a significant impact on the accuracy of node dating approaches that outweighs the impact of excluding ingroup fossils, which bears upon the ancestral range reconstruction for the group.
Our results underscore the imperative for biodiversity discovery efforts in elucidating the phylogenetic relationships of "dark taxa," and especially phylogenetic relicts in tropical and subtropical habitats. The lack of reciprocal monophyly for Charontidae and Charinidae leads us to subsume them into one family, Charontidae, new synonymy.
Phylogenetic and discrete-trait evolutionary inference depend heavily on an appropriate characterization of the underlying character substitution process. In this paper, we present random-effects substitution models that extend common continuous-time Markov chain models into a richer class of processes capable of capturing a wider variety of substitution dynamics. As these random-effects substitution models often require many more parameters than their usual counterparts, inference can be both statistically and computationally challenging. Thus, we also propose an efficient approach to compute an approximation to the gradient of the data likelihood with respect to all unknown substitution model parameters. We demonstrate that this approximate gradient enables scaling of sampling-based inference, namely Bayesian inference via Hamiltonian Monte Carlo, under random-effects substitution models across large trees and state-spaces. Applied to a dataset of 583 SARS-CoV-2 sequences, an HKY model with random effects shows strong signals of nonreversibility in the substitution process, and posterior predictive model checks clearly show that it is a more adequate model than a reversible model. When analyzing the pattern of phylogeographic spread of 1441 influenza A virus (H3N2) sequences between 14 regions, a random-effects phylogeographic substitution model infers that air travel volume adequately predicts almost all dispersal rates. A random-effects state-dependent substitution model reveals no evidence for an effect of arboreality on the swimming mode in the tree frog subfamily Hylinae. Simulations reveal that random-effects substitution models can accommodate both negligible and radical departures from the underlying base substitution model. We show that our gradient-based inference approach is over an order of magnitude more time efficient than conventional approaches.
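To make the idea of a random-effects substitution model concrete, the following minimal sketch assumes one common formulation: each off-diagonal rate of a base HKY matrix is multiplied by exp(eps_ij), where eps_ij is a pair-specific random effect. That formulation, the function names, the parameter values, and the cycle-based reversibility check are all assumptions chosen for illustration, not the authors' implementation. It shows how a single nonzero effect can break the reversibility of the base model.

```python
import math
from itertools import combinations

NUCS = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky_rates(kappa, pi):
    """Off-diagonal HKY rates: kappa * pi_j for transitions, pi_j otherwise."""
    return {
        (i, j): (kappa if (i, j) in TRANSITIONS else 1.0) * pi[j]
        for i in NUCS
        for j in NUCS
        if i != j
    }


def random_effects_matrix(base, eps):
    """Scale each base rate by exp(eps_ij) (assumed log-linear form) and
    return a 4x4 rate matrix whose rows sum to zero."""
    scaled = {pair: rate * math.exp(eps.get(pair, 0.0)) for pair, rate in base.items()}
    matrix = []
    for i in NUCS:
        row = [scaled[(i, j)] if i != j else 0.0 for j in NUCS]
        row[NUCS.index(i)] = -sum(row)
        matrix.append(row)
    return matrix


def is_reversible(q, rel_tol=1e-9):
    """Kolmogorov's criterion on all 3-cycles, which is sufficient when
    every off-diagonal rate is positive."""
    for i, j, k in combinations(range(len(q)), 3):
        forward = q[i][j] * q[j][k] * q[k][i]
        backward = q[i][k] * q[k][j] * q[j][i]
        if not math.isclose(forward, backward, rel_tol=rel_tol):
            return False
    return True


# Hypothetical parameter values, chosen only for the example.
pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
base = hky_rates(kappa=2.0, pi=pi)
q_base = random_effects_matrix(base, eps={})
q_random = random_effects_matrix(base, eps={("A", "G"): 0.5})

print("base reversible:", is_reversible(q_base))
print("with one random effect reversible:", is_reversible(q_random))
```

With all effects set to zero the sketch recovers the base HKY matrix, which corresponds to the "negligible departure" case mentioned in the abstract.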
When communities are assembled through processes such as filtering or limiting similarity acting on phylogenetically conserved traits, the evolutionary signature of those traits may be reflected in patterns of community membership. We show how the model of trait evolution underlying community-structuring traits can be inferred from community membership data using both a variation of a traditional eco-phylogenetic metric, the mean pairwise phylogenetic distance (MPD) between taxa, and a recent machine learning tool, Convolutional Kitchen Sinks (CKS). Both methods perform well across a range of phylogenetically informative evolutionary models, but CKS outperforms MPD as tree size increases. We demonstrate CKS by inferring the evolutionary history of freeze tolerance in angiosperms. Our analysis is consistent with a late burst model, suggesting freeze tolerance evolved recently. We suggest that multiple data types that are ordered on phylogenies, such as trait values, species interactions, or community presence/absence, are good candidates for CKS modeling because the generative models produce structured differences between neighboring points that CKS is well-suited for. We introduce the R package kitchen to perform CKS for generic application of the technique.
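For readers unfamiliar with MPD, the sketch below computes the standard mean pairwise phylogenetic distance for a community on a small tree. It illustrates only the textbook metric, not the variation used in the study and not the kitchen R package. The toy tree, its branch lengths, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy tree ((A:1,B:1):1,(C:1,D:1):1), stored as
# child -> (parent, branch length). The root has no entry.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 1.0),
    "D": ("n2", 1.0),
    "n1": ("root", 1.0),
    "n2": ("root", 1.0),
}


def path_to_root(tree, node):
    """Map each ancestor of `node` (and the node itself) to its distance from `node`."""
    distances = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        distances[parent] = total
        node = parent
    return distances


def patristic_distance(tree, a, b):
    """Path length between two tips, passing through their most recent common ancestor."""
    from_a = path_to_root(tree, a)
    from_b = path_to_root(tree, b)
    # The shared ancestor with the smallest summed distance is the MRCA.
    return min(from_a[n] + from_b[n] for n in from_a if n in from_b)


def mpd(tree, community):
    """Mean patristic distance over all unordered pairs of taxa in the community."""
    pairs = list(combinations(sorted(community), 2))
    if not pairs:
        raise ValueError("MPD needs at least two taxa")
    return sum(patristic_distance(tree, a, b) for a, b in pairs) / len(pairs)


print(mpd(TREE, {"A", "B"}))       # two sister taxa
print(mpd(TREE, {"A", "B", "C"}))  # adds a more distant relative
```

A community of close relatives yields a lower MPD than one spread across the tree, which is the kind of signal that filtering on a conserved trait would be expected to leave.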
Molecular sequence data from rapidly evolving organisms are often sampled at different points in time. Sampling times can then be used for molecular clock calibration. The root-to-tip (RTT) regression is an essential tool to assess the degree to which the data behave in a clock-like fashion. Here, we introduce Clockor2, a client-side web application for conducting RTT regression. Clockor2 allows users to quickly fit local and global molecular clocks, thus handling the increasing complexity of genomic datasets that sample beyond the assumption of homogeneous host populations. Clockor2 is efficient, handling trees of up to the order of 10^4 tips, with significant speed increases compared with other RTT regression applications. Although Clockor2 is written as a web application, all data processing happens on the client-side, meaning that data never leave the user's computer. Clockor2 is freely available at https://clockor2.github.io/.
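The core of an RTT regression can be sketched as an ordinary least-squares fit of root-to-tip distance against sampling date: the slope estimates the substitution rate, the x-intercept estimates the date of the root, and R^2 indicates how clock-like the data are. The code below is a minimal illustration under those assumptions, not Clockor2's implementation. The function name and the example data are hypothetical.

```python
def rtt_regression(dates, distances):
    """Least-squares fit of root-to-tip distance on sampling date.

    Returns (rate, root_date, r_squared), where rate is the slope and
    root_date is the x-intercept of the fitted line.
    """
    n = len(dates)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two (date, distance) pairs")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0 or syy == 0 or sxy == 0:
        raise ValueError("no temporal signal: regression is degenerate")
    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    root_date = -intercept / rate
    r_squared = sxy ** 2 / (sxx * syy)
    return rate, root_date, r_squared


# Hypothetical, perfectly clock-like data: 0.001 subs/site/year since 1990.
dates = [2000.0, 2005.0, 2010.0, 2015.0]
distances = [0.010, 0.015, 0.020, 0.025]
rate, root_date, r_squared = rtt_regression(dates, distances)
print(rate, root_date, r_squared)
```

A local-clock analysis of the kind described above could, under the same assumptions, apply this function separately to the tips of each predefined group and compare the fitted rates.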
Why and how organismal lineages radiate is commonly studied through either assessing abiotic factors (biogeography, geomorphological processes, and climate) or biotic factors (traits and interactions). Despite increasing awareness that both abiotic and biotic processes may have important joint effects on diversification dynamics, few attempts have been made to quantify the relative importance and timing of these factors, and their potentially interlinked direct and indirect effects, on lineage diversification. We here combine assessments of historical biogeography, geomorphology, climatic niche, vegetative, and floral trait evolution to test whether these factors jointly, or in isolation, explain diversification dynamics of a Neotropical plant clade (Merianieae, Melastomataceae). After estimating ancestral areas and the changes in niche and trait disparity over time, we employ Phylogenetic Path Analyses as a synthesis tool to test eleven hypotheses on the individual direct and indirect effects of these factors on diversification rates. We find strongest support for interlinked effects of colonization of the uplifting Andes during the mid-Miocene and rapid abiotic climatic niche evolution in explaining a burst in diversification rate in Merianieae. Within Andean habitats, later increases in floral disparity allowed for the exploitation of wider pollination niches (i.e., shifts from bee to vertebrate pollinators), but did not affect diversification rates. Our approach of including both vegetative and floral trait evolution, rare in assessments of plant diversification in general, highlights that the evolution of woody habit and larger flowers preceded the colonization of the Andes, but was likely critical in enabling the rapid radiation in montane environments. 
Overall, and in concert with the idea that ecological opportunity is a key element of evolutionary radiations, our results suggest that a combination of rapid niche evolution and trait shifts was critical for the exploitation of newly available niche space in the Andes in the mid-Miocene. Further, our results emphasize the importance of incorporating both abiotic and biotic factors into the same analytical framework if we aim to quantify the relative and interlinked effects of these processes on diversification.
Interspecific interactions, including host-symbiont associations, can profoundly affect the evolution of the interacting species. Given the phylogenies of host and symbiont clades and knowledge of which host species interact with which symbiont, two questions are often asked: "Do closely related hosts interact with closely related symbionts?" and "Do host and symbiont phylogenies mirror one another?" These questions are intertwined and can even collapse into one in specific situations, such that they are often confused with one another. However, in most situations, a positive answer to the first question, hereafter referred to as "cophylogenetic signal," does not imply a close match between the host and symbiont phylogenies. It suggests only that past evolutionary history has contributed to shaping present-day interactions, which can arise, for example, through present-day trait matching, or from a single ancient vicariance event that increases the probability that closely related species overlap geographically. A positive answer to the second, referred to as "phylogenetic congruence," is more restrictive as it suggests a close match between the two phylogenies, which may happen, for example, if symbiont diversification tracks host diversification or if the diversifications of the two clades were subject to the same succession of vicariance events. Here we apply a set of methods (ParaFit, PACo, and eMPRess), whose significance is often interpreted as evidence for phylogenetic congruence, to simulations under 3 biologically realistic scenarios of trait matching, a single ancient vicariance event, and phylogenetic tracking with frequent cospeciation events. The latter is the only scenario that generates phylogenetic congruence, whereas the first 2 generate a cophylogenetic signal in the absence of phylogenetic congruence.
We find that tests of global-fit methods (ParaFit and PACo) are significant under all 3 scenarios, whereas tests of event-based methods (eMPRess) are only significant under the scenario of phylogenetic tracking. Therefore, significant results from global-fit methods should be interpreted in terms of cophylogenetic signal and not phylogenetic congruence; such significant results can arise under scenarios in which hosts and symbionts had independent evolutionary histories. Conversely, significant results from event-based methods suggest a strong form of dependency between host and symbiont evolutionary histories. Clarifying the patterns detected by different cophylogenetic methods is key to understanding how interspecific interactions shape and are shaped by evolution.
To model distribution ranges, the most popular methods of phylogenetic biogeography divide Earth into a handful of predefined areas. Other methods use explicit geographic ranges, but unfortunately, these methods assume a static Earth, ignoring the effects of plate tectonics and the changes in the landscape. To address this limitation, I propose a method that uses explicit geographic ranges and incorporates a plate motion model and a paleolandscape model directly derived from the models used by geologists in their tectonic and paleogeographic reconstructions. The underlying geographic model is a high-resolution pixelation of a spherical Earth. Biogeographic inference is based on diffusion, approximates the effects of the landscape, uses a time-stratified model to take into account the geographic changes, and directly integrates over all probable histories. By using a simplified stochastic mapping algorithm, it is possible to infer the ancestral locations as well as the distance traveled by the ancestral lineages. For illustration, I applied the method to an empirical phylogeny of the Sapindaceae plants. This example shows that methods based on explicit geographic data, coupled with high-resolution paleogeographic models, can provide detailed reconstructions of the ancestral areas but also include inferences about the probable dispersal paths and diffusion speed across the taxon history. The method is implemented in the program PhyGeo.
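On a spherical Earth, the distance traveled along a reconstructed path of pixel centres can be summed from great-circle segments. The sketch below uses the haversine formula and a mean Earth radius of 6371 km to illustrate that one step. It is not PhyGeo's code, and the function names and the example path are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def great_circle_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # Clamp to 1 to guard against rounding just above the asin domain.
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


def path_length_km(points):
    """Total great-circle length of a path given as (lat, lon) pixel centres."""
    return sum(
        great_circle_km(lat1, lon1, lat2, lon2)
        for (lat1, lon1), (lat2, lon2) in zip(points, points[1:])
    )


# Hypothetical path: two quarter-turns along the equator.
path = [(0.0, 0.0), (0.0, 90.0), (0.0, 180.0)]
print(path_length_km(path))
```

In a time-stratified setting, each point would first have to be placed at its paleo-position for the relevant time slice; that rotation step is outside the scope of this sketch.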