Nuclear genome sequencing for phylogenetics is resource-intensive, whereas mitochondrial genomes can be sequenced and analyzed with relative ease for building densely sampled phylogenetic trees of the most species-rich lineages of animals. Here, we develop a conceptual approach and bioinformatics workflow for combining nuclear single-copy orthologs with less informative but densely sampled mitochondrial genomes, for a detailed tree of Coleoptera (beetles). Basal relationships of Coleoptera were first inferred from > 2,000 BUSCO loci mined from GenBank's Short Read Archive for 119 exemplars of all major lineages under various substitution models and levels of matrix completion, to reveal universally supported nodes. Second, the corresponding mitogenomes were extracted and combined with an additional 373 species selected for broad taxonomic and biogeographic coverage, roughly in proportion to the known global species diversity of Coleoptera. Bioinformatic processing of mitogenomes was conducted with a novel pipeline for rapid, accurate annotation of protein-coding genes. Finally, phylogenetic trees from all 492 mitogenomes were generated under a backbone constraint from the universal basal nodes, which produced a well-supported tree of the major lineages at the family and superfamily level. Being genetically unlinked and showing distinct character variation, mitogenomes provide a unique perspective on the phylogeny. Comparison with 3 recent nuclear phylogenomic studies resulted in the recognition of > 80 nodes universally present across all analyses. These may now support the higher classification of Coleoptera and serve as a backbone for further studies, as numerous full mitogenomes and mitochondrial DNA barcodes are added to an increasingly complete phylogenetic tree of this super-diverse insect order.
Obtaining a timescale for bacterial evolution is crucial for understanding early life evolution but is difficult owing to the scarcity of bacterial fossils. Here, we introduce multiple new time constraints to calibrate bacterial evolution based on ancient symbiosis. This idea is implemented using a bacterial tree constructed with genes found in the mitochondrial lineages phylogenetically embedded within Proteobacteria. The expanded mitochondria-bacterial tree allows the node age constraints of eukaryotes established by their abundant fossils to be propagated to ancient co-evolving bacterial symbionts and across the bacterial tree of life. Importantly, we formulate a new probabilistic framework that considers uncertainty in inference of the ancestral lifestyle of modern symbionts to apply 19 relative time constraints, each informed by a host-symbiont association, that constrain bacterial symbionts to be no older than their eukaryotic host. Moreover, we develop an approach to incorporating substitution mixture models that better accommodate substitutional saturation and compositional heterogeneity for dating deep phylogenies. Our analysis estimates that the last bacterial common ancestor occurred approximately 4.0-3.5 billion years ago (Ga), followed by rapid divergence of major bacterial clades. This estimate is generally robust to alternative root ages, root positions, tree topologies, fossil ages, ancestral lifestyle reconstructions, and gene sets, among other factors. The obtained timetree serves as a foundation for testing hypotheses regarding bacterial diversification and its correlation with geobiological events across different timescales.
The two most popular tree models used in phylogenetics are the birth-death process (BD) and the Kingman coalescent (KC). These two models differ in several respects, notably: (i) the curve of the population size through time is a stochastic process in the BD, versus a parametrized curve in the KC, and (ii) the BD makes assumptions about the way samples are collected, while the KC conditions on the number of samples and the collection times, thus bypassing the need to describe the sampling procedure. These two models have been applied in different contexts: the BD in macroevolutionary studies of clades of species, and the KC for populations. The exception is the field of phylogenetic epidemiology, which uses both models. This raises the question of how such different models can be used in the same context. In this paper, we study large-population limits of the BD, in search of a mathematical link between the BD and the KC. We show that the KC is the large-population limit of a BD conditioned on a given population trajectory, and we provide the formula for the parameter θ of the limiting KC. This formula appears in earlier studies, but the present article is the first to show formally how the correspondence arises as a large-population limit, and that the BD needs to be conditioned for the KC to arise. Besides these fundamentally mathematical results, we demonstrate how our findings can be used practically in phylogenetic inference. In particular, we propose a new method for phylogenetic epidemiology, called CalicoBird, ensuing from our results. We conjecture that this new method, used in conjunction with auxiliary data (e.g. prevalence or incidence data), should allow estimating important epidemiological parameters (e.g. the prevalence and the effective reproduction number), in a way that is robust to the data-generating model and the sampling procedure. Future studies will be needed to put our claims to the test.
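As a minimal illustration of the Kingman coalescent mentioned above, the following Python sketch draws the successive coalescence waiting times for n lineages sampled at a single time point under a constant population size. It relies on the standard textbook result that, with k lineages, the waiting time to the next coalescence is exponential with rate k(k-1)/2 in coalescent time units. It is not the CalicoBird method and does not use the formula for θ, which the abstract does not state; the function names are hypothetical.

```python
import random


def simulate_coalescent_times(n, rng=None):
    """Draw the n - 1 successive waiting times of a Kingman coalescent.

    Assumes n lineages sampled at the same time and a constant population
    size, with time measured in coalescent units.
    """
    rng = rng or random.Random()
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0  # number of lineage pairs that can coalesce
        times.append(rng.expovariate(rate))
    return times


def expected_tree_height(n):
    """Expected time to the most recent common ancestor: 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


if __name__ == "__main__":
    rng = random.Random(1)
    heights = [sum(simulate_coalescent_times(10, rng)) for _ in range(20000)]
    # The simulated mean height should be close to the analytical expectation.
    print(sum(heights) / len(heights), expected_tree_height(10))
```

Averaging the simulated tree heights over many replicates gives a quick check against the analytical expectation. A birth-death simulation would additionally need an explicit sampling scheme, which is the contrast drawn in point (ii) above.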
Phylogenomic analyses of closely related species allow important glimpses into their evolutionary history. Although recent studies have demonstrated that inter-species hybridization has occurred in several groups, incorporating this process in phylogenetic reconstruction remains challenging. Specifically, the most predominant topology across the genome is often assumed to reflect the speciation tree, but rampant hybridization might overwhelm the genomes, causing that assumption to be violated. The notoriously challenging phylogeny of the 5 extant Panthera species (specifically jaguar [P. onca], lion [P. leo], and leopard [P. pardus]) is an interesting system to address this problem. Here we employed a Panthera-wide whole-genome-sequence data set incorporating 3 jaguar genomes and 2 representatives each of lions and leopards to dissect the relationships among these 3 species. Maximum-likelihood trees reconstructed from non-overlapping genomic fragments of 4 different sizes strongly supported the monophyly of all 3 species. The most frequent topology (76-95%) united lion + leopard as sister species (topology 1), followed by lion + jaguar (topology 2: 4-8%) and leopard + jaguar (topology 3: 0-6%). Topology 1 was dominant across the genome, especially in high-recombination regions. Topologies 2 and 3 were enriched in low-recombination segments, likely reflecting the species tree in the face of hybridization. Divergence times between sister species of each topology, corrected for local-recombination-rate effects, indicated that the lion-leopard divergence was significantly younger than the alternatives, likely driven by post-speciation admixture. Introgression analyses detected pervasive hybridization between lions and leopards, regardless of the assumed species tree. This inference was strongly supported by multispecies-coalescence-with-introgression analyses, which rejected topology 1 (lion+leopard) or any model without introgression. Interestingly, topologies 2 (lion+jaguar) and 3 (jaguar+leopard) with extensive lion-leopard introgression were unidentifiable, highlighting the complexity of this phylogenetic problem. Our results suggest that the dominant genome-wide tree topology is not the true species tree but rather a consequence of overwhelming post-speciation admixture between lion and leopard.
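The window-based topology comparison described above can be sketched as a simple tally. The Python snippet below assumes that each genomic window has already been reduced to the pair of species recovered as sisters in its maximum-likelihood tree; the function name, the input format, and the toy data are hypothetical and are not the study's pipeline or results.

```python
from collections import Counter


def topology_frequencies(window_sister_pairs):
    """Return the fraction of windows supporting each sister-species pair.

    Each element of `window_sister_pairs` is a 2-tuple of species names;
    the order of the names within a pair does not matter.
    """
    counts = Counter(frozenset(pair) for pair in window_sister_pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no windows supplied")
    return {tuple(sorted(pair)): n / total for pair, n in counts.items()}


if __name__ == "__main__":
    # Toy input (invented for illustration): 10 windows, 8 supporting lion + leopard.
    windows = (
        [("lion", "leopard")] * 8
        + [("jaguar", "lion")]
        + [("leopard", "jaguar")]
    )
    for pair, freq in sorted(topology_frequencies(windows).items()):
        print(pair, freq)
```

Stratifying the same tally by local recombination rate (for example, running it separately on low- and high-recombination windows) would mirror the comparison reported in the abstract.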
Secondary contact between previously allopatric lineages offers a test of reproductive isolating mechanisms that may have accrued in isolation. Such instances of contact can produce stable hybrid zones (where reproductive isolation can further develop via reinforcement or phenotypic displacement) or result in the lineages merging. Ongoing secondary contact is most visible in continental systems, where steady input from parental taxa can occur readily. In oceanic island systems, however, secondary contact between closely related species of birds is relatively rare. When observed on sufficiently small islands, relative to population size, secondary contact likely represents a recent phenomenon. Here, we examine the dynamics of a group of birds whose apparent widespread hybridization influenced Ernst Mayr's foundational work on allopatric speciation: the whistlers of Fiji (Aves: Pachycephala). We demonstrate 2 clear instances of secondary contact within the Fijian archipelago, one resulting in a hybrid zone on a larger island, and the other resulting in a wholly admixed population on a smaller island. We leverage low genome-wide divergence in the hybrid zone to pinpoint a single genomic region associated with observed phenotypic differences. We use genomic data to present a new hypothesis that emphasizes rapid plumage evolution and post-divergence gene flow.

