The interplay of key innovation and ecological opportunity is commonly recognized to be the catalyst for rapid radiation. Underground storage organs (USOs), as a vital ecological trait, are advantageous for adaptation of plants to extreme environments, but receive less attention compared to aboveground organs. Repeated evolution of various USOs has occurred across the plant tree of life. However, whether repeated occurrences of a USO in different clades of a group can promote its replicated radiations in combination with the invasion of similar environments remains poorly known. Corydalis is a megadiverse genus in Papaveraceae and exhibits remarkable variations in USO morphology and biome occupancy. Here, we first generated a robust phylogeny for Corydalis with wide taxonomic and genomic coverage based on plastome and nuclear ribosomal DNA sequence data. By dating the branching events, reconstructing ancestral ranges, evaluating diversification dynamics, and inferring evolutionary patterns of USOs and biomes and their correlations, we then tested whether the interplay of USO evolution and biome shifts has driven rapid diversification of some Corydalis lineages. Our results indicate that Corydalis began to diversify in the Qinghai-Tibet Plateau (QTP) at ca. 41 Ma, and 88% of dispersals happened through forests, suggesting that forests served as important dispersal corridors for range expansion of the genus. The storage root has originated independently at least six times in Corydalis since the Miocene, and its acquisition could have operated as a key innovation towards the adaptation to the alpine biome in the QTP. Repeated evolution of this game-changing trait and invasions of alpine biome, in combination with geoclimatic changes, could have jointly driven independent radiations of the two clades of Corydalis in the QTP at ca. 6 Ma. 
Our study provides new insights into the joint contribution of repeated USO evolution and biome shifts to replicated radiations, hence increasing our ability to predict evolutionary trajectories in plants facing similar environmental pressures. [Biome shift; diversification rates; Papaveraceae; phylogenomics; Qinghai-Tibet Plateau; underground storage organs.]
Time-dependent birth-death sampling models have been used in numerous studies to infer past evolutionary dynamics in different biological contexts, for example, speciation and extinction rates in macroevolutionary studies, or effective reproductive number in epidemiological studies. These models are branching processes where lineages can bifurcate, die, or be sampled with time-dependent birth, death, and sampling rates, generating phylogenetic trees. It has been shown that in some subclasses of such models, different sets of rates can result in the same distributions of reconstructed phylogenetic trees, and therefore, the rates become unidentifiable from the trees regardless of their size. Here, we show that widely used time-dependent fossilized birth-death (FBD) models are identifiable. This subclass of models makes more realistic assumptions about the fossilization process and certain infectious disease transmission processes than the unidentifiable birth-death sampling models. Namely, FBD models assume that sampled lineages stay in the process rather than being immediately removed upon sampling. The identifiability of the time-dependent FBD model justifies using statistical methods that implement this model to infer the underlying temporal diversification or epidemiological dynamics from phylogenetic trees or directly from molecular or other comparative data. We further show that the time-dependent FBD model with an extra parameter, the removal after sampling probability, is unidentifiable. This implies that in scenarios where we do not know how sampling affects lineages, we are unable to infer this extra parameter together with birth, death, and sampling rates solely from trees.
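The difference between the two sampling schemes described above can be illustrated with a minimal simulation sketch. It is not the method of the study: it assumes piecewise-constant rates as a simple stand-in for time-dependent rates, it only counts lineages rather than building a tree, and the function name `simulate_bds` and all parameter names are hypothetical. Setting `removal_prob=0.0` corresponds to the FBD-style assumption that sampled lineages stay in the process, while `removal_prob=1.0` removes every lineage upon sampling.

```python
import random
from bisect import bisect_right


def simulate_bds(breaks, birth, death, sampling, removal_prob, t_max, seed=0):
    """Gillespie-style simulation of a birth-death-sampling process.

    breaks       -- sorted times at which the rates change (assumed piecewise constant)
    birth, death, sampling -- per-interval rates, each of length len(breaks) + 1
    removal_prob -- probability that a lineage is removed when it is sampled
    t_max        -- time at which the simulation stops
    """
    rng = random.Random(seed)
    t, alive, n_samples, n_removed = 0.0, 1, 0, 0
    while alive > 0 and t < t_max:
        k = bisect_right(breaks, t)  # index of the current rate interval
        lam, mu, psi = birth[k], death[k], sampling[k]
        total = alive * (lam + mu + psi)
        next_break = breaks[k] if k < len(breaks) else t_max
        horizon = min(next_break, t_max)
        if total == 0:
            t = horizon  # nothing can happen until the rates change
            continue
        dt = rng.expovariate(total)
        if t + dt >= horizon:
            # No event before the rates change; because waiting times are
            # memoryless, we can jump to the boundary and redraw.
            t = horizon
            continue
        t += dt
        u = rng.random() * (lam + mu + psi)
        if u < lam:
            alive += 1  # birth: one lineage bifurcates
        elif u < lam + mu:
            alive -= 1  # death: one lineage goes extinct
        else:
            n_samples += 1  # sampling event (fossil or sequenced case)
            if rng.random() < removal_prob:
                alive -= 1  # sampled lineage leaves the process
                n_removed += 1
    return {"alive": alive, "samples": n_samples, "removed": n_removed, "time": t}


# FBD-style run: sampled lineages are never removed.
fbd = simulate_bds([1.0], [0.8, 0.4], [0.5, 0.5], [0.3, 0.3], 0.0, t_max=3.0, seed=42)
# Removal-upon-sampling run: every sampled lineage is removed.
rem = simulate_bds([1.0], [0.8, 0.4], [0.5, 0.5], [0.3, 0.3], 1.0, t_max=3.0, seed=42)
```

In this sketch the removal probability only affects what happens after a sampling event, which gives some intuition for why it can be hard to separate from the birth, death, and sampling rates when only the resulting tree is observed.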
Variation in gene tree estimates is widely observed in empirical phylogenomic data and is often assumed to be the result of biological processes. However, a recent study that used tetrapod mitochondrial genomes, which are haploid, uniparentally inherited, and non-recombining, to control for biological sources of variation found that levels of discordance among mitochondrial gene trees were comparable to those found in studies that assume only biological sources of variation. Additionally, that study found that several of the models of sequence evolution chosen to infer gene trees fit the sequence data inadequately. These results indicated that significant amounts of gene tree discordance in empirical data may be due to poor fit of sequence evolution models and that more complex and biologically realistic models may be needed. To test how the fit of sequence evolution models relates to gene tree discordance, we analyzed the same mitochondrial data sets as the previous study using 2 additional, more complex models of sequence evolution that each include a different biologically realistic aspect of the evolutionary process: a covarion model to incorporate site-specific rate variation across lineages (heterotachy), and a partitioned model to incorporate variable evolutionary patterns by codon position. Our results show that both additional models fit the data better than the models used in the previous study, with the covarion model being consistently and strongly preferred as tree size increases. However, even these preferred models still inferred highly discordant mitochondrial gene trees, thus deepening the mystery around what we label the "Mito-Phylo Paradox" and leading us to ask whether the observed variation could, in fact, be biological in nature after all.
Despite significant advances in phylogenetics over the past decades, the deep relationships within Bivalvia (phylum Mollusca) remain inconclusive. Previous efforts based on morphology or several genes have failed to resolve many key nodes in the phylogeny of Bivalvia. Advances have been made recently using transcriptome data, but the phylogenetic relationships within Bivalvia historically lacked consensus, especially within Pteriomorphia and Imparidentia. Here, we inferred the relationships of key lineages within Bivalvia using matrices generated from specifically designed ultraconserved elements (UCEs) with 16 available genomic resources and 85 newly sequenced specimens from 55 families. Our new probes (Bivalve UCE 2k v.1) for target sequencing captured an average of 849 UCEs with 1085 bp in mean length from in vitro experiments. Our results introduced novel schemes from 6 major clades (Protobranchina, Pteriomorphia, Palaeoheterodonta, Archiheterodonta, Anomalodesmata, and Imparidentia), though some inner nodes were poorly resolved, such as paraphyletic Heterodonta in some topologies potentially due to insufficient taxon sampling. The resolution increased when analyzing specific matrices for Pteriomorphia and Imparidentia. We recovered 3 Pteriomorphia topologies different from previously published trees, with the strongest support for ((Ostreida + (Arcida + Mytilida)) + (Pectinida + (Limida + Pectinida))). Limida were nested within Pectinida, warranting further studies. For Imparidentia, our results strongly supported the new hypothesis of (Galeommatida + (Adapedonta + Cardiida)), while the possible non-monophyly of Lucinida was inferred but poorly supported. Overall, our results provide important insights into the phylogeny of Bivalvia and show that target enrichment sequencing of UCEs can be broadly applied to study both deep and shallow phylogenetic relationships.
Biology has become a highly mathematical discipline in which probabilistic models play a central role. As a result, research in the biological sciences is now dependent on computational tools capable of carrying out complex analyses. These tools must be validated before they can be used, but what is understood as validation varies widely among methodological contributions. This may be a consequence of the still embryonic stage of the literature on statistical software validation for computational biology. Our manuscript aims to advance this literature. Here, we describe and illustrate existing good practices, and introduce new ones, for assessing the correctness of a model implementation, with an emphasis on Bayesian methods. We also introduce a suite of functionalities for automating validation protocols. It is our hope that the guidelines presented here help both to sharpen the focus of discussions on the expected standards of statistical software for biology and to raise those standards.
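As an illustration of what checking the correctness of a Bayesian implementation can look like, the sketch below runs a coverage check: parameters are drawn from the prior, data are simulated from the model, and the fraction of 95% credible intervals containing the true value is recorded. For a correct implementation this fraction should be close to 0.95. This is a generic check, and whether it matches the protocols proposed in the manuscript is an assumption. The conjugate normal model, the function names `posterior_normal_mean` and `coverage`, and all parameter values are hypothetical choices made to keep the example self-contained.

```python
import random
from statistics import NormalDist


def posterior_normal_mean(data, prior_mean, prior_sd, noise_sd):
    """Exact posterior of a normal mean with known noise SD (the code under test)."""
    n = len(data)
    precision = 1.0 / prior_sd**2 + n / noise_sd**2
    mean = (prior_mean / prior_sd**2 + sum(data) / noise_sd**2) / precision
    return NormalDist(mean, precision**-0.5)


def coverage(n_reps=2000, n_obs=10, level=0.95, seed=1):
    """Fraction of central credible intervals that contain the true parameter."""
    rng = random.Random(seed)
    lo_q = (1.0 - level) / 2.0
    hi_q = 1.0 - lo_q
    hits = 0
    for _ in range(n_reps):
        theta = rng.gauss(0.0, 1.0)  # draw the "true" value from the prior
        data = [rng.gauss(theta, 2.0) for _ in range(n_obs)]  # simulate data
        post = posterior_normal_mean(data, 0.0, 1.0, 2.0)
        if post.inv_cdf(lo_q) <= theta <= post.inv_cdf(hi_q):
            hits += 1
    return hits / n_reps


print(f"Empirical coverage of 95% intervals: {coverage():.3f}")
```

A coverage far from the nominal level would point to a bug in the simulator, the inference code, or both. Coverage near the nominal level is necessary for correctness but not sufficient, so it is best treated as one check among several.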