The relative importance of genetic drift and local adaptation in facilitating speciation remains unclear. This is particularly true for seabirds, which can disperse over large geographic distances, providing opportunities for intermittent gene flow among distant colonies that span the temperature and salinity gradients of the oceans. Here, we investigate the genomic basis of adaptation and speciation in the banded penguins, Galápagos (Spheniscus mendiculus), Humboldt (Spheniscus humboldti), Magellanic (Spheniscus magellanicus), and African penguins (Spheniscus demersus), by analyzing 114 genomes from the 16 main breeding colonies. We aim to identify the molecular mechanisms and adaptive genomic traits that have facilitated their diversification. Through positive selection and gene family expansion analyses, we identified candidate genes that may be related to reproductive isolation processes mediated by ecological thermal niche divergence. We recover signals of positive selection on key loci associated with spermatogenesis, especially during the recent peripatric divergence of the Galápagos penguin from the Humboldt penguin. High temperatures in tropical habitats may have favored selection on loci associated with spermatogenesis to maintain sperm viability, leading to reproductive isolation among young species. Our results suggest that genome-wide selection on loci associated with molecular pathways that underpin thermoregulation, osmoregulation, hypoxia, and social behavior has been crucial in local adaptation of banded penguins. Overall, these results contribute to our understanding of how the complexity of biotic, but especially abiotic, factors, along with the high dispersal capabilities of these marine species, may promote both neutral and adaptive lineage divergence even in the presence of gene flow.
Determining the origins of novel genes and the mechanisms driving the emergence of new functions is challenging yet crucial for understanding evolutionary innovations. Recently evolved fish antifreeze proteins (AFPs) offer a unique opportunity to explore these processes, particularly the near-identical type I AFP (AFPI) found in four phylogenetically divergent fish taxa. This study tested the hypothesis of protein sequence convergence beyond functional convergence in three unrelated AFPI-bearing fish lineages. Through comprehensive comparative analyses of newly sequenced genomes of winter flounder and grubby sculpin, along with available high-quality genomes of cunner and 14 other related species, the study revealed that near-identical AFPI proteins originated from distinct genetic precursors in each lineage. Each lineage independently evolved a de novo coding region for the novel ice-binding protein while repurposing fragments from their respective ancestors into potential regulatory regions, representing partial de novo origination, a process that bridges de novo gene formation and the neofunctionalization of duplicated genes. The study supports existing models of new gene origination and introduces new ones: the innovation-amplification-divergence model, where novel changes precede gene duplication; the newly proposed duplication-degeneration-divergence model, which describes new functions arising from degenerated pseudogenes; and the duplication-degeneration-divergence gene fission model, where each new sibling gene differentially degenerates and renovates distinct functional domains from their parental gene. These findings highlight the diverse evolutionary pathways through which a novel functional gene with convergent sequences at the protein level can evolve across divergent species, advancing our understanding of the mechanistic intricacies in new gene formation.
During meiosis in many eukaryotic species, crossovers tend to occur within narrow regions called recombination hotspots. In plants, it is generally thought that gene regulatory sequences, especially promoters and 5' to 3' untranslated regions, are enriched in hotspots, but this has been characterized only in a handful of species. We also lack a clear description of fine-scale variation in recombination rates within genic regions, and little is known about hotspot position and intensity in plants. To address these gaps, we constructed fine-scale recombination maps from genetic polymorphism data and inferred recombination hotspots in 11 plant species. We detected gradients of recombination in genic regions in most species, yet gradients varied in intensity and shape depending on specific hotspot locations and gene structure. To further characterize recombination gradients, we decomposed them according to gene structure by rank and number of exons. We generalized the previously observed pattern that recombination hotspots are organized around the boundaries of coding sequences, especially 5' promoters. However, our results also provided new insight into the relative importance of the 3' end of genes and the possible location of hotspots away from genic regions in some species. Variation among species seemed driven more by hotspot location among and within genes than by differences in hotspot size or intensity. Our results shed light on the variation in recombination rates at a very fine scale, revealing the diversity and complexity of genic recombination gradients emerging from the interaction between hotspot location and gene structure.
Convergence offers an opportunity to explore to what extent evolution can be predictable when genomic composition and environmental triggers are similar. Here, we present an emergent model system to study convergent evolution in nature in a mammalian group, the bat genus Myotis. Three foraging strategies (gleaning, trawling, and aerial hawking), each characterized by different sets of phenotypic features, have evolved independently multiple times in different biogeographic regions in isolation for millions of years. To investigate the genomic basis of convergence and explore the functional genomic changes linked to ecomorphological convergence, we sequenced and annotated 17 new genomes and screened 16,426 genes for positive selection and associations between relative evolutionary rates and foraging strategies across 30 bat species representing all Myotis ecomorphs across geographic regions as well as among sister groups. We identify genomic changes that describe both phylogenetic and ecomorphological trends. We infer that colonization of new environments may have first required changes in genes linked to hearing sensory perception, followed by changes linked to fecundity and development, metabolism of carbohydrates, and heme degradation. These changes may be linked to prey acquisition and digestion and match phylogenetic trends. Our findings also suggest that the repeated evolution of ecomorphs does not always involve changes in the same genes but rather in genes with the same molecular functions such as developmental and cellular processes.
Profile mixture models capture distinct biochemical constraints on the amino acid substitution process at different sites in proteins. These models feature a mixture of time-reversible models with a common matrix of exchangeabilities and distinct sets of equilibrium amino acid frequencies known as profiles. Combining the exchangeability matrix with each profile generates the matrix of instantaneous rates of amino acid exchange for that profile. Currently, empirically estimated exchangeability matrices (e.g. the LG matrix) are widely used for phylogenetic inference under profile mixture models. However, these were estimated using a single profile and are unlikely to be optimal for profile mixture models. Here, we describe the GTRpmix model that allows maximum likelihood estimation of a common exchangeability matrix under any profile mixture model. We show that exchangeability matrices estimated under profile mixture models differ from the LG matrix, dramatically improving model fit and topological estimation accuracy for empirical test cases. Because the GTRpmix model is computationally expensive, we provide two exchangeability matrices estimated from large concatenated phylogenomic supermatrices to be used for phylogenetic analyses. One, called Eukaryotic Linked Mixture (ELM), is designed for phylogenetic analysis of proteins encoded by nuclear genomes of eukaryotes, and the other, Eukaryotic and Archaeal Linked Mixture (EAL), for reconstructing relationships between eukaryotes and Archaea. These matrices, combined with profile mixture models, fit data better and have improved topology estimation relative to the LG matrix combined with the same mixture models. Starting with version 2.3.1, IQ-TREE2 allows users to estimate linked exchangeabilities (i.e. amino acid exchange rates) under profile mixture models.
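To make concrete how a shared exchangeability matrix and a profile combine into a rate matrix, the following is a minimal sketch of the standard time-reversible construction, in which the off-diagonal rate from state i to state j is the exchangeability s_ij times the equilibrium frequency of j. It is not the GTRpmix or IQ-TREE2 implementation. The function name `rate_matrix`, the toy three-state alphabet, the numeric values, and the choice to rescale each profile's matrix to one expected substitution per unit time are all assumptions made for illustration.

```python
def rate_matrix(exchangeabilities, profile):
    """Build a time-reversible rate matrix from symmetric exchangeabilities
    and one profile (equilibrium frequencies).

    Assumption for this sketch: each matrix is rescaled so that the expected
    substitution rate at equilibrium equals 1.
    """
    n = len(profile)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Off-diagonal rate: exchangeability times target frequency.
                q[i][j] = exchangeabilities[i][j] * profile[j]
        # Diagonal entry makes the row sum to zero (q[i][i] is still 0 here).
        q[i][i] = -sum(q[i])
    # Expected rate at equilibrium is -sum_i pi_i * q_ii.
    scale = -sum(profile[i] * q[i][i] for i in range(n))
    return [[x / scale for x in row] for row in q]


# Hypothetical toy data: three states instead of 20 amino acids.
S = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 0.5],
     [2.0, 0.5, 0.0]]
PROFILES = [[0.7, 0.2, 0.1],
            [0.1, 0.3, 0.6]]

# One rate matrix per profile, all sharing the same exchangeabilities.
MATRICES = [rate_matrix(S, p) for p in PROFILES]
```

Because every component shares `S`, the matrices differ only through their profiles. Each one satisfies detailed balance with respect to its own profile, which is what makes the components time-reversible.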
Because of its rigid structure, proline causes ribosomes to stall during translation. This phenomenon occurs in all domains of life and is exacerbated at polyproline motifs. Such stalling can be eased by the elongation factor P (EF-P) in bacteria. We discovered a potential connection between the loss of ancestral EF-P, the appearance of horizontally transferred EF-P variants, and genomic signs of EF-P dysfunction. Horizontal transfer of the efp gene has occurred several times among bacteria and is associated with the loss of highly conserved polyproline motifs. In this study, we pinpoint cases of horizontal EF-P transfer among a diverse set of bacteria and examine genomic features associated with these events in the phyla Thermotogota and Planctomycetes. In these phyla, horizontal EF-P transfer is also associated with the loss of entire polyproline motif-containing proteins, whose expression is likely dependent on EF-P. In particular, three proteases (Lon, ClpC, and FtsH) and three tRNA synthetases (ValS, IleS1, and IleS2) appear highly sensitive to EF-P transfer. The conserved polyproline motifs within these proteins all reside in close proximity to ATP-binding regions, some of which are crucial for their function. Our work shows that an ancient EF-P dysfunction has left genomic traces that persist to this day, although it remains unclear whether this dysfunction was strictly due to loss of ancestral EF-P or was related to the appearance of an exogenous variant. The latter possibility would imply that the process of "domesticating" a horizontally transferred efp gene can perturb the overall function of EF-P.
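As an illustration of what a polyproline motif scan can look like, the sketch below reports runs of consecutive prolines in a protein sequence given in one-letter code. It is not the procedure used in the study. The function name `polyproline_runs`, the default threshold of two consecutive prolines, and the example sequence are all hypothetical and chosen only for demonstration.

```python
import re


def polyproline_runs(protein, min_len=2):
    """Return (start, run) pairs for each run of at least `min_len`
    consecutive prolines; `start` is a 0-based position in the sequence.

    Assumption for this sketch: a motif is any run of consecutive 'P'
    residues of the given minimum length.
    """
    pattern = re.compile("P{%d,}" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(protein.upper())]


# Hypothetical toy sequence, not a real protein.
EXAMPLE = "MKPPGAPPPLT"
RUNS = polyproline_runs(EXAMPLE)  # [(2, 'PP'), (6, 'PPP')]
```

Comparing such run lists between orthologous proteins from genomes with and without a horizontally acquired efp gene is one simple way to tabulate motif loss, although a real analysis would work on aligned sequences.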
Heterochromatin is a gene-poor and repeat-rich genomic compartment universally found in eukaryotes. Despite its low transcriptional activity, heterochromatin plays important roles in maintaining genome stability, organizing chromosomes, and suppressing transposable elements. Given the importance of these functions, it is expected that genes involved in heterochromatin regulation would be highly conserved. Yet, a handful of these genes were found to evolve rapidly. To investigate whether these previous findings are anecdotal or general to genes modulating heterochromatin, we compile an exhaustive list of 106 candidate genes involved in heterochromatin functions and investigate their evolution over short and long evolutionary time scales in Drosophila. Our analyses find that these genes exhibit significantly more frequent evolutionary changes, both in the form of amino acid substitutions and gene copy number changes, when compared to genes involved in Polycomb-based repressive chromatin. While positive selection drives amino acid changes within both structured domains with diverse functions and intrinsically disordered regions, purifying selection may have maintained the proportions of intrinsically disordered regions of these proteins. Together with the observed negative associations between the evolutionary rate of these genes and the genomic abundance of transposable elements, we propose an evolutionary model where the fast evolution of genes involved in heterochromatin functions is an inevitable outcome of the unique functional roles of heterochromatin, while the rapid evolution of transposable elements may be an effect rather than a cause. Our study provides an important global view of the evolution of genes involved in this critical cellular domain and offers insights into the factors driving the distinctive evolution of heterochromatin.