The relative importance of genetic drift and local adaptation in facilitating speciation remains unclear. This is particularly true for seabirds, which can disperse over large geographic distances, providing opportunities for intermittent gene flow among distant colonies that span the temperature and salinity gradients of the oceans. Here, we investigate the genomic basis of adaptation and speciation in the banded penguins, namely the Galápagos (Spheniscus mendiculus), Humboldt (Spheniscus humboldti), Magellanic (Spheniscus magellanicus), and African (Spheniscus demersus) penguins, by analyzing 114 genomes from the 16 main breeding colonies. We aim to identify the molecular mechanisms and adaptive genomic traits that have facilitated their diversification. Through positive selection and gene family expansion analyses, we identified candidate genes that may be related to reproductive isolation processes mediated by ecological thermal niche divergence. We recover signals of positive selection on key loci associated with spermatogenesis, especially during the recent peripatric divergence of the Galápagos penguin from the Humboldt penguin. High temperatures in tropical habitats may have favored selection on loci associated with spermatogenesis to maintain sperm viability, leading to reproductive isolation among young species. Our results suggest that genome-wide selection on loci associated with molecular pathways that underpin thermoregulation, osmoregulation, hypoxia, and social behavior has been crucial in the local adaptation of banded penguins. Overall, these results contribute to our understanding of how the complexity of biotic, but especially abiotic, factors, along with the high dispersal capabilities of these marine species, may promote both neutral and adaptive lineage divergence even in the presence of gene flow.
Determining the origins of novel genes and the mechanisms driving the emergence of new functions is challenging yet crucial for understanding evolutionary innovations. Recently evolved fish antifreeze proteins (AFPs) offer a unique opportunity to explore these processes, particularly the near-identical type I AFP (AFPI) found in four phylogenetically divergent fish taxa. This study tested the hypothesis of protein sequence convergence beyond functional convergence in three unrelated AFPI-bearing fish lineages. Through comprehensive comparative analyses of newly sequenced genomes of winter flounder and grubby sculpin, along with available high-quality genomes of cunner and 14 other related species, the study revealed that near-identical AFPI proteins originated from distinct genetic precursors in each lineage. Each lineage independently evolved a de novo coding region for the novel ice-binding protein while repurposing fragments from their respective ancestors into potential regulatory regions, representing partial de novo origination, a process that bridges de novo gene formation and the neofunctionalization of duplicated genes. The study supports existing models of new gene origination and introduces new ones: the innovation-amplification-divergence model, where novel changes precede gene duplication; the newly proposed duplication-degeneration-divergence model, which describes new functions arising from degenerated pseudogenes; and the duplication-degeneration-divergence gene fission model, where each new sibling gene differentially degenerates and renovates distinct functional domains from their parental gene. These findings highlight the diverse evolutionary pathways through which a novel functional gene with convergent sequences at the protein level can evolve across divergent species, advancing our understanding of the mechanistic intricacies in new gene formation.
Convergence offers an opportunity to explore to what extent evolution can be predictable when genomic composition and environmental triggers are similar. Here, we present an emergent model system to study convergent evolution in nature in a mammalian group, the bat genus Myotis. Three foraging strategies (gleaning, trawling, and aerial hawking, each characterized by different sets of phenotypic features) have evolved independently multiple times in different biogeographic regions in isolation for millions of years. To investigate the genomic basis of convergence and explore the functional genomic changes linked to ecomorphological convergence, we sequenced and annotated 17 new genomes and screened 16,426 genes for positive selection and associations between relative evolutionary rates and foraging strategies across 30 bat species representing all Myotis ecomorphs across geographic regions as well as among sister groups. We identify genomic changes that describe both phylogenetic and ecomorphological trends. We infer that colonization of new environments may have first required changes in genes linked to hearing sensory perception, followed by changes linked to fecundity and development, metabolism of carbohydrates, and heme degradation. These changes may be linked to prey acquisition and digestion and match phylogenetic trends. Our findings also suggest that the repeated evolution of ecomorphs does not always involve changes in the same genes but rather in genes with the same molecular functions such as developmental and cellular processes.
During the meiosis of many eukaryote species, crossovers tend to occur within narrow regions called recombination hotspots. In plants, it is generally thought that gene regulatory sequences, especially promoters and 5' to 3' untranslated regions, are enriched in hotspots, but this has been characterized only in a handful of species. We also lack a clear description of fine-scale variation in recombination rates within genic regions, and little is known about hotspot position and intensity in plants. To address these gaps, we constructed fine-scale recombination maps from genetic polymorphism data and inferred recombination hotspots in 11 plant species. We detected gradients of recombination in genic regions in most species, yet gradients varied in intensity and shape depending on specific hotspot locations and gene structure. To further characterize recombination gradients, we decomposed them according to gene structure by rank and number of exons. We generalized the previously observed pattern that recombination hotspots are organized around the boundaries of coding sequences, especially 5' promoters. However, our results also provided new insight into the relative importance of the 3' end of genes and the possible location of hotspots away from genic regions in some species. Variation among species seemed driven more by hotspot location among and within genes than by differences in hotspot size or intensity. Our results shed light on the variation in recombination rates at a very fine scale, revealing the diversity and complexity of genic recombination gradients emerging from the interaction between hotspot location and gene structure.
Profile mixture models capture distinct biochemical constraints on the amino acid substitution process at different sites in proteins. These models feature a mixture of time-reversible models with a common matrix of exchangeabilities and distinct sets of equilibrium amino acid frequencies known as profiles. Combining the exchangeability matrix with each profile generates the matrix of instantaneous rates of amino acid exchange for that profile. Currently, empirically estimated exchangeability matrices (e.g. the LG matrix) are widely used for phylogenetic inference under profile mixture models. However, these were estimated using a single profile and are unlikely to be optimal for profile mixture models. Here, we describe the GTRpmix model that allows maximum likelihood estimation of a common exchangeability matrix under any profile mixture model. We show that exchangeability matrices estimated under profile mixture models differ from the LG matrix, dramatically improving model fit and topological estimation accuracy for empirical test cases. Because the GTRpmix model is computationally expensive, we provide two exchangeability matrices estimated from large concatenated phylogenomic supermatrices to be used for phylogenetic analyses. One, called Eukaryotic Linked Mixture (ELM), is designed for phylogenetic analysis of proteins encoded by nuclear genomes of eukaryotes, and the other, Eukaryotic and Archaeal Linked mixture (EAL), for reconstructing relationships between eukaryotes and Archaea. These matrices, combined with profile mixture models, fit data better and have improved topology estimation relative to the LG matrix combined with the same mixture models. Starting with version 2.3.1, IQ-TREE2 allows users to estimate linked exchangeabilities (i.e. amino acid exchange rates) under profile mixture models.
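As a rough illustration of how a shared exchangeability matrix is combined with each profile, the sketch below builds one rate matrix per profile using the standard time-reversible construction Q[i][j] = S[i][j] * pi[j], with each row summing to zero. The 3-state alphabet, the numerical values, the function name `rate_matrix`, and the per-profile normalization to one expected substitution per unit time are all assumptions made for brevity. This is not the GTRpmix implementation or the IQ-TREE2 API.

```python
def rate_matrix(exchangeabilities, profile):
    """Combine a symmetric exchangeability matrix S with one frequency
    profile pi into a time-reversible rate matrix Q, Q[i][j] = S[i][j] * pi[j]."""
    n = len(profile)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exchangeabilities[i][j] * profile[j]
        # The diagonal is still 0.0 here, so this is minus the off-diagonal row sum.
        q[i][i] = -sum(q[i])
    # Assumed convention: scale so the expected rate at equilibrium equals 1.
    scale = -sum(profile[i] * q[i][i] for i in range(n))
    return [[x / scale for x in row] for row in q]


# Toy 3-state alphabet standing in for the 20 amino acids (hypothetical values).
S = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 3.0],
     [2.0, 3.0, 0.0]]
PROFILES = [[0.5, 0.3, 0.2],
            [0.1, 0.1, 0.8]]

# One rate matrix per profile, all sharing the same ("linked") exchangeabilities.
Q_PER_PROFILE = [rate_matrix(S, p) for p in PROFILES]
```

Because S is symmetric, each resulting matrix satisfies detailed balance (pi[i] * Q[i][j] == pi[j] * Q[j][i]), which is what makes every mixture component time-reversible while the profiles alone differ between components.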
Prolines cause ribosomes to stall during translation due to their rigid structure. This phenomenon occurs in all domains of life and is exacerbated at polyproline motifs. Such stalling can be eased by the elongation factor P (EF-P) in bacteria. We discovered a potential connection between the loss of ancestral EF-P, the appearance of horizontally transferred EF-P variants, and genomic signs of EF-P dysfunction. Horizontal transfer of the efp gene has occurred several times among bacteria and is associated with the loss of highly conserved polyproline motifs. In this study, we pinpoint cases of horizontal EF-P transfer among a diverse set of bacteria and examine genomic features associated with these events in the phyla Thermotogota and Planctomycetes. In these phyla, horizontal EF-P transfer is also associated with the loss of entire polyproline motif-containing proteins, whose expression is likely dependent on EF-P. In particular, three proteases (Lon, ClpC, and FtsH) and three tRNA synthetases (ValS, IleS1, and IleS2) appear highly sensitive to EF-P transfer. The conserved polyproline motifs within these proteins all reside in close proximity to ATP-binding regions, some of which are crucial for their function. Our work shows that an ancient EF-P dysfunction has left genomic traces that persist to this day, although it remains unclear whether this dysfunction was strictly due to loss of ancestral EF-P or was related to the appearance of an exogenous variant. The latter possibility would imply that the process of "domesticating" a horizontally transferred efp gene can perturb the overall function of EF-P.
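To make the notion of a polyproline motif concrete, the following minimal sketch scans a protein sequence for runs of consecutive prolines. The threshold of three or more prolines, the function name `find_polyproline_motifs`, and the example sequence are assumptions chosen for illustration. The study's actual motif definition and detection pipeline are not described here and may differ.

```python
import re


def find_polyproline_motifs(protein_seq, min_run=3):
    """Return (start, motif) pairs for every maximal run of at least
    `min_run` consecutive prolines; `start` is a 0-based position."""
    pattern = re.compile(r"P{%d,}" % min_run)
    return [(m.start(), m.group()) for m in pattern.finditer(protein_seq.upper())]


# Hypothetical sequence, not taken from any of the proteins named above.
EXAMPLE_SEQ = "MKTPPPGAVLPPPPDE"
MOTIFS = find_polyproline_motifs(EXAMPLE_SEQ)
```

Comparing such hits across orthologous proteins from lineages with and without horizontally transferred efp would be one simple way to tabulate motif loss, under the assumed motif definition.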