Pub Date: 2025-02-05 | DOI: 10.1093/g3journal/jkae294
Anton Kraege, Edgar Chavarro-Carrero, Eva Schnell, Stefanie Heilmann-Heimbach, Kerstin Becker, Karl Köhrer, Bruno Huettel, Nafiseh Sargheini, Philipp Schiffer, Ann-Marie Waldvogel, Bart P H J Thomma, Hanna Rovenich
Unicellular green algae of the genus Coccomyxa are recognized for their worldwide distribution and ecological versatility. Coccomyxa elongata is a freshwater species of the Coccomyxa simplex clade, which also includes lichen symbionts. To facilitate future molecular and phylogenomic studies of this versatile clade of algae, we generated a high-quality genome assembly for C. elongata Chodat & Jaag SAG 216-3b within the framework of the Biodiversity Genomics Center Cologne (BioC2) initiative. A combination of long-read PacBio HiFi and Oxford Nanopore Technologies with chromatin conformation capture (Hi-C) sequencing led to the assembly of the genome into 21 scaffolds with a total length of 51.4 Mb and an N50 of 2.8 Mb. Nineteen of the scaffolds represent highly complete nuclear chromosomes delimited by telomeric repeats, while the two additional scaffolds represent the mitochondrial and plastid genomes. Transcriptome-guided gene annotation resulted in the identification of 14,811 protein-coding genes, of which 61% have annotated protein family domains and 841 are predicted to be secreted. Benchmarking universal single-copy orthologs analysis against the Chlorophyta database identified a total of 1,494 (98.4%) complete gene models, suggesting a highly complete genome annotation.
Title: High quality genome assembly and annotation (v1) of the eukaryotic freshwater microalga Coccomyxa elongata SAG 216-3b. Journal: G3: Genes|Genomes|Genetics. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11797067/pdf/
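The C. elongata abstract above summarizes contiguity with an N50 (2.8 Mb across 21 scaffolds). As a reminder of what that statistic measures, here is a minimal sketch of the usual definition: the length of the scaffold at which the running total of lengths, sorted longest first, reaches half the assembly size. The function name `n50` and the toy lengths are assumptions for illustration; they are not the published scaffold lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)   # longest scaffold first
    half_total = sum(ordered) / 2             # half of the assembly size
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # first scaffold to cross 50%
            return length


# Hypothetical toy assembly (lengths in bp), not the C. elongata scaffolds.
toy_scaffolds = [40, 30, 20, 10]
print(n50(toy_scaffolds))
```

Because the value depends only on the sorted lengths, two assemblies of the same total size can differ sharply in N50 when one is more fragmented.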
Pub Date: 2025-02-05 | DOI: 10.1093/g3journal/jkae282
Evan M Long, Michelle C Stitzer, Brandon Monier, Aimee J Schulz, Maria Cinta Romay, Kelly R Robbins, Edward S Buckler
Centuries of clonal propagation in cassava (Manihot esculenta) have reduced sexual recombination, leading to the accumulation of deleterious mutations. This has resulted in both inbreeding depression affecting yield and a significant decrease in reproductive performance, creating hurdles for contemporary breeding programs. Cassava is a member of the Euphorbiaceae family, which includes notable species such as rubber tree (Hevea brasiliensis) and poinsettia (Euphorbia pulcherrima). Expanding upon preliminary draft genomes, we annotated 7 long-read genome assemblies and aligned a total of 52 genomes to analyze selection across the genome and the phylogeny. Through this comparative genomic approach, we identified 48 genes under relaxed selection in cassava. Notably, we discovered an overrepresentation of floral-expressed genes, concentrated in particular on 6 pollen-related genes. Our results indicate that domestication and a transition to clonal propagation have reduced selection pressures on sexual reproductive functions in cassava, leading to an accumulation of mutations in pollen-related genes. This relaxed selection and the genome-wide deleterious mutations responsible for inbreeding depression are potential targets for improving cassava breeding, where the generation of new varieties relies on recombining favorable alleles through sexual reproduction.
Title: Evolutionary signatures of the erosion of sexual reproduction genes in domesticated cassava (Manihot esculenta). Journal: G3: Genes|Genomes|Genetics. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11797036/pdf/
Pub Date: 2025-02-05 | DOI: 10.1093/g3journal/jkae285
Chelsea Tafoya, Brandon Ching, Elva Garcia, Alyssa Lee, Melissa Acevedo, Kelsey Bass, Elizabeth Chau, Heidi Lin, Kaitlyn Mamora, Michael Reeves, Madyllyne Vaca, William van Iderstein, Luis Velasco, Vivianna Williams, Grant Yonemoto, Tyler Yonemoto, Danielle M Heller, Arturo Diaz
The genome sequences of thousands of bacteriophages have been determined and functions for many of the encoded genes have been assigned based on homology to characterized sequences. However, functions have not been assigned to more than two-thirds of the identified phage genes as they have no recognizable sequence features. Recent genome-wide overexpression screens have begun to identify bacteriophage genes that encode proteins that reduce or inhibit bacterial growth. This study describes the construction of a plasmid-based overexpression library of 76 genes encoded by Cluster K1 mycobacteriophage Amelie, which is genetically similar to cluster K phages Waterfoul and Hammy recently described in similar screens and closely related to phages that infect clinically important mycobacteria. Twenty-six out of the 76 genes evaluated in our screen, encompassing 34% of the genome, reduced growth of the host Mycobacterium smegmatis to various degrees. More than one-third of these 26 toxic genes have no known function, and 10 of the 26 genes almost completely abolished host growth upon overexpression. Notably, while several of the toxic genes identified in Amelie shared homologs with other Cluster K phages recently screened, this study uncovered 7 previously unknown gene families that exhibit cytotoxic properties, thereby broadening the repertoire of known phage-encoded growth inhibitors. This work, carried out under the HHMI-supported SEA-GENES project (Science Education Alliance Gene-function Exploration by a Network of Emerging Scientists), underscores the importance of comprehensive overexpression screens in elucidating genome-wide patterns of phage gene function and novel interactions between phages and their hosts.
Title: Genome-wide screen overexpressing mycobacteriophage Amelie genes identifies multiple inhibitors of mycobacterial growth. Journal: G3: Genes|Genomes|Genetics. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11797047/pdf/
Pub Date: 2025-02-05 | DOI: 10.1093/g3journal/jkae279
Kenneth Lu, Deniz Erezyilmaz
Secondary contact between incompletely isolated species can produce a wide variety of outcomes. The vinegar flies Drosophila simulans and D. sechellia diverged on islands in the Indian Ocean and are currently separated by partial pre- and postzygotic barriers. The recent discovery of hybridization between D. simulans and D. sechellia in the wild presents an opportunity to monitor the prevalence of alleles that influence hybridization between these sibling species. We therefore sought to identify those loci in females that affect interspecific mating, and we adapted a two-choice assay to capture female mate choice and female attractiveness simultaneously. We used shotgun sequencing to genotype female progeny of reciprocal F1 backcrosses at high resolution and performed QTL analysis. We found 2 major-effect QTL in both backcrosses, one on either arm of the third chromosome that each account for 32-37% of the difference in phenotype between species. The QTL of both backcrosses overlap and may each be alternate alleles of the same locus. Genotypes at these 2 loci followed an assortative mating pattern with D. simulans males but not D. sechellia males, which mated most frequently with females that were hybrid at both loci. These data reveal how different allele combinations at 2 major loci may promote isolation and hybridization in the same species pair. Identification of these QTLs is an important step toward understanding how the genetic architecture of mate selection may shape the outcome of secondary contact.
Title: Two major-effect loci influence interspecific mating in females of the sibling species, Drosophila simulans and D. sechellia. Journal: G3: Genes|Genomes|Genetics. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11797031/pdf/
Pub Date: 2025-02-05 | DOI: 10.1093/g3journal/jkae280
Neil F Thompson, Ben J G Sutherland, Timothy J Green, Thomas A Delomas
Amplicon panels using genotyping-by-sequencing methods are now common, but have focused on characterizing SNP markers. We investigate how microhaplotype (MH) discovery within a recently developed Pacific oyster (Magallana gigas) amplicon panel could increase the statistical power for relationship assignment. Trios (offspring and two parents) from three populations in a newly established breeding program were genotyped on a 592-locus panel. After processing, 92% of retained amplicons contained polymorphic MH variants and 85% of monomorphic SNP markers contained MH variation. The increased allelic richness resulted in substantially improved power for relationship assignment with much lower estimated false positive rates. No substantive differences in assignment accuracy occurred between SNP and MH datasets, but using MHs increased the separation in log-likelihood values between true parents and highly related potential parents (aunts and uncles). A high number of Mendelian incompatibilities among trios were observed, likely due to null alleles. Further development of an MH panel, including removing loci with high rates of null alleles, would enable high-throughput genotyping by reducing panel size and therefore cost for Pacific oyster research and breeding programs.
Title: A free lunch: microhaplotype discovery in an existing amplicon panel improves parentage assignment for the highly polymorphic Pacific oyster. Journal: G3: Genes|Genomes|Genetics. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11797050/pdf/
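The oyster abstract above attributes many Mendelian incompatibilities among trios to null alleles. A small sketch of how such an incompatibility can be flagged at a single locus may help: the offspring genotype must be explainable by one allele from each parent. The function names, locus names, and microhaplotype strings below are hypothetical and are not taken from the study's pipeline.

```python
def is_mendelian_compatible(offspring, mother, father):
    """True if the offspring genotype can be formed from one allele per parent.

    Each genotype is a 2-tuple of allele labels (e.g. microhaplotype strings).
    """
    a, b = offspring
    return (a in mother and b in father) or (b in mother and a in father)


def incompatible_loci(trio_genotypes):
    """Return the names of loci where the trio violates Mendelian inheritance.

    trio_genotypes maps locus name -> (offspring, mother, father) genotypes.
    """
    return [
        locus
        for locus, (offspring, mother, father) in trio_genotypes.items()
        if not is_mendelian_compatible(offspring, mother, father)
    ]


# Hypothetical trio. At locus2 the mother carries an undetected null allele,
# so she is scored as homozygous ACG and the offspring (null + TTG) is scored
# as homozygous TTG, which looks like an impossible inheritance.
toy_trio = {
    "locus1": (("ACG", "TTG"), ("ACG", "CCA"), ("TTG", "TTG")),
    "locus2": (("TTG", "TTG"), ("ACG", "ACG"), ("TTG", "CCA")),
}
print(incompatible_loci(toy_trio))
```

A locus that is flagged across many otherwise well-supported trios is a natural candidate for the null-allele filtering the abstract proposes.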
Pub Date: 2025-02-04 | DOI: 10.1093/g3journal/jkaf021
Christopher Faulk, Arthur V Ribeiro, Carrie Walls, Robert L Koch
The soybean tentiform leafminer, Macrosaccus morrisella (Fitch) (Lepidoptera: Gracillariidae), is native to North America where it was known to feed on American hogpeanut and slickseed fuzzybean. However, it has recently expanded its host range to include soybean, an important agricultural crop. Here we report a new, highly contiguous genome for this species with a length of 245 Mb, N50 of 9 Mb, and 96.33% BUSCO completeness. The mitochondrial genome shares only 81% identity to its nearest relative in the NCBI nucleotide database indicating long-standing divergence or sparse sequencing in this clade. To determine whether host plant choice is genetically driven, we sequenced 18 individuals across three locations in Minnesota, United States, collected from both American hogpeanut and soybean plants. Genetic variation did not correlate with population structure based on either geography or host plant species (weighted FST estimate: 0.0058). As a secondary measure, we independently assembled complete mitochondrial genomes from all individuals and observed no delineation between host or location. Overall lack of detectable population structure at the nuclear and mitochondrial genome levels suggests a large population with flexible dietary preferences and does not show evidence of genetically driven host preference.
Title: The genome sequence and genomic diversity of soybean tentiform leafminer (Macrosaccus morrisella). Journal: G3: Genes|Genomes|Genetics.
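The leafminer abstract above reports a weighted FST of 0.0058 but does not say which estimator produced it. Purely as an illustration of how a multi-locus FST near zero arises, the sketch below uses Hudson's estimator combined as a ratio of sums across biallelic loci. This is a swapped-in technique, not necessarily the one the authors used, and the function name and inputs are assumptions.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's FST as a ratio of summed numerators and denominators.

    p1, p2: per-locus sample allele frequencies in populations 1 and 2.
    n1, n2: number of sampled chromosomes in each population (must be > 1).
    """
    if len(p1) != len(p2):
        raise ValueError("frequency lists must have the same length")
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two sampled chromosomes per population")
    numerator = 0.0
    denominator = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite sample size.
        numerator += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that one allele drawn from each population differs.
        denominator += a * (1 - b) + b * (1 - a)
    if denominator == 0:
        raise ValueError("no between-population diversity at these loci")
    return numerator / denominator


# Hypothetical frequencies: a fixed difference, then two identical populations.
print(hudson_fst([1.0], [0.0], 10, 10))
print(hudson_fst([0.5, 0.5], [0.5, 0.5], 10, 10))
```

With identical sample frequencies the sample-size correction pushes the estimate slightly below zero, which is why values very close to zero are read as no detectable structure.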
Pub Date: 2025-02-01 | DOI: 10.1093/g3journal/jkaf020
Nicolas S Locatelli, Iliana B Baums
Coral populations worldwide are declining rapidly due to elevated ocean temperatures and other human impacts. The Caribbean harbors a high number of threatened, endangered, and critically endangered coral species compared to reefs of the larger Indo-Pacific. The reef corals of the Caribbean are also long diverged from their Pacific counterparts and may have evolved different survival strategies. Most genomic resources have been developed for Pacific coral species, which may impede our ability to study the changes in genetic composition of Caribbean reef communities in response to global change. To help fill the gap in genomic resources, we used PacBio HiFi sequencing to generate the first genome assemblies for three Caribbean reef-building corals: Colpophyllia natans, Dendrogyra cylindrus, and Siderastrea siderea. We also explore the genomic novelties that shape scleractinian genomes. Notably, we find abundant gene duplications of all classes (e.g., tandem and segmental), especially in S. siderea. This species has one of the largest genomes of any scleractinian coral (822 Mb), which seems to be driven by repetitive content and gene family expansion and diversification. As the genome size of S. siderea was double the size expected of stony corals, we also evaluated the possibility of an ancient whole genome duplication using Ks tests and found no evidence of such an event in the species. By presenting these genome assemblies, we hope to develop a better understanding of coral evolution as a whole and to enable researchers to further investigate the population genetics and diversity of these three species.
Title: Genomes of the Caribbean reef-building corals Colpophyllia natans, Dendrogyra cylindrus, and Siderastrea siderea. Journal: G3: Genes|Genomes|Genetics.
Pub Date: 2025-01-30 | DOI: 10.1093/g3journal/jkaf018
Somya Mehra, Daniel E Neafsey, Michael White, Aimee R Taylor
Genetic studies of Plasmodium parasites increasingly feature relatedness estimates. However, various aspects of malaria parasite relatedness estimation are not fully understood. For example, relatedness estimates based on whole-genome-sequence (WGS) data often exceed those based on sparser data types. Systematic bias in relatedness estimation is well documented in the literature geared towards diploid organisms, but largely unknown within the malaria community. We characterise systematic bias in malaria parasite relatedness estimation using three complementary approaches: theoretically, under a non-ancestral statistical model of pairwise relatedness; numerically, under a simulation model of ancestry; and empirically, using data on parasites sampled from Guyana and Colombia. We show that allele frequency estimates encode, locus-by-locus, relatedness averaged over the set of sampled parasites used to compute them. Plugging sample allele frequencies into models of pairwise relatedness can lead to systematic underestimation. However, systematic underestimation can be viewed as population-relatedness calibration, i.e., a way of generating measures of relative relatedness. Systematic underestimation is unavoidable when relatedness is estimated assuming independence between genetic markers. It is mitigated when relatedness is estimated using WGS data under a hidden Markov model (HMM) that exploits linkage between proximal markers. The extent of mitigation is unknowable when an HMM is fit to sparser data, but downstream analyses that use high relatedness thresholds are relatively robust regardless. In summary, practitioners can either resolve to use relative relatedness estimated under independence, or try to estimate absolute relatedness under an HMM. We propose various tools to help practitioners evaluate their situation on a case-by-case basis.
{"title":"Systematic bias in malaria parasite relatedness estimation.","authors":"Somya Mehra, Daniel E Neafsey, Michael White, Aimee R Taylor","doi":"10.1093/g3journal/jkaf018","DOIUrl":"https://doi.org/10.1093/g3journal/jkaf018","journal":{"name":"G3: Genes|Genomes|Genetics"},"publicationDate":"2025-01-30","publicationTypes":"Journal Article"}
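To make the independence-based estimation mentioned in the relatedness abstract above more concrete, here is a minimal sketch of a pairwise relatedness estimator for haploid genotypes. It is not the authors' implementation. It assumes biallelic markers coded 0/1, independent loci, and a simple mixture model in which two parasites share an allele identical by descent with probability `r` at each locus. The function names `pair_log_likelihood`, `estimate_relatedness`, and `simulate_pair` are hypothetical, and the grid search is chosen only for simplicity.

```python
import math
import random


def pair_log_likelihood(r, x, y, freqs):
    """Log-likelihood of relatedness r for two haploid genotypes x and y.

    Illustrative model (an assumption, not taken from the paper): markers are
    biallelic and coded 0/1, loci are independent, and freqs[i] is the
    frequency of allele 1 at locus i, strictly between 0 and 1.
    """
    total = 0.0
    for xi, yi, f in zip(x, y, freqs):
        fx = f if xi == 1 else 1.0 - f
        fy = f if yi == 1 else 1.0 - f
        if xi == yi:
            # Identical by descent with probability r, else a chance match.
            p = r * fx + (1.0 - r) * fx * fx
        else:
            # Different alleles are only possible without identity by descent.
            p = (1.0 - r) * fx * fy
        if p <= 0.0:
            return float("-inf")
        total += math.log(p)
    return total


def estimate_relatedness(x, y, freqs, grid_size=101):
    """Maximum-likelihood estimate of r over an evenly spaced grid on [0, 1]."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda r: pair_log_likelihood(r, x, y, freqs))


def simulate_pair(r, freqs, rng):
    """Simulate two haploid genotypes with per-locus IBD probability r."""
    x, y = [], []
    for f in freqs:
        a = 1 if rng.random() < f else 0
        if rng.random() < r:
            b = a  # identical by descent: copy the allele
        else:
            b = 1 if rng.random() < f else 0  # independent draw
        x.append(a)
        y.append(b)
    return x, y


if __name__ == "__main__":
    rng = random.Random(1)
    freqs = [rng.uniform(0.1, 0.9) for _ in range(5000)]
    x, y = simulate_pair(0.5, freqs, rng)
    print(estimate_relatedness(x, y, freqs))
```

Here `freqs` holds the frequencies used to simulate the pair. Replacing it with frequencies computed from the same set of sampled parasites is the plug-in step that the abstract associates with systematic underestimation. This sketch does not attempt to reproduce that result.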
Pub Date : 2025-01-30 DOI: 10.1093/g3journal/jkaf019
Vivak Soni, Jeffrey D Jensen
The demographic history of a population, and the distribution of fitness effects (DFE) of newly arising mutations in functional genomic regions, are fundamental factors dictating both genetic variation and evolutionary trajectories. Although demographic and DFE inference have both been performed extensively in humans, these approaches have generally either been limited to simple demographic models involving a single population or, where a complex population history has been inferred, have not accounted for the potentially confounding effects of selection at linked sites. Taking advantage of the coding-sparse nature of the genome, we propose a 2-step approach in which coalescent simulations are first used to infer a complex multi-population demographic model, utilizing large non-functional regions that are likely free from the effects of background selection. We then use forward-in-time simulations to perform DFE inference in functional regions, conditional on the complex demography inferred and utilizing expected background selection effects in the estimation procedure. Throughout, recombination and mutation rate maps were used to account for the underlying empirical rate heterogeneity across the human genome. Importantly, within this framework it is possible to utilize and fit multiple aspects of the data, and this inference scheme represents a generalized approach for such large-scale inference in species with coding-sparse genomes.
{"title":"Inferring demographic and selective histories from population genomic data using a two-step approach in species with coding-sparse genomes: an application to human data.","authors":"Vivak Soni, Jeffrey D Jensen","doi":"10.1093/g3journal/jkaf019","DOIUrl":"10.1093/g3journal/jkaf019","journal":{"name":"G3: Genes|Genomes|Genetics"},"publicationDate":"2025-01-30","publicationTypes":"Journal Article"}
Pub Date : 2025-01-28 DOI: 10.1093/g3journal/jkaf014
Débora Y C Brandt, Oscar H Del Brutto, Rasmus Nielsen
Atahualpa is a rural village located in coastal Ecuador, a region where human habitation dates back as far as 10,000 years. The traditional diet of its indigenous inhabitants is rich in oily fish, and they have therefore served as a model for investigating the beneficial effects of such a diet. However, the genetic background of this population has not been studied. In this study, we sequenced the genomes of Atahualpa residents to look for variants under natural selection, which could mediate the effects of oily fish intake. DNA was extracted from 50 blood samples from randomly selected individuals recruited in the Atahualpa Project Cohort. After applying various filters, we calculated genome-wide genotype likelihoods from 33 samples, and combined data from those samples with data from other populations to investigate how the Atahualpa population is genetically related to these populations. Using selection scans, we identified signals of natural selection that may explain the above-mentioned dietary effects. The genetic ancestry of Atahualpa residents is 94.1% of indigenous American origin, but the population is substantially diverged from other indigenous populations in neighboring countries. Significant signatures of natural selection were found in the Atahualpa population, including a broad selection signal around the SUFU gene, which is a repressor of Hedgehog pathway signaling and associated with lipid metabolism, and another signal in the upstream region of LRP1B, which encodes low-density lipoprotein (LDL) receptor-related protein 1B. Our selection study reveals genes under selection in the Atahualpa population, which could mediate the beneficial effects of oily fish intake in this population.
{"title":"Signatures of natural selection may indicate a genetic basis for the beneficial effects of oily fish intake in indigenous people from coastal Ecuador.","authors":"Débora Y C Brandt, Oscar H Del Brutto, Rasmus Nielsen","doi":"10.1093/g3journal/jkaf014","DOIUrl":"10.1093/g3journal/jkaf014","journal":{"name":"G3: Genes|Genomes|Genetics"},"publicationDate":"2025-01-28","publicationTypes":"Journal Article"}