Current microbial sequencing relies on short-read platforms like Illumina and DNBSEQ, favored for their low cost and high accuracy. However, these methods often produce fragmented draft genomes, hindering comprehensive analysis of bacterial function. We evaluated the sequencing performance and assembly improvements of CycloneSEQ, a novel long-read sequencing platform developed by BGI-Research. CycloneSEQ sequencing of the type strain produced long reads with an average length of 11.6 kbp and an average quality score of 14.4. After hybrid assembly with short-read data, the assembled genome exhibited an error rate of only 0.04 mismatches and 0.08 indels per 100 kbp compared to the reference genome. This method was validated across 9 diverse species, successfully assembling complete circular genomes. Hybrid assembly significantly enhances genome completeness by using long reads to fill gaps and accurately assemble multi-copy rRNA genes, which cannot be achieved with short reads alone. Through data subsampling, we found that over 500 Mbp of short-read data combined with 100 Mbp of long-read data can result in a high-quality circular assembly. Additionally, using CycloneSEQ long reads effectively improves the assembly of complete circular genomes from mixed microbial communities. CycloneSEQ's read length is sufficient for circular bacterial genomes, but its base quality needs improvement. Integrating DNBSEQ short reads improved accuracy, resulting in complete and accurate assemblies. This efficient approach can be widely applied in microbial sequencing.
Title: Efficiently Constructing Complete Genomes with CycloneSEQ to Fill Gaps in Bacterial Draft Assemblies. Authors: Hewei Liang, Mengmeng Wang, Tongyuan Hu, Haoyu Wang, Wenxin He, Yanmei Ju, Ruijin Guo, Junyi Chen, Fei Guo, Tao Zeng, Yuliang Dong, Bo Wang, Chuanyu Liu, Xin Jin, Wenwei Zhang, Yuanqiang Zou, Xun Xu, Liang Xiao. Journal: bioRxiv - Genomics. Pub Date: 2024-09-07. DOI: 10.1101/2024.09.05.611410 (https://doi.org/10.1101/2024.09.05.611410)
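The quality figures quoted in the CycloneSEQ abstract above can be put on a common scale with the standard Phred relation Q = -10 log10(p). The following is a minimal sketch, not the authors' analysis: it assumes the reported "average quality score" of 14.4 can be read as a Phred score (the abstract does not say how the average was computed), and it treats mismatches plus indels per 100 kbp as a single per-base error rate. The function names are illustrative.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert a per-base error probability to a Phred quality score."""
    return -10 * math.log10(p)


# Raw long reads: average quality score of 14.4 (assumed to be Phred-scaled).
read_error = phred_to_error_prob(14.4)  # roughly 0.036, i.e. about 3.6% errors

# Hybrid assembly: 0.04 mismatches + 0.08 indels per 100 kbp (assumed additive).
assembly_error = (0.04 + 0.08) / 100_000
assembly_q = error_prob_to_phred(assembly_error)  # roughly Q59

print(f"read error probability: {read_error:.4f}")
print(f"assembly consensus quality: Q{assembly_q:.1f}")
```

Under these assumptions, the gap between a raw read error of a few percent and an assembly near Q59 illustrates why the abstract pairs long reads (for contiguity) with short reads (for accuracy).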
Pub Date: 2024-09-07 DOI: 10.1101/2024.09.06.611692
Asier Fernandez-Pato, Trishla Sinha, Sanzhima Garmaeva, Anastasia Gulyaeva, Nataliia Kuzub, Simon Roux, Jingyuan Fu, Alexander Kurilshikov, Alexandra Zhernakova
As mobile genetic elements (MGE) are critical yet understudied determinants of gut microbiome composition, we characterized the gut virome and plasmidome in 195 samples from 28 mother-infant dyads delivered by caesarean section. Infant mobilome increased in richness over the first 6 postnatal weeks, demonstrating high individual-specificity and temporal stability, establishing a personal persistent mobilome. Formula-fed infants exhibited greater mobilome richness than breastfed infants, with plasmid composition influenced by antibiotic exposure and birth weight. Plasmids constituted a significant reservoir of antibiotic resistance genes (ARG), with around 5% of infant gut plasmid taxonomic units carrying ARG. Notably, ARG profiles did not differ with antibiotic exposure at birth. We found that mother-infant sharing of viral and plasmid strains primarily occurred after 6 months of age. Overall, our integrative analysis offers novel insights into the dynamics, modulation, origin, and clinical implications of MGE in the developing gut microbiome.
Title: Dynamics and determinants of the gut mobilome in early life. Journal: bioRxiv - Genomics. URL: https://doi.org/10.1101/2024.09.06.611692
Pub Date: 2024-09-07 DOI: 10.1101/2024.09.03.611127
Eyes S Robson, Nilah M. Ioannidis
The nascent field of genomic AI is rapidly expanding with new models, benchmarks, and findings. As the field diversifies, there is an increased need for a common set of measurement tools and perspectives to standardize model evaluation. Here, we present a statistically grounded framework for performance evaluation, visualization, and interpretation using the prominent genomic AI model Enformer as a case study. The Enformer model has been used for a range of applications from mechanism discovery to variant effect prediction, but what makes it better or worse than precedent models at particular tasks? Our goal is not merely to answer these questions for Enformer, but to propose how we should think about new models in general. We start by reporting Enformer's few-shot performance on the GUANinE benchmark, which emphasizes complex genome interpretation tasks, and discuss its gains and deficits compared to precedent models. We follow this analysis with visualizations of Enformer's embeddings in low-dimensional space, where, among other insights, we diagnose features of the embeddings that may limit model generalization to synthetic biology tasks. Finally, we present a novel, theory-backed probe of Enformer embeddings, where variance decomposition allows for holistic interpretation and partial 'backtracking' to explanatory causal features. Through this case study, we illustrate a new framework, Enformation Theory, for analyzing and interpreting genomic AI models.
Title: Enformation Theory: A Framework for Evaluating Genomic AI. Journal: bioRxiv - Genomics. URL: https://doi.org/10.1101/2024.09.03.611127
Pub Date: 2024-09-06 DOI: 10.1101/2024.09.06.611573
Jinyoung Kim, Ryan Y. Muller, Eliana R Bondra, Nicholas Ingolia
Genome-wide CRISPR screens have emerged as powerful tools for uncovering the genetic underpinnings of diverse biological processes. Incisive screens often depend on directly measuring molecular phenotypes, such as regulated gene expression changes, provoked by CRISPR-mediated genetic perturbations. Here, we provide quantitative measurements of transcriptional responses in human cells across genome-scale perturbation libraries by coupling CRISPR interference (CRISPRi) with barcoded expression reporter sequencing (CiBER-seq). To enable CiBER-seq in mammalian cells, we optimize the integration of highly complex, barcoded sgRNA libraries into a defined genomic context. CiBER-seq profiling of a nuclear factor kappa B (NF-kappaB) reporter delineates the canonical signaling cascade linking the transmembrane TNF-alpha receptor to inflammatory gene activation and highlights cell-type-specific factors in this response. Importantly, CiBER-seq relies solely on bulk RNA sequencing to capture the regulatory circuit driving this rapid transcriptional response. Our work demonstrates the accuracy of CiBER-seq and its potential for dissecting genetic networks in mammalian cells with superior time resolution.
Title: CRISPRi with barcoded expression reporters dissects regulatory networks in human cells. Journal: bioRxiv - Genomics. URL: https://doi.org/10.1101/2024.09.06.611573
Pub Date: 2024-09-05 DOI: 10.1101/2024.09.04.610850
Brandon M. Wenz, Yuan He, Nae-Chyun Chen, Joseph K. Pickrell, Jeremiah H Li, Max F. Dudek, Taibo Li, Rebecca Keener, Benjamin F. Voight, Christopher D. Brown, Alexis Battle
Background: Understanding the genetic causes for variability in chromatin accessibility can shed light on the molecular mechanisms through which genetic variants may affect complex traits. Thousands of ATAC-seq samples have been collected that hold information about chromatin accessibility across diverse cell types and contexts, but most of these are not paired with genetic information and come from diverse distinct projects and laboratories. Results: We report here joint genotyping, chromatin accessibility peak calling, and discovery of quantitative trait loci which influence chromatin accessibility (caQTLs), demonstrating the capability of performing caQTL analysis on a large scale in a diverse sample set without pre-existing genotype information. Using 10,293 profiling samples representing 1,454 unique donor individuals across 653 studies from public databases, we catalog 23,381 caQTLs in total. After joint discovery analysis, we cluster samples based on accessible chromatin profiles to identify context-specific caQTLs. We find that caQTLs are strongly enriched for annotations of gene regulatory elements across diverse cell types and tissues and are often strongly linked with genetic variation associated with changes in expression (eQTLs), indicating that caQTLs can mediate genetic effects on gene expression. We demonstrate sharing of causal variants for chromatin accessibility and diverse complex human traits, enabling a more complete picture of the genetic mechanisms underlying complex human phenotypes. Conclusions: Our work provides a proof of principle for caQTL calling from previously ungenotyped samples, and represents one of the largest, most diverse caQTL resources currently available, informing mechanisms of genetic regulation of gene expression and contribution to disease.
Title: Genotype inference from aggregated chromatin accessibility data reveals genetic regulatory mechanisms. Journal: bioRxiv - Genomics. URL: https://doi.org/10.1101/2024.09.04.610850
Pub Date: 2024-09-05 DOI: 10.1101/2024.06.19.599757
Anouk van Westerhoven, Jelmer Dijkstra, Luis Aznar Palop, Kyran Wissink, Jasper Bell, Gert Kema, Michael F. Seidl
Mitochondria are present in almost all eukaryotic lineages. The mitochondrial genomes (mitogenomes) evolve separately from nuclear genomes, and they can therefore provide relevant insights into the evolution of their host species. Fusarium oxysporum is a major fungal plant pathogen that is assumed to reproduce clonally. However, horizontal chromosome transfer between strains can occur through heterokaryon formation, and recently signs of sexual recombination have been observed. Similarly, signs of recombination in F. oxysporum mitogenomes challenged the prevailing assumption of clonal reproduction in this species. Here, we construct, to our knowledge, the first fungal pan-mitogenome graph, built from nearly 500 F. oxysporum mitogenome assemblies, to uncover their variation and evolution. In general, the gene order of fungal mitogenomes is not well conserved, yet the mitogenomes of F. oxysporum and related species are highly co-linear. We observed two strikingly contrasting regions in the F. oxysporum pan-mitogenome, comprising a highly conserved core mitogenome and a long variable region (6-16 kb in size), of which we identified three distinct types. The pan-mitogenome graph reveals that only five intron insertions occurred in the core mitogenome and that the long variable regions drive the difference between mitogenomes. Moreover, we observed that the evolution of the long variable regions is concurrent with neither the core mitogenome nor the nuclear genome. Our large-scale analysis of long variable regions uncovers frequent recombination between mitogenomes, even between strains that belong to different taxonomic clades. This challenges the common assumption of incompatibility between genetically diverse F. oxysporum strains and provides new insights into the evolution of this fungal species.
Title: Frequent genetic exchanges revealed by a pan-mitogenome graph of a fungal plant pathogen. Journal: bioRxiv - Genomics. URL: https://doi.org/10.1101/2024.06.19.599757
Pub Date: 2024-09-05 DOI: 10.1101/2024.09.05.611382
Eleanor H Hayles, Andrew J Page, Javier Guitian, Robert A Kingsley, The COVID-19 Genomics UK Consortium, Gemma C Langridge
Background: In the UK, the COVID-19 Genomics UK Consortium (COG-UK) established a real-time national genomic surveillance system during the COVID-19 pandemic, producing centralised data for monitoring SARS-CoV-2. As a COG-UK partner, Quadram Institute Bioscience (QIB) in Norfolk sequenced over 87,000 SARS-CoV-2 genomes, contributing to the region becoming densely sequenced. Retrospective analysis of SARS-CoV-2 lineage dynamics in this region may contribute to preparedness for future pandemics. Methods: 29,406 SARS-CoV-2 whole genome sequences and corresponding metadata from Norfolk were extracted from the COG-UK dataset, sampled between March 2020 and December 2022, representing 9.9% of regional COVID-19 cases. Sequences were lineage typed using Pangolin, and subsequent lineage analysis was carried out in R using RStudio and related packages, including graphical analysis using ggplot2. Results: 401 global lineages were identified, with 69.8% appearing more than once and 31.2% over ten times. Temporal clustering identified six lineage communities based on first lineage emergence. Alpha, Delta, and Omicron variants of concern (VOC) accounted for 8.6%, 34.9% and 48.5% of sequences respectively. These formed four regional epidemic waves alongside the remaining lineages, which appeared in the early pandemic prior to VOC designation and were termed pre-VOC lineages. Regional comparison highlighted variability in VOC epidemic wave dates dependent on location. Conclusion: This study is the first to assess SARS-CoV-2 diversity in Norfolk across a large timescale within the COVID-19 pandemic. SARS-CoV-2 was both highly diverse and dynamic throughout the Norfolk region between March 2020 and December 2022, with a strong VOC presence within the latter two thirds of the study period. The study also displays the utility of incorporating genomic epidemiological methods into pandemic response.
Title: Genomic Epidemiology of SARS-CoV-2 in Norfolk, UK, March 2020 – December 2022. Journal: bioRxiv - Genomics. URL: https://doi.org/10.1101/2024.09.05.611382
Pub Date: 2024-09-05 DOI: 10.1101/2024.09.05.611471
Benjamin J Hanrahan, Kirat Alreja, Andre L. M. Reis, J King Chang, Duminda S. B. Dissanayake, Richard J Edwards, Terry Bertozzi, Jillian M Hammond, Denis O'Meally, Ira W Deveson, Arthur Georges, Paul D. Waters, Hardip Rameshbhai Patel
The eastern three-lined skink (Bassiana duperreyi) inhabits the Australian high country in the southeast of the continent, including Tasmania. It is an oviparous species that is distinctive because it undergoes sex reversal (from XX genotypic females to phenotypic males) at low incubation temperatures. We present a chromosome-scale genome assembly of a Bassiana duperreyi XY male individual, constructed using a combination of PacBio HiFi and ONT long reads scaffolded using Illumina Hi-C data. The genome assembly length is 1.57 Gb with a scaffold N50 of 222 Mbp, N90 of 26 Mbp, 200 gaps and 43.10% GC content. Most (95%) of the assembly is scaffolded into 6 macrochromosomes, 8 microchromosomes and the X chromosome, corresponding to the karyotype. Fragmented Y chromosome scaffolds (n=11 ≥1 Mbp) were identified using Y-specific contigs generated by genome subtraction. We identified two novel alpha-satellite repeats of 187 bp and 199 bp in the putative centromeres that did not form higher order repeats. The genome assembly exceeds the standard recommended by the Earth Biogenome Project: 0.02% false expansions, 99.63% k-mer completeness, 94.66% complete single-copy BUSCO genes and an average of 98.42% of transcriptome data mappable to the genome assembly. The mitochondrial genome (17,506 bp) and the model rDNA repeat unit (15,154 bp) were assembled. The B. duperreyi genome assembly has one of the highest completeness levels for a skink and will provide a resource for research focused on sex determination and thermolabile sex reversal, as an oviparous foundation species for studies of the evolution of viviparity, and for other comparative genomics studies of the Scincidae.
Title: A genome assembly and annotation for the Australian alpine skink Bassiana duperreyi using long-read technologies. Authors: Benjamin J Hanrahan, Kirat Alreja, Andre L. M. Reis, J King Chang, Duminda S. B. Dissanayake, Richard J Edwards, Terry Bertozzi, Jillian M Hammond, Denis O'Meally, Ira W Deveson, Arthur Georges, Paul D. Waters, Hardip Rameshbhai Patel. bioRxiv - Genomics, 2024-09-05. https://doi.org/10.1101/2024.09.05.611471
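The scaffold N50 and N90 values quoted in the skink abstract are standard contiguity statistics. As a minimal sketch (not the authors' pipeline), the function below computes an Nx value from a list of scaffold lengths: the length of the scaffold at which the running total, taken from longest to shortest, first reaches x% of the assembly. The function name `nx` and the toy lengths are assumptions made for illustration only.

```python
def nx(lengths, x):
    """Return the Nx statistic for a list of sequence lengths.

    Nx is the length L of the sequence at which the cumulative sum of
    lengths, taken from longest to shortest, first reaches x percent of
    the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in the range (0, 100]")

    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length


# Hypothetical toy assembly (arbitrary units), total length 300.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
n50 = nx(toy_scaffolds, 50)  # 80 + 70 = 150 reaches 50% of 300 -> 70
n90 = nx(toy_scaffolds, 90)  # 80 + 70 + 50 + 40 + 30 = 270 reaches 90% -> 30
print(n50, n90)
```

Because N90 counts further down the sorted list than N50, it is never larger than N50, which matches the relationship between the 222 Mbp and 26 Mbp figures reported above.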
Pub Date: 2024-09-04; DOI: 10.1101/2024.09.04.611293
Siyan Liu, Marisa C Hamilton, Thomas N Cowart, Alejandro Barrera, Lexi R Bounds, Alexander C Nelson, Richard W Doty, Andrew S Allen, Gregory E Crawford, William H Majoros, Charles A. Gersbach
Recent technological developments in single-cell RNA-seq CRISPR screens enable high-throughput investigation of the genome. By transducing a gRNA library into a cell population and then profiling the transcriptome by scRNA-seq, it is possible to characterize the effects of thousands of genomic perturbations on global gene expression. A major source of noise in scRNA-seq CRISPR screens is ambient gRNAs: contaminating gRNAs that likely originate from other cells. If not properly filtered, ambient gRNAs can result in an excess of false-positive gRNA assignments. Here, we use CRISPR barnyard assays to characterize ambient gRNA noise in single-cell CRISPR screens. We use these datasets to develop and train CLEANSER, a mixture model that identifies and filters ambient gRNA noise. This model takes advantage of the bimodal distribution of native and ambient gRNAs and includes both gRNA-specific and cell-specific normalization parameters, correcting for confounding technical factors that affect individual gRNAs and cells. The output of CLEANSER is the probability that a gRNA-cell assignment belongs to the native distribution rather than the ambient distribution. We find that ambient gRNA filtering methods affect the outcomes of differential gene expression analysis and that CLEANSER outperforms alternative approaches by increasing gRNA-cell assignment accuracy.
Title: Characterization and bioinformatic filtering of ambient gRNAs in single-cell CRISPR screens using CLEANSER. bioRxiv - Genomics, 2024-09-04. https://doi.org/10.1101/2024.09.04.611293
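The CLEANSER abstract describes a mixture model whose output is the probability that a gRNA-cell assignment is native rather than ambient. The sketch below illustrates that general idea only and is not CLEANSER itself: it assumes a plain two-component Poisson mixture fitted by expectation-maximization to the counts of a single gRNA across cells, and it omits the gRNA-specific and cell-specific normalization parameters the abstract mentions. All function names and the toy counts are hypothetical.

```python
import math


def poisson_logpmf(k, lam):
    """Log probability of observing count k under a Poisson with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def posterior_native(k, pi_nat, lam_amb, lam_nat):
    """Posterior probability that count k came from the native component."""
    log_nat = math.log(pi_nat) + poisson_logpmf(k, lam_nat)
    log_amb = math.log(1.0 - pi_nat) + poisson_logpmf(k, lam_amb)
    m = max(log_nat, log_amb)  # subtract the max for numerical stability
    w_nat = math.exp(log_nat - m)
    w_amb = math.exp(log_amb - m)
    return w_nat / (w_nat + w_amb)


def fit_two_poisson_mixture(counts, n_iter=200):
    """Fit a two-component Poisson mixture by EM.

    Returns (pi_nat, lam_amb, lam_nat): the native mixing weight and the
    ambient (low) and native (high) Poisson rates.
    """
    ordered = sorted(counts)
    half = len(ordered) // 2
    # Initialise the ambient rate from the lower half of the counts and
    # the native rate from the upper half, keeping them distinct.
    lam_amb = max(sum(ordered[:half]) / max(half, 1), 0.1)
    lam_nat = max(sum(ordered[half:]) / (len(ordered) - half), lam_amb + 1.0)
    pi_nat = 0.5

    for _ in range(n_iter):
        # E-step: responsibility of the native component for each count.
        post = [posterior_native(k, pi_nat, lam_amb, lam_nat) for k in counts]
        # M-step: re-estimate the weight and rates, clamped away from 0 and 1.
        w = sum(post)
        pi_nat = min(max(w / len(counts), 1e-6), 1.0 - 1e-6)
        lam_nat = max(sum(p * k for p, k in zip(post, counts)) / max(w, 1e-12), 1e-6)
        lam_amb = max(
            sum((1.0 - p) * k for p, k in zip(post, counts))
            / max(len(counts) - w, 1e-12),
            1e-6,
        )
    return pi_nat, lam_amb, lam_nat


# Hypothetical UMI counts for one gRNA across 15 cells: most cells carry
# only a few ambient molecules, while four cells carry the gRNA natively.
toy_counts = [0, 1, 0, 2, 1, 0, 1, 25, 30, 22, 0, 1, 28, 1, 0]
pi_nat, lam_amb, lam_nat = fit_two_poisson_mixture(toy_counts)
p_high = posterior_native(25, pi_nat, lam_amb, lam_nat)
p_low = posterior_native(1, pi_nat, lam_amb, lam_nat)
print(round(p_high, 3), round(p_low, 3))
```

In this toy setting the two count populations are well separated, so the posterior is close to 1 for the high counts and close to 0 for the low ones. Real screens are noisier, which is where normalization terms of the kind the abstract describes would matter.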
Pub Date: 2024-09-04; DOI: 10.1101/2024.09.03.610986
Adrian Cazares, Wendy Figueroa, Daniel Cazares, Leandro Lima, Jake D. Turnbull, Hannah McGregor, Jo Dicks, Sarah Alexander, Zamin Iqbal, Nicholas Thomson
Plasmids are now the primary vectors of antimicrobial resistance, but our understanding of how the industrial-scale use of antibiotics by humans influenced this is limited by a paucity of data from the pre-antibiotic era (PAE). By investigating plasmids from clinically relevant bacteria isolated between 1917 and 1954 and comparing them to modern plasmids, we captured over 100 years of evolution. We show that while all PAE plasmids were devoid of resistance genes and most never acquired them, a small minority evolved to drive the global spread of resistance to first-line and last-resort antibiotics in Gram-negative bacteria. These plasmids have evolved through complex microevolution and fusion events into a distinct group of highly recombinogenic, multi-replicon, self-transmissible plasmids that now pose the highest risk for resistance dissemination, and therefore for human health.
Title: Pre and Post antibiotic epoch: insights into the historical spread of antimicrobial resistance. bioRxiv - Genomics, 2024-09-04. https://doi.org/10.1101/2024.09.03.610986