Meng Wang, Yumei Li, Jun Wang, Soo Hwan Oh, Yexuan Cao, Rui Chen
The vast majority of protein-coding genes in the human genome produce multiple mRNA isoforms through alternative splicing, significantly enhancing the complexity of the transcriptome and proteome. To establish an efficient method for characterizing transcript isoforms within tissue samples, we conducted a systematic comparison between single-cell long-read and conventional short-read RNA sequencing techniques. The transcriptome of approximately 30,000 mouse retina cells was profiled using 1.54 billion Illumina short reads and 1.40 billion Oxford Nanopore Technologies long reads. In total, we identify 44,325 transcript isoforms, of which 38% were previously uncharacterized and 17% are expressed exclusively in distinct cellular subclasses. We observe that long-read sequencing not only matches the gene expression and cell-type annotation performance of short-read sequencing but also excels in the precise identification of transcript isoforms. While transcript isoforms are often shared across cell types, their relative abundance shows considerable cell type–specific variation. The data generated in our study substantially expand the existing repertoire of transcript isoforms, establishing a resource for future research into the mechanisms and implications of alternative splicing in retinal biology and its links to related diseases.
Integrating short-read and long-read single-cell RNA sequencing for comprehensive transcriptome profiling in mouse retina. Genome Research, published 2025-03-06. https://doi.org/10.1101/gr.279167.124
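The point that isoforms are shared across cell types while their relative abundance is cell type–specific comes down to a simple quantity: each isoform's share of its gene's reads within a cell type. The sketch below computes that share from pseudobulk counts; the input layout, the function name `isoform_usage`, and the gene and isoform names are hypothetical, and it is not the authors' quantification pipeline.

```python
from collections import defaultdict


def isoform_usage(counts):
    """Relative abundance of each isoform within its gene, per cell type.

    counts: iterable of (cell_type, gene, isoform, read_count) tuples,
    e.g. long-read counts summed over all cells of one subclass.
    Returns {(cell_type, gene): {isoform: fraction}}.
    """
    counts = list(counts)  # iterated twice below
    totals = defaultdict(float)
    for cell_type, gene, _isoform, n in counts:
        totals[(cell_type, gene)] += n
    usage = defaultdict(dict)
    for cell_type, gene, isoform, n in counts:
        total = totals[(cell_type, gene)]
        if total > 0:  # skip genes with no reads in this cell type
            usage[(cell_type, gene)][isoform] = n / total
    return dict(usage)


# Made-up counts: both cell types express the same two isoforms,
# but in very different proportions.
toy = [
    ("rod", "GeneA", "iso1", 90), ("rod", "GeneA", "iso2", 10),
    ("cone", "GeneA", "iso1", 30), ("cone", "GeneA", "iso2", 70),
]
usage = isoform_usage(toy)
print(usage[("rod", "GeneA")])  # {'iso1': 0.9, 'iso2': 0.1}
```

Working with fractions rather than raw counts separates a change in isoform preference from a change in overall gene expression.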
Xiaoyue Duan, Chaolei Chen, Chang Du, Liang Guo, Jun Liu, Naipeng Hou, Pan Li, Xiaolan Qi, Fei Gao, Xuguang Du, Jiangping Song, Sen Wu
Although CRISPR-Cas-based genome editing has made significant strides over the past decade, achieving simultaneous homozygous editing of multiple targets in primary cells remains a significant challenge. In this study, we optimized a coselection strategy to enhance homozygous gene editing rates in the genomes of primary porcine fetal fibroblasts (PFFs). The strategy uses expression of a surrogate reporter (eGFP) to select the cells with the highest reporter expression, thereby improving editing efficiency. For simultaneous multigene editing, we applied selection only to the most challenging target site; the other target sites did not require selection. Using this approach, we obtained single-cell PFF clones (3/10) with seven or more homozygously edited genes, including GGTA1, CMAH, B4GALNT2, CD46, CD47, THBD, and GHR. Importantly, cells edited using this strategy were efficiently used for somatic cell nuclear transfer (SCNT) to generate healthy xenotransplantation pigs in less than five months, a process that previously required years of breeding or multiple rounds of SCNT.
Homozygous editing of multiple genes for accelerated generation of xenotransplantation pigs. Genome Research, published 2025-03-05. https://doi.org/10.1101/gr.279709.124
Kang Du, Oliver Deusch, Ilja Bezrukov, Christa Lanz, Yann Guiguen, Margarete Hoffmann, Anette Habring, Detlef Weigel, Manfred Schartl, Christine Dreyer
The guppy Y Chromosome has been a paradigmatic model for studying the genetics of sex-linked traits and Y Chromosome–driven evolution for more than a century. Despite considerable effort, knowledge of the genomic organization and molecular differentiation of the sex chromosome pair remains unsatisfactory and partly contradictory with respect to regions of reduced recombination. In particular, the border between the pseudoautosomal and male-specific regions of the Y has not yet been defined. To circumvent the problems of assigning the repeat-rich, differentiated hemizygous or heterozygous sequences of the sex chromosome pair, we sequenced a YY male generated by crossing a sex-reversed Maculatus strain XY female to a normal XY male from the inbred Guanapo population. High-molecular-weight genomic DNA from the YY male was sequenced on the Pacific Biosciences platform, and both Y haplotypes were reconstructed by trio binning. By mapping male-specific SNPs and RADseq sequences, we identify a single male-specific region of the Y (MSY), ∼5 Mb long, at its distal end. Sequence divergence between X and Y in this segment is on average five times higher than in the proximal part, in agreement with reduced recombination. The MSY is enriched for repeats and transposons but does not differ from the X in coding gene content, indicating that genic degeneration has not progressed to a measurable degree.
Identification of the male-specific region on the guppy Y Chromosome from a haplotype-resolved assembly. Genome Research, published 2025-03-05. https://doi.org/10.1101/gr.279582.124
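The "five times higher" comparison is a ratio of mean X–Y divergence in distal versus proximal windows. Below is a toy sketch of that arithmetic, assuming the X and Y haplotypes are already aligned to equal length; the window size, the boundary index, and the function names are hypothetical, and this is not the authors' pipeline.

```python
import math


def window_divergence(x_seq, y_seq, window):
    """Fraction of differing sites per window between two aligned haplotypes.

    Assumes x_seq and y_seq are already aligned to equal length; sites where
    either sequence has a gap ('-') or an ambiguous base ('N') are skipped.
    Windows with no comparable sites get NaN.
    """
    if len(x_seq) != len(y_seq):
        raise ValueError("sequences must be aligned to equal length")
    divergences = []
    for start in range(0, len(x_seq), window):
        diffs = sites = 0
        for a, b in zip(x_seq[start:start + window].upper(),
                        y_seq[start:start + window].upper()):
            if a in "-N" or b in "-N":
                continue
            sites += 1
            diffs += a != b
        divergences.append(diffs / sites if sites else math.nan)
    return divergences


def distal_to_proximal_ratio(divergences, boundary_index):
    """Mean divergence of windows at/after boundary_index over those before it."""
    proximal = [d for d in divergences[:boundary_index] if not math.isnan(d)]
    distal = [d for d in divergences[boundary_index:] if not math.isnan(d)]
    if not proximal or not distal:
        raise ValueError("both sides of the boundary need comparable windows")
    return (sum(distal) / len(distal)) / (sum(proximal) / len(proximal))


# Toy data: 1/10 sites differ proximally, 5/10 distally.
divs = window_divergence("A" * 20, "AAAAAAAAAC" + "CCCCCAAAAA", 10)
print(divs, distal_to_proximal_ratio(divs, 1))
```

In practice the boundary would be the inferred pseudoautosomal/MSY border, and the ratio is undefined if the proximal windows show no differences at all.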
Interactions between mitochondrial and nuclear factors are essential to life. Nevertheless, the importance of coordinated regulation of mitochondrial–nuclear gene expression (CMNGE) in response to changing physiological conditions is poorly understood, and what is known is limited to certain tissues and organisms. We hypothesized that CMNGE is important for development across vertebrates and should therefore be conserved. As a first step, we analyzed more than 1400 RNA-seq experiments performed during prenatal development, in neonates, and in adults across vertebrate evolution. We find a conserved, sharp elevation of CMNGE after birth, including oxidative phosphorylation (OXPHOS) and mitochondrial ribosome genes, in the heart, hindbrain, forebrain, and kidney across mammals, as well as in Gallus gallus and in the lizard Anolis carolinensis. This is accompanied by elevated expression of TCA cycle enzymes and reduced expression of hypoxia response genes, suggesting a conserved cross-tissue metabolic switch after birth/hatching. Analysis of about 70 known regulators of mitochondrial gene expression reveals consistently elevated expression of PPARGC1A (PGC1 alpha) and CEBPB after birth/hatching across organisms and tissues, highlighting them as candidate regulators of CMNGE upon transition to the neonate. Analyses of Danio rerio, Xenopus tropicalis, Caenorhabditis elegans, and Drosophila melanogaster reveal elevated CMNGE prior to hatching in X. tropicalis and in D. melanogaster, which is associated with the emergence of muscle activity. The absence of this ancient pattern in mammals and in chickens suggests that it was lost during the radiation of terrestrial vertebrates. Taken together, our results suggest that regulated CMNGE after birth reflects an essential metabolic switch that is under strong selective constraint.
Vertebrates show coordinated elevated expression of mitochondrial and nuclear genes after birth. Hadar Medini, Dan Mishmar. Genome Research, published 2025-03-04. https://doi.org/10.1101/gr.279700.124
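One simple way to summarize a coordinated elevation of a gene set (for example, OXPHOS genes) between two stages is the mean log2 fold change across the set, together with the fraction of genes that go up. The sketch below is an illustrative assumption, not the statistic used in the study; the data layout, pseudocount, function name, and gene names are all hypothetical.

```python
import math
from statistics import mean


def gene_set_shift(expr_before, expr_after, gene_set, pseudocount=1.0):
    """Summarize how a gene set changes between two developmental stages.

    expr_before / expr_after: {gene: [expression per replicate]}, e.g. TPM.
    Returns (mean log2 fold change, fraction of genes with log2FC > 0),
    using only genes present in both stages.
    """
    log2fcs = []
    for gene in gene_set:
        if gene in expr_before and gene in expr_after:
            before = mean(expr_before[gene]) + pseudocount
            after = mean(expr_after[gene]) + pseudocount
            log2fcs.append(math.log2(after / before))
    if not log2fcs:
        raise ValueError("no genes from gene_set found in both stages")
    return mean(log2fcs), sum(fc > 0 for fc in log2fcs) / len(log2fcs)


# Made-up prenatal vs neonatal values for two hypothetical genes.
prenatal = {"g1": [1, 1], "g2": [3, 3]}
neonatal = {"g1": [3, 3], "g2": [7, 7]}
print(gene_set_shift(prenatal, neonatal, ["g1", "g2"]))  # (1.0, 1.0)
```

Reporting the fraction of genes that increase alongside the mean guards against a large shift in a few genes being read as a coordinated, set-wide change.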
Rodolphe Dombey, Daniel Buendía-Ávila, Verónica Barragán-Borrero, Laura Diezma-Navas, Arturo Ponce-Mañe, José Mario Vargas-Guerrero, Rana Elias, Arturo Marí-Ordóñez
A handful of model plants have provided insight into the silencing of transposable elements (TEs) through RNA-directed DNA methylation (RdDM). Guided by 24 nt small interfering RNAs (siRNAs), this epigenetic pathway installs DNA methylation and histone modifications such as H3K9me2, which can subsequently be maintained independently of siRNAs. However, the genome of the clonally propagating duckweed Spirodela polyrhiza (Lemnaceae) has low levels of DNA methylation, very low expression of RdDM components, and a near absence of 24 nt siRNAs. Moreover, some genes encoding RdDM factors, DNA methylation maintenance, and RNA silencing mechanisms are missing from the genome. Here, we investigated the distribution of TEs and their epigenetic marks in the Spirodela genome. Although the abundant degenerated TEs have largely lost DNA methylation and their H3K9me2 levels are low, they remain marked by the heterochromatin-associated modifications H3K9me1 and H3K27me1. In contrast, we find high levels of DNA methylation and H3K9me2 in the relatively few intact TEs, which are a source of 24 nt siRNAs, like RdDM-controlled TEs in other angiosperms. The data suggest that, potentially as an adaptation to vegetative propagation, the extent of RdDM, its silencing components, and its targets differ from those in other angiosperms, with RdDM preferentially focused on potentially intact TEs. The data also provide evidence for heterochromatin maintenance independent of DNA methylation in flowering plants. These discoveries highlight the diversity of silencing mechanisms in plants and the importance of using disparate model species to discover them.
Atypical epigenetic and small RNA control of degenerated transposons and their fragments in clonally reproducing Spirodela polyrhiza. Genome Research, published 2025-03-04. https://doi.org/10.1101/gr.279532.124
Merel Stemerdink, Tabea Riepe, Nick Zomer, Renee Salz, Michael Kwint, Jaap Oostrik, Raoul Timmermans, Barbara Ferrari, Stefano Ferrari, Alfredo Duenas Rey, Emma Delanote, Suzanne E de Bruijn, Hannie Kremer, Susanne Roosing, Frauke Coppieters, Alexander Hoischen, Frans P.M. Cremers, Peter-Bram A.C. 't Hoen, Erwin van Wijk, Erik de Vrieze
Sequencing technologies have long limited the comprehensive investigation of large transcripts associated with inherited retinal diseases (IRDs) such as Usher syndrome, which involves 11 associated genes with transcripts up to 19.6 kb. To address this, we used PacBio long-read mRNA isoform sequencing (Iso-Seq) following standard library preparation and an optimized workflow to enrich for long transcripts in the human neural retina. While our workflow achieved sequencing of transcripts up to 15 kb, this was insufficient for the Usher syndrome-associated genes USH2A and ADGRV1, with transcripts of 18.9 kb and 19.6 kb, respectively. To overcome this, we employed the Samplix Xdrop System, a technique typically used for genomic DNA capture, for indirect target enrichment of cDNA. This method enabled the capture and sequencing of ADGRV1 transcripts as well as full-length 18.9 kb USH2A transcripts. By combining algorithmic analysis with detailed manual curation of sequenced reads, we identified novel isoforms across the 11 Usher syndrome-associated genes, characterized by an alternative 5' transcription start site, the inclusion of previously unannotated exons, or alternative splicing events. These findings have significant implications for genetic diagnostics and therapeutic development. The analysis applied here to Usher syndrome-associated transcripts exemplifies a valuable approach that can be extended to explore the transcriptomic complexity of other IRD-associated genes in the complete transcriptome dataset generated in this study. Additionally, we demonstrated the adaptability of the Samplix Xdrop System for capturing cDNA, and the optimized methodologies described here can be extended to enrich for large transcripts from other tissues of interest.
Deciphering the largest disease-associated transcript isoforms in the human neural retina with advanced long-read sequencing approaches. Genome Research, published 2025-03-04. https://doi.org/10.1101/gr.280060.124
Samuel T. Horsfield, Basil C.T. Fok, Yuhan Fu, Paul Turner, John A. Lees, Nicholas J. Croucher
Serotype surveillance of Streptococcus pneumoniae (the pneumococcus) is critical for understanding the effectiveness of current vaccination strategies. However, existing serotyping methods are limited in their ability to identify co-carriage of multiple pneumococci and to detect novel serotypes. To develop a scalable and portable serotyping method that overcomes these challenges, we employed Nanopore Adaptive Sampling (NAS), an on-sequencer enrichment method that selects for target DNA in real time, for direct detection of S. pneumoniae in complex samples. Whereas NAS targeting the whole S. pneumoniae genome was ineffective in the presence of nonpathogenic streptococci, the method was both specific and sensitive when targeting the capsular biosynthetic locus (CBL), the operon that determines S. pneumoniae serotype. NAS significantly improved coverage and yield of the CBL relative to sequencing without NAS, and accurately quantified the relative prevalence of serotypes in samples representing co-carriage. To maximize the sensitivity of NAS for detecting novel serotypes, we developed and benchmarked a new pangenome-graph algorithm, named GNASTy. We show that GNASTy outperforms the current NAS implementation, which is based on linear genome alignment, when a sample contains a serotype absent from the database of targeted sequences. The methods developed in this work provide an improved approach for novel serotype discovery and routine S. pneumoniae surveillance that is fast, accurate, and feasible in low-resource settings. Although NAS facilitates whole-genome enrichment under ideal circumstances, GNASTy enables targeted enrichment to optimize serotype surveillance in complex samples.
Optimizing nanopore adaptive sampling for pneumococcal serotype surveillance in complex samples using the graph-based GNASTy algorithm. Genome Research, published 2025-03-04. https://doi.org/10.1101/gr.279435.124
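Adaptive sampling makes a per-read decision from the first bases sequenced: keep reading if the prefix looks like a target (here, CBL sequences), otherwise eject the molecule. The abstract states that the current NAS implementation relies on linear genome alignment; the sketch below swaps in a much simpler exact k-mer containment test purely to illustrate the accept/reject logic. It reproduces neither that aligner nor GNASTy's pangenome-graph approach, and `build_target_index`, `adaptive_decision`, `min_fraction`, and the decision labels are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_target_index(targets, k):
    """k-mers from both strands of every target sequence."""
    index = set()
    for seq in targets:
        seq = seq.upper()
        index |= kmers(seq, k) | kmers(revcomp(seq), k)
    return index


def adaptive_decision(read_prefix, index, k, min_fraction=0.2):
    """Decide on a read from its first bases: keep, eject, or wait for more."""
    read_kmers = kmers(read_prefix.upper(), k)
    if not read_kmers:
        return "more_data"  # prefix shorter than k: cannot decide yet
    hit_fraction = len(read_kmers & index) / len(read_kmers)
    return "sequence" if hit_fraction >= min_fraction else "unblock"


target = "ATGGCGTACCTTAGGCTAGCAATCGGA"  # made-up stand-in for a CBL reference
idx = build_target_index([target], k=5)
print(adaptive_decision(target[3:20], idx, 5))         # sequence
print(adaptive_decision("TTTTTTTTTTTTTTTT", idx, 5))   # unblock
```

An exact-match test like this only recognizes sequences close to the database, which mirrors the limitation the abstract describes for serotypes absent from the targeted sequences.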
Tianfu Zeng, Haotian Liao, Lin Xia, Siyao You, Yanqun Huang, Jiaxun Zhang, Yahui Liu, Xuyan Liu, Dan Xie
Somatic structural variations (SVs) represent a critical category of genomic mutations in hepatocellular carcinoma (HCC). However, accurately identifying somatic SVs using short-read high-throughput sequencing (HTS) is challenging. Here, we applied long-read nanopore sequencing and multisite sampling to a cohort of 42 samples from five patients. We found that somatic SVs were prominent in adjacent nontumor tissues, a pattern that differed significantly from that of somatic single nucleotide variants (SNVs) and copy number variations (CNVs). The types of SVs differed markedly between adjacent nontumor and tumor tissues, with somatic insertions (INSs) and deletions (DELs) serving as early genomic alterations associated with HCC. Notably, hepatitis B virus (HBV) DNA integration frequently generated somatic SVs, particularly interchromosomal translocations. Although HBV DNA integrates into the liver genome at random, HBV-induced SVs shared across multiple sites are implicated as early driving events in HCC pathogenesis. Long-read RNA sequencing revealed that some HBV-induced SVs affect cancer-associated genes and that translocations can give rise to fusion genes. These findings enhance our understanding of somatic SVs in HCC and their role in early tumorigenesis.
Multisite long-read sequencing reveals the early contribution of somatic structural variations to HBV-related hepatocellular carcinoma tumorigenesis. Genome Research, published 2025-03-04. https://doi.org/10.1101/gr.279617.124
Repetitive regions in eukaryotic genomes often contain important functional or regulatory elements. Despite significant algorithmic and technological advances in genome sequencing and assembly over the past three decades, modern de novo assemblers still struggle to reconstruct highly repetitive regions accurately. In this work, we introduce RAmbler (Repeat Assembler), a reference-guided assembler specialized for assembling complex repetitive regions exclusively from PacBio HiFi reads. RAmbler (i) identifies repetitive regions by detecting regions of unusually high coverage after mapping HiFi reads to the draft genome assembly, (ii) finds single-copy k-mers in the HiFi reads (i.e., k-mers expected to occur only once in the genome), (iii) uses the relative locations of single-copy k-mers to barcode each HiFi read, (iv) clusters HiFi reads based on their shared barcodes, (v) generates contigs by assembling the reads in each cluster, and (vi) generates a consensus assembly from the overlap graph of the assembled contigs. Here we show that RAmbler can reconstruct human centromeres and other complex repeats with quality comparable to that of the manually curated telomere-to-telomere human genome assembly. On more than 250 synthetic datasets, RAmbler outperforms hifiasm, LJA, HiCANU, and Verkko across a range of repeat lengths, repeat counts, heterozygosity rates, and sequencing depths.
RAmbler resolves complex repeats in human Chromosomes 8, 19, and X. Sakshar Chakravarty, Glennis Logsdon, Stefano Lonardi. Genome Research, published 2025-03-04. https://doi.org/10.1101/gr.279308.124
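The first four steps of the RAmbler pipeline described above can be illustrated with a toy example. The Python sketch below is a minimal, hypothetical illustration of steps (i) to (iv) only. It assumes a precomputed per-base coverage array and error-free reads. The function names, the median-based coverage threshold, the k-mer count window, and the union-find clustering rule are all assumptions made for illustration, not taken from RAmbler's implementation. Contig assembly and consensus (steps v and vi) are omitted.

```python
from collections import Counter
from statistics import median

# Hypothetical toy sketch of RAmbler-style steps (i)-(iv); thresholds and the
# clustering rule are illustrative assumptions, not RAmbler's actual method.


def high_coverage_intervals(coverage, factor=2.0):
    """Step (i): half-open [start, end) runs whose depth exceeds `factor`
    times the median depth (the factor is an assumed parameter)."""
    threshold = factor * median(coverage)
    intervals, start = [], None
    for pos, depth in enumerate(coverage):
        if depth > threshold and start is None:
            start = pos  # entering a high-coverage run
        elif depth <= threshold and start is not None:
            intervals.append((start, pos))  # leaving the run
            start = None
    if start is not None:  # run extends to the end of the array
        intervals.append((start, len(coverage)))
    return intervals


def kmers(seq, k):
    """All overlapping k-mers of `seq`, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def single_copy_kmers(reads, k, lo, hi):
    """Step (ii): a k-mer occurring once in the genome should be seen about
    `depth` times across the reads, so keep k-mers with count in [lo, hi]."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    return {km for km, c in counts.items() if lo <= c <= hi}


def barcode(read, k, single_copy):
    """Step (iii): the single-copy k-mers of a read, in positional order."""
    return [km for km in kmers(read, k) if km in single_copy]


def cluster_reads(barcodes, min_shared=1):
    """Step (iv): union-find that merges reads sharing at least
    `min_shared` barcode k-mers (an assumed, simplified criterion)."""
    parent = list(range(len(barcodes)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    sets = [set(b) for b in barcodes]
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if len(sets[i] & sets[j]) >= min_shared:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(sets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    print(high_coverage_intervals([10, 10, 10, 30, 30, 10, 10, 25, 10]))
    # [(3, 5), (7, 8)]
    reads = ["ACGTT", "CGTTA", "GGGCC", "GGCCA"]
    solid = single_copy_kmers(reads, k=3, lo=2, hi=2)
    codes = [barcode(r, 3, solid) for r in reads]
    print(cluster_reads(codes))  # [[0, 1], [2, 3]]
```

In this toy run, k = 3 and the window keeps only k-mers seen exactly twice. The two pairs of overlapping reads therefore fall into separate clusters. With real data, the count window would instead be centered on the sequencing depth. That way, k-mers seen too rarely (likely sequencing errors) or too often (likely repeats) are discarded before barcoding.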
Hosiana Abewe, Alexandra Richey, Jeffery M. Vahrenkamp, Matthew Ginley-Hidinger, Craig M. Rush, Noel Kitchen, Xiaoyang Zhang, Jason Gertz
Transcriptional enhancers can regulate individual or multiple genes through long-range three-dimensional (3D) genome interactions, and these interactions are commonly altered in cancer. Yet the functional relationship between changes in 3D genome interactions at regulatory regions and differential gene expression appears to be context-dependent. In this study, we used HiChIP to capture changes in 3D genome interactions between active regulatory regions of endometrial cancer cells in response to estrogen treatment. We uncovered significant differential long-range interactions strongly enriched for estrogen receptor alpha (ER, also known as ESR1)–bound sites (ERBSs). ERBSs anchoring differential chromatin loops to either a gene promoter or a distal region were correlated with larger transcriptional responses to estrogen than ERBSs not involved in differential 3D genome interactions. To test this observation functionally, we used CRISPR-based Enhancer-i to deactivate specific ERBSs, which revealed a wide range of effects on the transcriptional response to estrogen. However, these effects were only slightly, and not significantly, stronger for ERBSs in differential chromatin loops. In addition, we observed an enrichment of 3D genome interactions between the promoters of estrogen-upregulated genes and found that looped promoters can act cooperatively. Overall, our work reveals that estrogen treatment causes large changes in 3D genome structure in endometrial cancer cells; however, these changes are not required for a regulatory region to contribute to the estrogen transcriptional response.
Estrogen-induced chromatin looping changes identify a subset of functional regulatory elements. Genome Research, published 2025-03-03. https://doi.org/10.1101/gr.279699.124