Rodolphe Dombey, Daniel Buendía-Ávila, Verónica Barragán-Borrero, Laura Diezma-Navas, Arturo Ponce-Mañe, José Mario Vargas-Guerrero, Rana Elias, Arturo Marí-Ordóñez
A handful of model plants have provided insight into silencing of transposable elements (TEs) through RNA-directed DNA methylation (RdDM). Guided by 24 nt long small-interfering RNAs (siRNAs), this epigenetic regulation installs DNA methylation and histone modifications like H3K9me2, which can be subsequently maintained independently of siRNAs. However, the genome of the clonally propagating duckweed Spirodela polyrhiza (Lemnaceae) has low levels of DNA methylation, very low expression of RdDM components, and a near absence of 24 nt siRNAs. Moreover, some genes encoding RdDM factors, DNA methylation maintenance, and RNA silencing mechanisms are missing from the genome. Here, we investigated the distribution of TEs and their epigenetic marks in the Spirodela genome. Although abundant degenerated TEs have largely lost DNA methylation and H3K9me2 is low, they remain marked by the heterochromatin-associated H3K9me1 and H3K27me1 modifications. In contrast, we find high levels of DNA methylation and H3K9me2 in the relatively few intact TEs, which are a source of 24 nt siRNAs, like RdDM-controlled TEs in other angiosperms. The data suggest that, potentially as an adaptation to vegetative propagation, the extent, silencing components, and targets of RdDM differ from those in other angiosperms, with RdDM preferentially focused on potentially intact TEs. They also provide evidence for heterochromatin maintenance independently of DNA methylation in flowering plants. These discoveries highlight the diversity of silencing mechanisms that exist in plants and the importance of using disparate model species to discover these mechanisms.
{"title":"Atypical epigenetic and small RNA control of degenerated transposons and their fragments in clonally reproducing Spirodela polyrhiza","authors":"Rodolphe Dombey, Daniel Buendía-Ávila, Verónica Barragán-Borrero, Laura Diezma-Navas, Arturo Ponce-Mañe, José Mario Vargas-Guerrero, Rana Elias, Arturo Marí-Ordóñez","doi":"10.1101/gr.279532.124","DOIUrl":"https://doi.org/10.1101/gr.279532.124","url":null,"PeriodicalId":12678,"journal":{"name":"Genome research","volume":"822 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143546118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Merel Stemerdink, Tabea Riepe, Nick Zomer, Renee Salz, Michael Kwint, Jaap Oostrik, Raoul Timmermans, Barbara Ferrari, Stefano Ferrari, Alfredo Duenas Rey, Emma Delanote, Suzanne E de Bruijn, Hannie Kremer, Susanne Roosing, Frauke Coppieters, Alexander Hoischen, Frans P.M. Cremers, Peter-Bram A.C. 't Hoen, Erwin van Wijk, Erik de Vrieze
Sequencing technologies have long limited the comprehensive investigation of large transcripts associated with inherited retinal diseases (IRDs) such as Usher syndrome, which involves 11 associated genes with transcripts up to 19.6 kb. To address this, we used PacBio long-read mRNA isoform sequencing (Iso-Seq) following standard library preparation and an optimized workflow to enrich for long transcripts in the human neural retina. While our workflow achieved sequencing of transcripts up to 15 kb, this was insufficient for Usher syndrome-associated genes USH2A and ADGRV1, with transcripts of 18.9 kb and 19.6 kb, respectively. To overcome this, we employed the Samplix Xdrop System for indirect target enrichment of cDNA, a technique typically used for genomic DNA capture. This method facilitated the successful capture and sequencing of ADGRV1 transcripts as well as full-length 18.9 kb USH2A transcripts. By combining algorithmic analysis with detailed manual curation of sequenced reads, we identified novel isoforms characterized by an alternative 5' transcription start site, the inclusion of previously unannotated exons, or alternative splicing events across the 11 Usher syndrome-associated genes. These findings have significant implications for genetic diagnostics and therapeutic development. The analysis applied here to Usher syndrome-associated transcripts exemplifies a valuable approach that can be extended to explore the transcriptomic complexity of other IRD-associated genes in the complete transcriptome dataset generated within this study. Additionally, we demonstrated the adaptability of the Samplix Xdrop System for capturing cDNA, and the optimized methodologies described can be expanded to facilitate the enrichment of large transcripts from various tissues of interest.
{"title":"Deciphering the largest disease-associated transcript isoforms in the human neural retina with advanced long-read sequencing approaches","authors":"Merel Stemerdink, Tabea Riepe, Nick Zomer, Renee Salz, Michael Kwint, Jaap Oostrik, Raoul Timmermans, Barbara Ferrari, Stefano Ferrari, Alfredo Duenas Rey, Emma Delanote, Suzanne E de Bruijn, Hannie Kremer, Susanne Roosing, Frauke Coppieters, Alexander Hoischen, Frans P.M. Cremers, Peter-Bram A.C. 't Hoen, Erwin van Wijk, Erik de Vrieze","doi":"10.1101/gr.280060.124","DOIUrl":"https://doi.org/10.1101/gr.280060.124","url":null,"PeriodicalId":12678,"journal":{"name":"Genome research","volume":"2 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143546121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Samuel T. Horsfield, Basil C.T. Fok, Yuhan Fu, Paul Turner, John A. Lees, Nicholas J. Croucher
Serotype surveillance of Streptococcus pneumoniae (the pneumococcus) is critical for understanding the effectiveness of current vaccination strategies. However, existing methods for serotyping are limited in their ability to identify the co-carriage of multiple pneumococci and detect novel serotypes. To develop a scalable and portable serotyping method that overcomes these challenges, we employed Nanopore Adaptive Sampling (NAS), an on-sequencer enrichment method that selects for target DNA in real-time, for direct detection of S. pneumoniae in complex samples. Whereas NAS targeting the whole S. pneumoniae genome was ineffective in the presence of nonpathogenic streptococci, the method was both specific and sensitive when targeting the capsular biosynthetic locus (CBL), the operon that determines S. pneumoniae serotype. NAS significantly improved coverage and yield of the CBL relative to sequencing without NAS, and accurately quantified the relative prevalence of serotypes in samples representing co-carriage. To maximize the sensitivity of NAS to detect novel serotypes, we developed and benchmarked a new pangenome-graph algorithm, named GNASTy. We show that GNASTy outperforms the current NAS implementation, which is based on linear genome alignment, when a sample contains a serotype absent from the database of targeted sequences. The methods developed in this work provide an improved approach for novel serotype discovery and routine S. pneumoniae surveillance that is fast, accurate and feasible in low-resource settings. Although NAS facilitates whole-genome enrichment under ideal circumstances, GNASTy enables targeted enrichment to optimize serotype surveillance in complex samples.
{"title":"Optimizing nanopore adaptive sampling for pneumococcal serotype surveillance in complex samples using the graph-based GNASTy algorithm","authors":"Samuel T. Horsfield, Basil C.T. Fok, Yuhan Fu, Paul Turner, John A. Lees, Nicholas J. Croucher","doi":"10.1101/gr.279435.124","DOIUrl":"https://doi.org/10.1101/gr.279435.124","url":null,"PeriodicalId":12678,"journal":{"name":"Genome research","volume":"28 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143546120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
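The accept-or-reject decision behind adaptive sampling, and the serotype prevalence estimate described in the GNASTy abstract above, can be illustrated with a toy Python sketch. This is not the NAS or GNASTy implementation: the abstract states that the current NAS implementation relies on linear genome alignment and that GNASTy uses a pangenome graph, whereas the sketch below swaps in a simple k-mer containment test. The function names, the `min_fraction` threshold, and the serotype labels in the usage note are assumptions for illustration only.

```python
from collections import Counter


def kmer_set(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def decide(read_prefix, target_kmers, k, min_fraction):
    """Toy stand-in for the real-time decision made on the start of a read.

    Returns "accept" (keep sequencing the molecule) when at least
    min_fraction of the prefix k-mers occur in the target, otherwise
    "reject" (eject the molecule from the pore).
    """
    prefix_kmers = kmer_set(read_prefix, k)
    if not prefix_kmers:
        return "reject"  # prefix too short to judge
    shared = len(prefix_kmers & target_kmers) / len(prefix_kmers)
    return "accept" if shared >= min_fraction else "reject"


def relative_prevalence(assignments):
    """Fraction of accepted reads assigned to each serotype label."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {serotype: n / total for serotype, n in counts.items()}
```

For example, with a hypothetical target `"ACGTACGTGGCC"` and `k = 4`, the prefix `"ACGTACGT"` is accepted at `min_fraction = 0.8`, while `"TTTTTTTT"` is rejected. Passing per-read labels such as `["19F", "19F", "19F", "23F"]` to `relative_prevalence` gives the proportion of reads per label, which is the kind of quantity used to describe co-carriage.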
Tianfu Zeng, Haotian Liao, Lin Xia, Siyao You, Yanqun Huang, Jiaxun Zhang, Yahui Liu, Xuyan Liu, Dan Xie
Somatic structural variations (SVs) represent a critical category of genomic mutations in hepatocellular carcinoma (HCC). However, the accurate identification of somatic SVs using short-read high-throughput sequencing (HTS) is challenging. Here, we applied long-read nanopore sequencing and multisite sampling in a cohort of 42 samples from five patients. We discovered a prominent presence of somatic SVs in adjacent nontumor tissues, which significantly differed from somatic single nucleotide variants (SNVs) and copy number variations (CNVs). The types of SVs were markedly different between adjacent nontumor and tumor tissues, with somatic insertions (INSs) and deletions (DELs) serving as early genomic alterations associated with HCC. Notably, hepatitis B virus (HBV) DNA integration frequently resulted in the generation of somatic SVs, particularly inducing interchromosomal translocations. While HBV DNA integration into the liver genome occurs randomly, multisite shared HBV-induced SVs are implicated as early driving events in the pathogenesis of HCC. Long-read RNA sequencing revealed that some HBV-induced SVs impact cancer-associated genes, with translocations being capable of inducing the formation of fusion genes. These findings enhance our understanding of somatic SVs in HCC and their role in early tumorigenesis.
{"title":"Multisite long-read sequencing reveals the early contribution of somatic structural variations to HBV-related hepatocellular carcinoma tumorigenesis","authors":"Tianfu Zeng, Haotian Liao, Lin Xia, Siyao You, Yanqun Huang, Jiaxun Zhang, Yahui Liu, Xuyan Liu, Dan Xie","doi":"10.1101/gr.279617.124","DOIUrl":"https://doi.org/10.1101/gr.279617.124","url":null,"PeriodicalId":12678,"journal":{"name":"Genome research","volume":"10 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143546123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sakshar Chakravarty, Glennis Logsdon, Stefano Lonardi
Repetitive regions in eukaryotic genomes often contain important functional or regulatory elements. Despite significant algorithmic and technological advancements in genome sequencing and assembly over the past three decades, modern de novo assemblers still struggle to accurately reconstruct highly repetitive regions. In this work, we introduce RAmbler (Repeat Assembler), a reference-guided assembler specialized for the assembly of complex repetitive regions exclusively from PacBio HiFi reads. RAmbler (i) identifies repetitive regions by detecting unusually high coverage regions after mapping HiFi reads to the draft genome assembly, (ii) finds single-copy k-mers from the HiFi reads (i.e., k-mers that are expected to occur only once in the genome), (iii) uses the relative location of single-copy k-mers to barcode each HiFi read, (iv) clusters HiFi reads based on their shared barcodes, (v) generates contigs by assembling the reads in each cluster, and (vi) generates a consensus assembly from the overlap graph of the assembled contigs. Here we show that RAmbler can reconstruct human centromeres and other complex repeats to a quality comparable to the manually curated telomere-to-telomere human genome assembly. Across more than 250 synthetic datasets, RAmbler outperforms hifiasm, LJA, HiCANU, and Verkko across various parameters such as repeat lengths, number of repeats, heterozygosity rates, and depth of sequencing.
{"title":"RAmbler resolves complex repeats in human Chromosomes 8, 19, and X","authors":"Sakshar Chakravarty, Glennis Logsdon, Stefano Lonardi","doi":"10.1101/gr.279308.124","DOIUrl":"https://doi.org/10.1101/gr.279308.124","url":null,"PeriodicalId":12678,"journal":{"name":"Genome research","volume":"22 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143546124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
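Steps (ii) to (iv) of the RAmbler abstract above (single-copy k-mers, read barcoding, and clustering) can be illustrated with a toy Python sketch. It is not RAmbler's implementation: the function names, the read-count window (`lo`, `hi`) used to call a k-mer single-copy, and the union-find clustering rule (any shared single-copy k-mer links two reads) are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every k-mer of seq in left-to-right order."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def single_copy_kmers(reads, k, lo, hi):
    """Assumed filter: keep k-mers present in between lo and hi reads.

    At sequencing depth d, a k-mer that occurs once in the genome is
    expected in roughly d reads, so lo and hi bracket that depth.
    """
    counts = Counter()
    for read in reads:
        counts.update(set(kmers(read, k)))  # count each read once per k-mer
    return {km for km, c in counts.items() if lo <= c <= hi}


def barcode(read, unique, k):
    """Barcode of a read: its single-copy k-mers in order of occurrence."""
    return tuple(km for km in kmers(read, k) if km in unique)


def cluster_reads(reads, unique, k):
    """Group read indices that share at least one single-copy k-mer."""
    parent = list(range(len(reads)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner = {}  # single-copy k-mer -> first read index carrying it
    for idx, read in enumerate(reads):
        for km in barcode(read, unique, k):
            if km in first_owner:
                parent[find(idx)] = find(first_owner[km])
            else:
                first_owner[km] = idx

    clusters = defaultdict(list)
    for idx in range(len(reads)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())
```

In this sketch, reads that cover the same repeat copy end up in the same cluster because they carry the same flanking single-copy k-mers, while k-mers from the repeat itself are seen in too many reads and are excluded from the barcode. Each cluster would then be assembled separately, as in step (v).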
Hosiana Abewe, Alexandra Richey, Jeffery M. Vahrenkamp, Matthew Ginley-Hidinger, Craig M. Rush, Noel Kitchen, Xiaoyang Zhang, Jason Gertz
Transcriptional enhancers can regulate individual or multiple genes through long-range three-dimensional (3D) genome interactions, and these interactions are commonly altered in cancer. Yet, the functional relationship between changes in 3D genome interactions associated with regulatory regions and differential gene expression appears context-dependent. In this study, we used HiChIP to capture changes in 3D genome interactions between active regulatory regions of endometrial cancer cells in response to estrogen treatment and uncovered significant differential long-range interactions strongly enriched for estrogen receptor alpha (ER, also known as ESR1)–bound sites (ERBSs). The ERBSs anchoring differential chromatin loops with either a gene's promoter or distal regions were correlated with larger transcriptional responses to estrogen compared with ERBSs not involved in differential 3D genome interactions. To functionally test this observation, CRISPR-based Enhancer-i was used to deactivate specific ERBSs, which revealed a wide range of effects on the transcriptional response to estrogen. However, these effects are only subtly and not significantly stronger for ERBSs in differential chromatin loops. In addition, we observed an enrichment of 3D genome interactions between the promoters of estrogen-upregulated genes and found that looped promoters can work together cooperatively. Overall, our work reveals that estrogen treatment causes large changes in 3D genome structure in endometrial cancer cells; however, these changes are not required for a regulatory region to contribute to an estrogen transcriptional response.
{"title":"Estrogen-induced chromatin looping changes identify a subset of functional regulatory elements","authors":"Hosiana Abewe, Alexandra Richey, Jeffery M. Vahrenkamp, Matthew Ginley-Hidinger, Craig M. Rush, Noel Kitchen, Xiaoyang Zhang, Jason Gertz","doi":"10.1101/gr.279699.124","DOIUrl":"https://doi.org/10.1101/gr.279699.124","url":null,"PeriodicalId":12678,"journal":{"name":"Genome research","volume":"14 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143538299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Netanya Keil, Carolina Monzó, Lauren McIntyre, Ana Conesa
SQANTI-reads leverages SQANTI3, a tool for the analysis of the quality of transcript models, to develop a read-level quality control framework for replicated long-read RNA-seq experiments. The number and distribution of reads, as well as the number and distribution of unique junction chains (transcript splicing patterns), in SQANTI3 structural categories are informative of raw data quality. Multisample visualizations of QC metrics are presented by experimental design factors to identify outliers. We introduce new metrics for 1) the identification of potentially under-annotated genes and putative novel transcripts and for 2) quantifying variation in junction donors and acceptors. We applied SQANTI-reads to two different datasets, a Drosophila developmental experiment and a multiplatform dataset from the LRGASP project, and demonstrate that the tool effectively reveals the impact of read coverage on data quality and readily identifies strong and weak splicing sites.
{"title":"Quality assessment of long read data in multisample lrRNA-seq experiments with SQANTI-reads","authors":"Netanya Keil, Carolina Monzó, Lauren McIntyre, Ana Conesa","doi":"10.1101/gr.280021.124","DOIUrl":"https://doi.org/10.1101/gr.280021.124","url":null,"PeriodicalId":12678,"journal":{"name":"Genome research","volume":"18 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143538430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
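The read-level QC idea in the SQANTI-reads abstract above, comparing how reads distribute over structural categories across samples to spot outliers, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SQANTI-reads code: the function names, the median-based outlier rule, the `max_dev` threshold, and the short category labels (`"FSM"`, `"ISM"`) used in the usage note are all hypothetical choices.

```python
from collections import Counter
from statistics import median


def category_fractions(read_categories):
    """Fraction of reads falling in each structural category for one sample."""
    counts = Counter(read_categories)
    total = sum(counts.values())
    if total == 0:
        return {}  # sample with no reads
    return {cat: n / total for cat, n in counts.items()}


def flag_outliers(samples, category, max_dev):
    """Assumed rule: flag samples whose fraction of reads in `category`
    differs from the across-sample median by more than max_dev."""
    fracs = {
        name: category_fractions(cats).get(category, 0.0)
        for name, cats in samples.items()
    }
    center = median(fracs.values())
    return sorted(name for name, f in fracs.items() if abs(f - center) > max_dev)
```

As a usage example, three hypothetical samples with 80%, 70%, and 20% of reads labeled `"FSM"` have a median of 70%, so `flag_outliers(samples, "FSM", 0.2)` flags only the third. In a designed experiment the same comparison would be made within each level of an experimental factor rather than across all samples at once.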
Carolina Monzó, Adam Frankish, Ana Conesa
Long-read sequencing (LRS) technologies have revolutionized transcriptomic research by enabling the comprehensive sequencing of full-length transcripts. Using these technologies, researchers have reported tens of thousands of novel transcripts, even in well-annotated genomes, while developing new algorithms and experimental approaches to handle the noisy data. The LRGASP community effort benchmarked LRS methods in transcriptomics and validated many novel, lowly expressed, sample-specific transcripts identified by long reads. These molecules represent deviations from the major transcriptional program that were easily overlooked by short-read sequencing methods but are now captured by the full-length, single-molecule approach. This Perspective discusses the challenges and opportunities associated with the capacity of LRS to unravel this fraction of the transcriptome, both in terms of transcriptome biology and genome annotation. For transcriptome biology, we need to develop novel experimental and computational methods to effectively differentiate technology errors from rare but real molecules. For genome annotation, we must agree on the strategy to capture molecular variability while still defining reference annotations that are useful for genome research.
{"title":"Notable challenges posed by long-read sequencing for the study of transcriptional diversity and genome annotation","authors":"Carolina Monzó, Adam Frankish, Ana Conesa","doi":"10.1101/gr.279865.124","DOIUrl":"https://doi.org/10.1101/gr.279865.124","url":null,"PeriodicalId":12678,"journal":{"name":"Genome research","volume":"23 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143538298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haloom Rafehi, Liam G. Fearnley, Justin Read, Penny Snell, Kayli C. Davies, Liam Scott, Greta Gillies, Genevieve C. Thompson, Tess A. Field, Aleena Eldo, Simon Bodek, Ernest Butler, Luke Chen, John Drago, Himanshu Goel, Anna Hackett, G. Michael Halmagyi, Andrew Hannaford, Katya Kotschet, Kishore R. Kumar, Smitha Kumble, Matthew Lee-Archer, Abhishek Malhotra, Mark Paine, Michael Poon, Kate Pope, Katrina Reardon, Steven Ring, Anne Ronan, Matthew Silsby, Renee Smyth, Chloe Stutterd, Mathew Wallis, John Waterston, Thomas Wellings, Kirsty West, Christine Wools, Kathy H.C. Wu, David J. Szmulewicz, Martin B. Delatycki, Melanie Bahlo, Paul J. Lockhart
The cerebellar ataxias (CAs) are a heterogeneous group of disorders characterized by progressive incoordination. Seventeen repeat expansion (RE) loci have been identified as the primary genetic cause and account for >80% of genetic diagnoses. Despite this, diagnostic testing is limited and inefficient, often utilizing single gene assays. This study evaluates the effectiveness of long- and short-read sequencing as diagnostic tools for CA. We recruited 110 individuals (48 females, 62 males) with a clinical diagnosis of CA. Short-read genome sequencing (SR-GS) was performed to identify pathogenic RE and also non-RE variants in 356 genes associated with CA. Independently, long-read sequencing with adaptive sampling (LR-AS) was performed to identify pathogenic RE. SR-GS provided a genetic diagnosis for 38% of the cohort (40/110) including seven non-RE pathogenic variants. RE causes disease in 33 individuals, with the most common condition being SCA27B (n = 24). In comparison, LR-AS identified pathogenic RE in 29 individuals. RE identification for the two methods was concordant apart from four SCA27B cases not detected by LR-AS due to low read depth. For both technologies, manual review of the RE alignment enhances diagnostic outcomes. Orthogonal testing for SCA27B revealed a 15% and 0% false positive rate for SR-GS and LR-AS, respectively. In conclusion, both technologies are powerful screening tools for CA. SR-GS is a mature technology currently used by diagnostic providers, requiring only minor changes in bioinformatic workflows to enable CA diagnostics. LR-AS offers considerable advantages in the context of RE detection and characterization but requires optimization before clinical implementation.
{"title":"A prospective trial comparing programmable targeted long-read sequencing and short-read genome sequencing for genetic diagnosis of cerebellar ataxia","authors":"Haloom Rafehi, Liam G. Fearnley, Justin Read, Penny Snell, Kayli C. Davies, Liam Scott, Greta Gillies, Genevieve C. Thompson, Tess A. Field, Aleena Eldo, Simon Bodek, Ernest Butler, Luke Chen, John Drago, Himanshu Goel, Anna Hackett, G. Michael Halmagyi, Andrew Hannaford, Katya Kotschet, Kishore R. Kumar, Smitha Kumble, Matthew Lee-Archer, Abhishek Malhotra, Mark Paine, Michael Poon, Kate Pope, Katrina Reardon, Steven Ring, Anne Ronan, Matthew Silsby, Renee Smyth, Chloe Stutterd, Mathew Wallis, John Waterston, Thomas Wellings, Kirsty West, Christine Wools, Kathy H.C. Wu, David J. Szmulewicz, Martin B. Delatycki, Melanie Bahlo, Paul J. Lockhart","doi":"10.1101/gr.279634.124","DOIUrl":"https://doi.org/10.1101/gr.279634.124","url":null,"abstract":"The cerebellar ataxias (CAs) are a heterogeneous group of disorders characterized by progressive incoordination. Seventeen repeat expansion (RE) loci have been identified as the primary genetic cause and account for >80% of genetic diagnoses. Despite this, diagnostic testing is limited and inefficient, often utilizing single gene assays. This study evaluates the effectiveness of long- and short-read sequencing as diagnostic tools for CA. We recruited 110 individuals (48 females, 62 males) with a clinical diagnosis of CA. Short-read genome sequencing (SR-GS) was performed to identify pathogenic RE and also non-RE variants in 356 genes associated with CA. Independently, long-read sequencing with adaptive sampling (LR-AS) was performed to identify pathogenic RE. SR-GS provided a genetic diagnosis for 38% of the cohort (40/110) including seven non-RE pathogenic variants. RE causes disease in 33 individuals, with the most common condition being SCA27B (<em>n</em> = 24). In comparison, LR-AS identified pathogenic RE in 29 individuals. 
RE identification for the two methods was concordant apart from four SCA27B cases not detected by LR-AS due to low read depth. For both technologies, manual review of the RE alignment enhances diagnostic outcomes. Orthogonal testing for SCA27B revealed a 15% and 0% false positive rate for SR-GS and LR-AS, respectively. In conclusion, both technologies are powerful screening tools for CA. SR-GS is a mature technology currently used by diagnostic providers, requiring only minor changes in bioinformatic workflows to enable CA diagnostics. LR-AS offers considerable advantages in the context of RE detection and characterization but requires optimization before clinical implementation.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"55 4 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143518775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiawei Shen, Siyuan Cheng, Deepak Purushotham, Xiaoyu Zhuo, Alan Y Du, Wenjin Zhang, Daofeng Li, Ting Wang
Repetitive elements, mostly derived from transposable elements (TEs), account for half the DNA in human and other mammalian genomes. Although epigenetic mechanisms, including DNA methylation and repressive histone modifications, have evolved to suppress TE activities, TEs have substantially shaped the regulatory landscape of the host genome by contributing regulatory sequences to it. TE-derived sequences are often highly repetitive and thus have low mappability, making it difficult to profile the genomics of TEs using short-read sequencing technology. Many specialized bioinformatics tools have been developed for TE-related analysis, but meaningfully visualizing, navigating, and interpreting such data remains challenging. Here, we describe the WashU Repeat Browser to host genomics profiles of human and mouse TEs using data produced by the ENCODE Project and to support the navigation, interactive visualization, integration, comparison, and analysis in the context of TEs. WashU Repeat Browser is a web-based platform allowing users to browse genomic and statistical signals over repetitive elements derived from ENCODE, Roadmap, and FANTOM datasets. The Browser provides a TE-centric view including TE subfamily enrichments, TE subfamily profiling, as well as overviews of genomic signals on individual TE loci where we extend the WashU Epigenome Browser to display user-selected datasets and TE loci. These features could help to close the gaps in our understanding of the repetitive sequences and their putative regulatory functions and aid investigators in formulating new hypotheses by integrating their data with public data.
{"title":"Exploring the epigenome profiles of repetitive elements with the WashU Repeat Browser","authors":"Jiawei Shen, Siyuan Cheng, Deepak Purushotham, Xiaoyu Zhuo, Alan Y Du, Wenjin Zhang, Daofeng Li, Ting Wang","doi":"10.1101/gr.279764.124","DOIUrl":"https://doi.org/10.1101/gr.279764.124","url":null,"abstract":"Repetitive elements, mostly derived from transposable elements (TEs), account for half the DNA in human and other mammalian genomes. Although epigenetic mechanisms, including DNA methylation and repressive histone modifications, have evolved to suppress TE activities, TEs have substantially shaped the regulatory landscape of the host genome by contributing regulatory sequences to it. TE-derived sequences are often highly repetitive and thus have low mappability, making it difficult to profile the genomics of TEs using short-read sequencing technology. Many specialized bioinformatics tools have been developed for TE-related analysis, but meaningfully visualizing, navigating, and interpreting such data remains challenging. Here, we describe the WashU Repeat Browser to host genomics profiles of human and mouse TEs using data produced by the ENCODE Project and to support the navigation, interactive visualization, integration, comparison, and analysis in the context of TEs. WashU Repeat Browser is a web-based platform allowing users to browse genomic and statistical signals over repetitive elements derived from ENCODE, Roadmap, and FANTOM datasets. The Browser provides a TE-centric view including TE subfamily enrichments, TE subfamily profiling, as well as overviews of genomic signals on individual TE loci where we extend the WashU Epigenome Browser to display user-selected datasets and TE loci. 
These features could help to close the gaps in our understanding of the repetitive sequences and their putative regulatory functions and aid investigators in formulating new hypotheses by integrating their data with public data.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"15 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143462864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}