Ilya B. Slizovskiy, Nathalie Bonin, Jonathan E. Bravo, Peter M. Ferm, Jacob Singer, Christina Boucher, Noelle R. Noyes
We investigated the efficiency of target-enriched long-read sequencing (TELSeq) for detecting antimicrobial resistance genes (ARGs) and mobile genetic elements (MGEs) within complex matrices. We aimed to overcome limitations associated with traditional antimicrobial resistance (AMR) detection methods, including short-read shotgun metagenomics, which can lack sensitivity, specificity, and the ability to provide detailed genomic context. By combining biotinylated probe-based enrichment with long-read sequencing, we facilitated the amplification and sequencing of ARGs, eliminating the need for bioinformatic reconstruction. Our experimental design included replicates of human fecal microbiota transplant material, bovine feces, pristine prairie soil, and a mock human gut microbial community, allowing us to examine variables including genomic DNA input and probe set composition. Our findings demonstrated that TELSeq markedly improves the detection rates of ARGs and MGEs compared to traditional sequencing methods, underlining its potential for accurate AMR monitoring. A key insight from our research is the importance of incorporating mobilome profiles to better predict the transferability of ARGs within microbial communities, prompting a recommendation for the use of combined ARG–MGE probe sets for future studies. We also reveal limitations for ARG detection from low-input workflows, and describe the next steps for ongoing protocol refinement to minimize technical variability and expand utility in clinical and public health settings. This effort is part of our broader commitment to advancing methodologies that address the global challenge of AMR.
Factors impacting target-enriched long-read sequencing of resistomes and mobilomes. Genome Research, published 2024-11-05. DOI: 10.1101/gr.279226.124
Alexander J Ritter, Jolene M Draper, Chris Vollmers, Jeremy R Sanford
Alternative splicing (AS) alters the cis-regulatory landscape of mRNA isoforms, leading to transcripts with distinct localization, stability, and translational efficiency. To rigorously investigate mRNA isoform-specific ribosome association, we generated subcellular fractionation and sequencing (Frac-seq) libraries using both conventional short reads and long reads from human embryonic stem cells (ESCs) and neural progenitor cells (NPCs) derived from the same ESCs. We performed de novo transcriptome assembly from high-confidence long reads from cytosolic, monosomal, light, and heavy polyribosomal fractions and quantified their abundance using short reads from their respective subcellular fractions. Thousands of transcripts in each cell type exhibited association with particular subcellular fractions relative to the cytosol. Of the multi-isoform genes, 27% and 19% exhibited significant differential isoform sedimentation in ESCs and NPCs, respectively. Alternative promoter usage and internal exon skipping accounted for the majority of differences between isoforms from the same gene. Random forest classifiers implicated coding sequence (CDS) and untranslated region (UTR) lengths as important determinants of isoform-specific sedimentation profiles, and motif analyses reveal potential cell type-specific and subcellular fraction-associated RNA-binding protein signatures. Taken together, our data demonstrate that alternative mRNA processing within the CDS and UTRs impacts the translational control of mRNA isoforms during stem cell differentiation, and highlight the utility of using a novel long-read sequencing-based method to study translational control.
Long-read subcellular fractionation and sequencing reveals the translational fate of full-length mRNA isoforms during neuronal differentiation. Genome Research, published 2024-11-05. DOI: 10.1101/gr.279170.124
Kristine Bilgrav Saether, Jesper Eisfeldt, Jesse D. Bengtsson, Ming Yin Lun, Christopher M. Grochowski, Medhat Mahmoud, Hsiao-Tuan Chao, Jill A. Rosenfeld, Pengfei Liu, Marlene Ek, Jakob Schuy, Adam Ameur, Hongzheng Dai, Undiagnosed Diseases Network, James Paul Hwang, Fritz J. Sedlazeck, Weimin Bi, Ronit Marom, Josephine Wincent, Ann Nordgren, Claudia M.B. Carvalho, Anna Lindstrand
Chromosomal inversions (INVs) are particularly challenging to detect due to their copy-number neutral state and association with repetitive regions. Inversions represent about 1/20 of all balanced structural chromosome aberrations and can lead to disease by disrupting genes or altering regulatory regions of dosage-sensitive genes in cis. Short-read genome sequencing (srGS) can only resolve ∼70% of cytogenetically visible inversions referred to clinical diagnostic laboratories, likely due to breakpoints in repetitive regions. Here, we study 12 inversions by long-read genome sequencing (lrGS) (n = 9) or srGS (n = 3) and resolve nine of them. In four cases, the inversion breakpoint region was missing from at least one of the human reference genomes (GRCh37, GRCh38, T2T-CHM13) and a reference-agnostic analysis was needed. One of these cases, an INV9 mappable only in de novo assembled lrGS data using T2T-CHM13, disrupts EHMT1, consistent with a Mendelian diagnosis (Kleefstra syndrome 1; MIM#610253). Next, by pairwise comparison between T2T-CHM13, GRCh37, and GRCh38, as well as the chimpanzee and bonobo, we show that hundreds of megabases of sequence are missing from at least one human reference, highlighting that primate genomes contribute to genomic diversity. Aligning population genomic data to these regions indicated that they are variable between individuals. Our analysis emphasizes that T2T-CHM13 is necessary to maximize the value of lrGS for optimal inversion detection in clinical diagnostics. These results highlight the importance of leveraging diverse and comprehensive reference genomes to resolve unsolved molecular cases in rare diseases.
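As an illustration of why inversions are copy-number neutral, the sketch below applies an inversion to a toy sequence: the segment between two breakpoints is replaced by its reverse complement, so sequence length and dosage stay the same and only orientation and the two junctions change. The function name `invert` and the toy sequence are hypothetical and are not taken from the study.

```python
# Complement table for the four DNA bases (toy example; no ambiguity codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def invert(seq: str, start: int, end: int) -> str:
    """Return seq with the half-open interval [start, end) inverted.

    An inversion replaces the segment with its reverse complement.
    """
    segment = seq[start:end]
    inverted = segment.translate(COMPLEMENT)[::-1]
    return seq[:start] + inverted + seq[end:]


if __name__ == "__main__":
    reference = "AAACCGTTT"          # made-up sequence
    carrier = invert(reference, 3, 6)
    print(reference, carrier)
    # Length is unchanged, which is why depth-based (copy-number) signals miss it.
    print(len(reference) == len(carrier))
```

Only reads that span a breakpoint junction carry evidence of the event, which is consistent with the abstract's point that breakpoints in repetitive or reference-missing regions are hard to resolve.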
Leveraging the T2T assembly to resolve rare and pathogenic inversions in reference genome gaps. Genome Research, published 2024-11-01. DOI: 10.1101/gr.279346.124
Jonas A Gustafson, Sophia B Gibson, Nikhita Damaraju, Miranda P G Zalusky, Kendra Hoekzema, David Twesigomwe, Lei Yang, Anthony A Snead, Phillip A Richmond, Wouter De Coster, Nathan D Olson, Andrea Guarracino, Qiuhui Li, Angela L Miller, Joy Goffena, Zachary B Anderson, Sophie H R Storz, Sydney A Ward, Maisha Sinha, Claudia Gonzaga-Jauregui, Wayne E Clarke, Anna O Basile, André Corvelo, Catherine Reeves, Adrienne Helland, Rajeeva Lochan Musunuri, Mahler Revsine, Karynne E Patterson, Cate R Paschal, Christina Zakarian, Sara Goodwin, Tanner D Jensen, Esther Robb, W Richard McCombie, Fritz J Sedlazeck, Justin M Zook, Stephen B Montgomery, Erik Garrison, Mikhail Kolmogorov, Michael C Schatz, Richard N McLaughlin, Harriet Dashnow, Michael C Zody, Matt Loose, Miten Jain, Evan E Eichler, Danny E Miller
Fewer than half of individuals with a suspected Mendelian or monogenic condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control data sets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project (1KGP) Oxford Nanopore Technologies Sequencing Consortium aims to generate LRS data from at least 800 of the 1KGP samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37× and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
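The two summary statistics quoted above, read N50 and mean depth of coverage, can be sketched as follows. This is a minimal illustration under stated assumptions, not the consortium's pipeline: the function names `n50` and `mean_depth`, the toy read lengths, and the genome size are all hypothetical.

```python
def n50(lengths):
    """Return the read-length N50: the length L such that reads of
    length >= L together contain at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("need at least one read length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mean_depth(lengths, genome_size):
    """Return mean depth of coverage: total sequenced bases / genome size."""
    return sum(lengths) / genome_size


if __name__ == "__main__":
    # Made-up read lengths in bases, for illustration only.
    reads = [80_000, 60_000, 54_000, 30_000, 10_000, 5_000]
    print(n50(reads))                 # 60000
    print(mean_depth(reads, 23_900))  # 10.0 for this toy genome size
```

N50 is weighted by bases rather than by read count, so a few very long reads raise it more than many short reads lower it.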
High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation. Genome Research, published 2024-10-30. DOI: 10.1101/gr.279273.124
Wouter De Coster, Ida Höijer, Inge Bruggeman, Svenn D'Hert, Malin Melin, Adam Ameur, Rosa Rademakers
The lack of population-scale databases hampers research and diagnostics for medically relevant tandem repeats and repeat expansions. We attempt to fill this gap using our pathSTR web tool, which leverages long-read sequencing of large cohorts to determine repeat length and sequence composition in a healthy population. The current version includes 1040 individuals of the 1000 Genomes Project cohort sequenced on the Oxford Nanopore Technologies PromethION. A comprehensive set of medically relevant tandem repeats has been genotyped using STRdust and LongTR to determine the tandem repeat length and sequence composition. PathSTR provides rich visualizations of this data set and lets users upload their own data for comparison alongside the control cohort. We demonstrate the implementation of this application using data from targeted nanopore sequencing of a patient with myotonic dystrophy type 1. This resource will empower the genetics community to get a more complete overview of normal variation in tandem repeat length and sequence composition and, as such, enable a better assessment of rare tandem repeat alleles observed in patients.
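The idea of assessing a patient allele against a control cohort can be sketched as a percentile rank over control repeat lengths. This is only an illustration of the comparison, not pathSTR's implementation or interface: the function `percentile_rank` and all repeat lengths below are hypothetical.

```python
from bisect import bisect_right


def percentile_rank(control_lengths, patient_length):
    """Return the percentage of control alleles whose repeat length is
    less than or equal to the patient's repeat length."""
    if not control_lengths:
        raise ValueError("control cohort is empty")
    ordered = sorted(control_lengths)
    at_or_below = bisect_right(ordered, patient_length)
    return 100.0 * at_or_below / len(ordered)


if __name__ == "__main__":
    # Made-up repeat lengths (in repeat units) for a single locus.
    controls = [5, 10, 12, 13, 20]
    print(percentile_rank(controls, 12))   # 60.0
    print(percentile_rank(controls, 400))  # 100.0: longer than every control
```

A rank of 100 only says the allele exceeds every control in the cohort; interpreting it clinically still depends on the locus-specific pathogenic threshold and on sequence composition, which a length-only comparison ignores.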
Visualization and analysis of medically relevant tandem repeats in nanopore sequencing of control cohorts with pathSTR. Genome Research, published 2024-10-30. DOI: 10.1101/gr.279265.124
Lovro Vrček, Xavier Bresson, Thomas Laurent, Martin Schmitz, Kenji Kawaguchi, Mile Šikić
The critical stage of every de novo genome assembler is identifying paths in assembly graphs that correspond to the reconstructed genomic sequences. The existing algorithmic methods struggle with this, primarily due to repetitive regions causing complex graph tangles, leading to fragmented assemblies. Here, we introduce GNNome, a framework for path identification based on geometric deep learning that enables training models on assembly graphs without relying on existing assembly strategies. By leveraging only the symmetries inherent to the problem, GNNome reconstructs assemblies from PacBio HiFi reads with contiguity and quality comparable to those of the state-of-the-art tools across several species. With every new genome assembled telomere-to-telomere, the amount of reliable training data at our disposal increases. Combining the straightforward generation of abundant simulated data for diverse genomic structures with the AI approach makes the proposed framework a plausible cornerstone for future work on reconstructing complex genomes with different ploidy and aneuploidy degrees. To facilitate such developments, we make the framework and the best-performing model publicly available, provided as a tool that can directly be used to assemble new haploid genomes.
Geometric deep learning framework for de novo genome assembly. Genome Research, published 2024-10-29. DOI: 10.1101/gr.279307.124
Ben Nolan, Timothy E Reznicek, Christopher T Cummings, M Jordan Rowley
The neuronal nucleus houses a meticulously organized genome. Within this structure, genetic material is not simply compacted but arranged into a precise and functional 3D chromatin landscape essential for cellular regulation. This mini-review highlights the importance of this chromatin landscape in healthy neurodevelopment, as well as the diseases that occur with aberrant chromatin architecture. We discuss insights into the fundamental mechanistic relationship between histone modifications, DNA methylation, and genome organization. We then discuss findings that reveal how these epigenetic features change throughout normal neurodevelopment. Finally, we highlight single-gene neurodevelopmental disorders that illustrate the interdependence of epigenetic features, showing how disruptions in DNA methylation or genome architecture can ripple across the entire epigenome. As such, we emphasize the importance of measuring multiple chromatin architectural aspects, as the disruption of one mechanism can likely impact others in the intricate epigenetic network. This mini-review underscores the vast gaps in our understanding of chromatin structure in neurodevelopmental diseases and the substantial research needed to understand the interplay between chromatin features and neurodevelopment.
The chromatin tapestry as a framework for neurodevelopment. Genome Research 34(10):1477-1486, published 2024-10-29. DOI: 10.1101/gr.278408.123. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529992/pdf/
Jesper Eisfeldt, Edward J. Higginbotham, Felix Lenner, Jennifer Howe, Bridget A. Fernandez, Anna Lindstrand, Stephen W. Scherer, Lars Feuk
Rare or de novo structural variation, primarily in the form of copy number variants, is detected in 5%–10% of autism spectrum disorder (ASD) families. While complex structural variants involving duplications can generally be detected using microarray or short-read genome sequencing (GS), these methods frequently fail to characterize breakpoints at nucleotide resolution, requiring additional molecular methods for validation and fine-mapping. Here, we use Oxford Nanopore Technologies PromethION long-read GS to characterize complex genomic rearrangements (CGRs) involving large duplications that segregate with ASD in five families. In total, we investigated 13 CGR carriers and were able to resolve all breakpoint junctions at nucleotide resolution. While all breakpoints were identified, the precise genomic architecture of one rearrangement remained unresolved with three different potential structures. The findings in two families include potential fusion genes formed through duplication rearrangements, involving IL1RAPL1–DMD and SUPT16H–CHD8. In two of the families originating from the same geographical region, an identical rearrangement involving ANK2 was identified, which likely represents a founder variant. In addition, we analyze methylation status directly from the long-read data, allowing us to assess the activity of rearranged genes and regulatory regions. Investigation of methylation across the CGRs reveals aberrant methylation status in carriers across a rearrangement affecting the CREBBP locus. In aggregate, our results demonstrate the utility of nanopore sequencing to pinpoint CGRs associated with ASD in five unrelated families, and highlight the importance of a gene-centric description of disease-associated complex chromosomal rearrangements.
{"title":"Resolving complex duplication variants in autism spectrum disorder using long-read genome sequencing","authors":"Jesper Eisfeldt, Edward J. Higginbotham, Felix Lenner, Jennifer Howe, Bridget A. Fernandez, Anna Lindstrand, Stephen W. Scherer, Lars Feuk","doi":"10.1101/gr.279263.124","DOIUrl":"https://doi.org/10.1101/gr.279263.124","journal":{"name":"Genome research"},"publicationDate":"2024-10-29","publicationTypes":"Journal Article"}
Jesper Eisfeldt, Adam Ameur, Felix Lenner, Esmee Ten Berk de Boer, Marlene Ek, Josephine Wincent, Raquel Vaz, Jesper Ottosson, Tord Jonson, Sofie Ivarsson, Sofia Thunström, Alexandra Topa, Simon Stenberg, Anna Rohlin, Anna Sandestig, Margareta Nordling, Pia Palmebäck, Magnus Burstedt, Frida Nordin, Eva-Lena Stattin, Maria Sobol, Panagiotis Baliakas, Marie-Louise Bondeson, Ida Höijer, Kristine Bilgrav Saether, Lovisa Lovmar, Hans Ehrencrona, Malin Melin, Lars Feuk, Anna Lindstrand
Clinical genetic laboratories often require a comprehensive analysis of chromosomal rearrangements/structural variants (SVs), from large events like translocations and inversions to supernumerary ring/marker chromosomes and small deletions or duplications. Understanding the complexity of these events and their clinical consequences requires pinpointing breakpoint junctions and resolving the derivative chromosome structure. This task often surpasses the capabilities of short-read sequencing technologies. In contrast, long-read sequencing techniques present a compelling alternative for clinical diagnostics. Here, Genomic Medicine Sweden—Rare Diseases has explored the utility of HiFi Revio long-read genome sequencing (lrGS) for digital karyotyping of SVs nationwide. The 16 samples from 13 families were collected from all Swedish healthcare regions. Prior investigations had identified 16 SVs, ranging from simple to complex rearrangements, including inversions, translocations, and copy number variants. We have established a national pipeline and a shared variant database for variant calling and filtering. Using lrGS, 14 of the 16 known SVs are detected. Of these, 13 are mapped at nucleotide resolution, and one complex rearrangement is only visible by read depth. Two Chromosome 21 rearrangements, one mosaic, remain undetected. Average read lengths are 8.3–18.8 kb with coverage exceeding 20× for all samples. De novo assembly results in a limited number of phased contigs per individual (N50 6–86 Mb), enabling direct characterization of the chromosomal rearrangements. In a national pilot study, we demonstrate the utility of HiFi Revio lrGS for analyzing chromosomal rearrangements. Based on our results, we propose a 5-year plan to expand lrGS use for rare disease diagnostics in Sweden.
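The abstract above summarizes assembly contiguity as an N50 value. As a minimal sketch of how that statistic is conventionally computed (this is not the authors' pipeline; the function name `n50` and the example lengths are hypothetical and purely illustrative):

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in Mb (illustrative only, not data from the study).
example_contigs_mb = [50, 40, 30, 20, 10]
print(n50(example_contigs_mb))  # prints 40
```

Because the statistic depends only on the sorted lengths, a few very long contigs can dominate it, which is why an N50 is usually reported alongside the contig count.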
{"title":"A national long-read sequencing study on chromosomal rearrangements uncovers hidden complexities","authors":"Jesper Eisfeldt, Adam Ameur, Felix Lenner, Esmee Ten Berk de Boer, Marlene Ek, Josephine Wincent, Raquel Vaz, Jesper Ottosson, Tord Jonson, Sofie Ivarsson, Sofia Thunström, Alexandra Topa, Simon Stenberg, Anna Rohlin, Anna Sandestig, Margareta Nordling, Pia Palmebäck, Magnus Burstedt, Frida Nordin, Eva-Lena Stattin, Maria Sobol, Panagiotis Baliakas, Marie-Louise Bondeson, Ida Höijer, Kristine Bilgrav Saether, Lovisa Lovmar, Hans Ehrencrona, Malin Melin, Lars Feuk, Anna Lindstrand","doi":"10.1101/gr.279510.124","DOIUrl":"https://doi.org/10.1101/gr.279510.124","journal":{"name":"Genome research"},"publicationDate":"2024-10-29","publicationTypes":"Journal Article"}
Dayea Park, Can Cenik
Long-read sequencing technology enables highly accurate detection of allele-specific RNA expression, providing insights into the effects of genetic variation on splicing and RNA abundance. Furthermore, the ability to directly sequence RNA using the Oxford Nanopore technology promises the detection of RNA modifications in tandem with ascertaining the allelic origin of each molecule. Here, we leverage these advantages to determine allele-biased patterns of N6-methyladenosine (m6A) modifications in native mRNA. We utilized human and mouse cells with known genetic variants to assign allelic origin of each mRNA molecule combined with a supervised machine learning model to detect read-level m6A modification ratios. Our analyses revealed the importance of sequences adjacent to the DRACH-motif in determining m6A deposition, in addition to allelic differences that directly alter the motif. Moreover, we discovered allele-specific m6A modification (ASM) events with no genetic variants in close proximity to the differentially modified nucleotide, demonstrating the unique advantage of using long reads and surpassing the capabilities of antibody-based short-read approaches. This technological advancement promises to advance our understanding of the role of genetics in determining mRNA modifications.
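The idea of comparing read-level m6A calls between alleles can be illustrated with a small count-based sketch. This is an assumption-laden simplification, not the authors' method: the abstract describes a supervised machine learning model for modification calling, whereas the code below simply takes per-read (allele, modified) calls as given, computes a modification ratio per allele at one site, and applies a two-sided Fisher's exact test. All names and the example reads are hypothetical.

```python
from collections import Counter
from math import comb


def allele_mod_ratios(reads):
    """Per-allele fraction of modified reads at one site.

    reads: iterable of (allele, is_modified) tuples, one per read.
    """
    total = Counter()
    modified = Counter()
    for allele, is_modified in reads:
        total[allele] += 1
        if is_modified:
            modified[allele] += 1
    return {allele: modified[allele] / total[allele] for allele in total}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x modified reads on the first allele.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as extreme as the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical reads at a single site (illustrative only, not data from the study).
example_reads = (
    [("ref", True)] * 8 + [("ref", False)] * 2
    + [("alt", True)] * 2 + [("alt", False)] * 8
)
print(allele_mod_ratios(example_reads))       # {'ref': 0.8, 'alt': 0.2}
print(fisher_exact_two_sided(8, 2, 2, 8))     # about 0.023
```

A real analysis would also need to account for read-assignment errors and multiple testing across sites, which this sketch leaves out.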
{"title":"Long-read RNA sequencing reveals allele-specific N6-methyladenosine modifications","authors":"Dayea Park, Can Cenik","doi":"10.1101/gr.279270.124","DOIUrl":"https://doi.org/10.1101/gr.279270.124","journal":{"name":"Genome research"},"publicationDate":"2024-10-29","publicationTypes":"Journal Article"}