Jonas A Gustafson, Sophia B Gibson, Nikhita Damaraju, Miranda P G Zalusky, Kendra Hoekzema, David Twesigomwe, Lei Yang, Anthony A Snead, Phillip A Richmond, Wouter De Coster, Nathan D Olson, Andrea Guarracino, Qiuhui Li, Angela L Miller, Joy Goffena, Zachary B Anderson, Sophie H R Storz, Sydney A Ward, Maisha Sinha, Claudia Gonzaga-Jauregui, Wayne E Clarke, Anna O Basile, André Corvelo, Catherine Reeves, Adrienne Helland, Rajeeva Lochan Musunuri, Mahler Revsine, Karynne E Patterson, Cate R Paschal, Christina Zakarian, Sara Goodwin, Tanner D Jensen, Esther Robb, W Richard McCombie, Fritz J Sedlazeck, Justin M Zook, Stephen B Montgomery, Erik Garrison, Mikhail Kolmogorov, Michael C Schatz, Richard N McLaughlin, Harriet Dashnow, Michael C Zody, Matt Loose, Miten Jain, Evan E Eichler, Danny E Miller
{"title":"High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation.","authors":"Jonas A Gustafson, Sophia B Gibson, Nikhita Damaraju, Miranda P G Zalusky, Kendra Hoekzema, David Twesigomwe, Lei Yang, Anthony A Snead, Phillip A Richmond, Wouter De Coster, Nathan D Olson, Andrea Guarracino, Qiuhui Li, Angela L Miller, Joy Goffena, Zachary B Anderson, Sophie H R Storz, Sydney A Ward, Maisha Sinha, Claudia Gonzaga-Jauregui, Wayne E Clarke, Anna O Basile, André Corvelo, Catherine Reeves, Adrienne Helland, Rajeeva Lochan Musunuri, Mahler Revsine, Karynne E Patterson, Cate R Paschal, Christina Zakarian, Sara Goodwin, Tanner D Jensen, Esther Robb, W Richard McCombie, Fritz J Sedlazeck, Justin M Zook, Stephen B Montgomery, Erik Garrison, Mikhail Kolmogorov, Michael C Schatz, Richard N McLaughlin, Harriet Dashnow, Michael C Zody, Matt Loose, Miten Jain, Evan E Eichler, Danny E Miller","doi":"10.1101/gr.279273.124","DOIUrl":null,"url":null,"abstract":"<p><p>Fewer than half of individuals with a suspected Mendelian or monogenic condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control data sets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project (1KGP) Oxford Nanopore Technologies Sequencing Consortium aims to generate LRS data from at least 800 of the 1KGP samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37× and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.</p>","PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":""},"PeriodicalIF":6.2000,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genome research","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1101/gr.279273.124","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Fewer than half of individuals with a suspected Mendelian or monogenic condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control data sets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project (1KGP) Oxford Nanopore Technologies Sequencing Consortium aims to generate LRS data from at least 800 of the 1KGP samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37× and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
期刊介绍:
Launched in 1995, Genome Research is an international, continuously published, peer-reviewed journal that focuses on research that provides novel insights into the genome biology of all organisms, including advances in genomic medicine.
Among the topics considered by the journal are genome structure and function, comparative genomics, molecular evolution, genome-scale quantitative and population genetics, proteomics, epigenomics, and systems biology. The journal also features exciting gene discoveries and reports of cutting-edge computational biology and high-throughput methodologies.
New data in these areas are published as research papers, or methods and resource reports that provide novel information on technologies or tools that will be of interest to a broad readership. Complete data sets are presented electronically on the journal''s web site where appropriate. The journal also provides Reviews, Perspectives, and Insight/Outlook articles, which present commentary on the latest advances published both here and elsewhere, placing such progress in its broader biological context.