硅学分析确定了纳米孔生成的 SARS-CoV-2 序列的污染阈值。

IF 3.6 2区生物学 Q1 BIOCHEMICAL RESEARCH METHODS PLoS Computational Biology Pub Date : 2024-08-19 eCollection Date: 2024-08-01 DOI:10.1371/journal.pcbi.1011539

Ayooluwa J Bolaji, Ana T Duggan

{"title":"硅学分析确定了纳米孔生成的 SARS-CoV-2 序列的污染阈值。","authors":"Ayooluwa J Bolaji, Ana T Duggan","doi":"10.1371/journal.pcbi.1011539","DOIUrl":null,"url":null,"abstract":"The SARS-CoV-2 pandemic has brought molecular biology and genomic sequencing into the public consciousness and lexicon. With an emphasis on rapid turnaround, genomic data informed both diagnostic and surveillance decisions for the current pandemic at a previously unheard-of scale. The surge in the submission of genomic data to publicly available databases proved essential as comparing different genome sequences offers a wealth of knowledge, including phylogenetic links, modes of transmission, rates of evolution, and the impact of mutations on infection and disease severity. However, the scale of the pandemic has meant that sequencing runs are rarely repeated due to limited sample material and/or the availability of sequencing resources, resulting in the upload of some imperfect runs to public repositories. As a result, it is crucial to investigate the data obtained from these imperfect runs to determine whether the results are reliable prior to depositing them in a public database. Numerous studies have identified a variety of sources of contamination in public next-generation sequencing (NGS) data as the number of NGS studies increases along with the diversity of sequencing technologies and procedures. For this study, we conducted an in silico experiment with known SARS-CoV-2 sequences produced from Oxford Nanopore Technologies sequencing to investigate the effect of contamination on lineage calls and single nucleotide variants (SNVs). A contamination threshold below which runs are expected to generate accurate lineage calls and maintain genome-relatedness and integrity was identified. Together, these findings provide a benchmark below which imperfect runs may be considered robust for reporting results to both stakeholders and public repositories and reduce the need for repeat or wasted runs.","PeriodicalId":20241,"journal":{"name":"PLoS Computational Biology","volume":"20 8","pages":"e1011539"},"PeriodicalIF":3.6000,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11398645/pdf/","citationCount":"0","resultStr":"{\"title\":\"In silico analyses identify sequence contamination thresholds for Nanopore-generated SARS-CoV-2 sequences.\",\"authors\":\"Ayooluwa J Bolaji, Ana T Duggan\",\"doi\":\"10.1371/journal.pcbi.1011539\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The SARS-CoV-2 pandemic has brought molecular biology and genomic sequencing into the public consciousness and lexicon. With an emphasis on rapid turnaround, genomic data informed both diagnostic and surveillance decisions for the current pandemic at a previously unheard-of scale. The surge in the submission of genomic data to publicly available databases proved essential as comparing different genome sequences offers a wealth of knowledge, including phylogenetic links, modes of transmission, rates of evolution, and the impact of mutations on infection and disease severity. However, the scale of the pandemic has meant that sequencing runs are rarely repeated due to limited sample material and/or the availability of sequencing resources, resulting in the upload of some imperfect runs to public repositories. As a result, it is crucial to investigate the data obtained from these imperfect runs to determine whether the results are reliable prior to depositing them in a public database. Numerous studies have identified a variety of sources of contamination in public next-generation sequencing (NGS) data as the number of NGS studies increases along with the diversity of sequencing technologies and procedures. For this study, we conducted an in silico experiment with known SARS-CoV-2 sequences produced from Oxford Nanopore Technologies sequencing to investigate the effect of contamination on lineage calls and single nucleotide variants (SNVs). A contamination threshold below which runs are expected to generate accurate lineage calls and maintain genome-relatedness and integrity was identified. Together, these findings provide a benchmark below which imperfect runs may be considered robust for reporting results to both stakeholders and public repositories and reduce the need for repeat or wasted runs.\",\"PeriodicalId\":20241,\"journal\":{\"name\":\"PLoS Computational Biology\",\"volume\":\"20 8\",\"pages\":\"e1011539\"},\"PeriodicalIF\":3.6000,\"publicationDate\":\"2024-08-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11398645/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PLoS Computational Biology\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1371/journal.pcbi.1011539\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/8/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q1\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PLoS Computational Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1371/journal.pcbi.1011539","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/8/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 0

摘要

SARS-CoV-2 大流行使分子生物学和基因组测序进入了公众的视野和词典。由于强调快速反应，基因组数据以前所未闻的规模为当前大流行病的诊断和监测决策提供了依据。事实证明，向公开数据库提交基因组数据的激增是至关重要的，因为比较不同的基因组序列可以获得丰富的知识，包括系统发育联系、传播方式、进化速度以及突变对感染和疾病严重程度的影响。然而，大流行病的规模意味着，由于样本材料和/或测序资源有限，测序运行很少重复，导致一些不完善的运行被上传到公共资料库。因此，在将这些不完善的数据存入公共数据库之前，必须对这些数据进行调查，以确定结果是否可靠。随着 NGS 研究数量的增加以及测序技术和程序的多样化，大量研究发现了公共下一代测序（NGS）数据中的各种污染源。在这项研究中，我们用牛津纳米孔技术公司（Oxford Nanopore Technologies）测序产生的已知 SARS-CoV-2 序列进行了一项硅学实验，以研究污染对品系调用和单核苷酸变异（SNV）的影响。研究确定了污染阈值，低于该阈值的运行有望产生准确的系谱调用，并保持基因组相关性和完整性。这些发现共同提供了一个基准，低于该基准，不完善的运行可被视为稳健的，以便向利益相关者和公共资料库报告结果，并减少重复或浪费运行的需要。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

In silico analyses identify sequence contamination thresholds for Nanopore-generated SARS-CoV-2 sequences.

The SARS-CoV-2 pandemic has brought molecular biology and genomic sequencing into the public consciousness and lexicon. With an emphasis on rapid turnaround, genomic data informed both diagnostic and surveillance decisions for the current pandemic at a previously unheard-of scale. The surge in the submission of genomic data to publicly available databases proved essential as comparing different genome sequences offers a wealth of knowledge, including phylogenetic links, modes of transmission, rates of evolution, and the impact of mutations on infection and disease severity. However, the scale of the pandemic has meant that sequencing runs are rarely repeated due to limited sample material and/or the availability of sequencing resources, resulting in the upload of some imperfect runs to public repositories. As a result, it is crucial to investigate the data obtained from these imperfect runs to determine whether the results are reliable prior to depositing them in a public database. Numerous studies have identified a variety of sources of contamination in public next-generation sequencing (NGS) data as the number of NGS studies increases along with the diversity of sequencing technologies and procedures. For this study, we conducted an in silico experiment with known SARS-CoV-2 sequences produced from Oxford Nanopore Technologies sequencing to investigate the effect of contamination on lineage calls and single nucleotide variants (SNVs). A contamination threshold below which runs are expected to generate accurate lineage calls and maintain genome-relatedness and integrity was identified. Together, these findings provide a benchmark below which imperfect runs may be considered robust for reporting results to both stakeholders and public repositories and reduce the need for repeat or wasted runs.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

PLoS Computational Biology BIOCHEMICAL RESEARCH METHODS-MATHEMATICAL & COMPUTATIONAL BIOLOGY

CiteScore

7.10

自引率

4.70%

发文量

820

审稿时长

2.5 months

期刊介绍： PLOS Computational Biology features works of exceptional significance that further our understanding of living systems at all scales—from molecules and cells, to patient populations and ecosystems—through the application of computational methods. Readers include life and computational scientists, who can take the important findings presented here to the next level of discovery. Research articles must be declared as belonging to a relevant section. More information about the sections can be found in the submission guidelines. Research articles should model aspects of biological systems, demonstrate both methodological and scientific novelty, and provide profound new biological insights. Generally, reliability and significance of biological discovery through computation should be validated and enriched by experimental studies. Inclusion of experimental validation is not required for publication, but should be referenced where possible. Inclusion of experimental validation of a modest biological discovery through computation does not render a manuscript suitable for PLOS Computational Biology. Research articles specifically designated as Methods papers should describe outstanding methods of exceptional importance that have been shown, or have the promise to provide new biological insights. The method must already be widely adopted, or have the promise of wide adoption by a broad community of users. Enhancements to existing published methods will only be considered if those enhancements bring exceptional new capabilities.