HQAlign: Aligning nanopore reads for SV detection using current-level modeling

arXiv - CS - Other Computer Science, Pub Date: 2023-01-10, DOI: arxiv-2301.03834

Dhaivat Joshi, Suhas Diggavi, Mark J. P. Chaisson, Sreeram Kannan

Citations: 0

Abstract

Motivation: Detecting structural variants (SVs) from the alignment of sample DNA reads to a reference genome is an important problem in understanding human disease. Long reads that span repeat regions, together with accurate alignment of those reads, play an important role in identifying novel SVs. Long-read technologies such as nanopore sequencing can address this problem by providing very long reads, but their high error rates make accurate alignment challenging. Many nanopore sequencing errors are biased by the physics of the sequencing process, and properly exploiting these error characteristics can play an important role in designing a robust aligner for SV detection. In this paper, we design and evaluate HQAlign, an aligner for SV detection using nanopore-sequenced reads. The key ideas of HQAlign are (i) using basecalled nanopore reads together with nanopore physics to improve alignments for SVs, (ii) incorporating SV-specific changes into the alignment pipeline, and (iii) adapting these into the existing state-of-the-art long-read alignment pipeline, minimap2 (v2.24), for efficient alignment.

Results: Across different datasets, HQAlign captures about 4%-6% complementary SVs that are missed by minimap2 alignments, while performing on par with minimap2 as a standalone aligner on real nanopore read data. For SV calls common to HQAlign and minimap2, HQAlign improves the accuracy of the start and end breakpoints for about 10%-50% of SVs across datasets. Moreover, HQAlign raises the alignment rate from minimap2's 85.64% to 89.35% for nanopore reads aligned to the recent telomere-to-telomere CHM13 assembly, and from 83.48% to 86.65% for nanopore reads aligned to the GRCh37 human genome.
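The title's "current-level modeling" refers to reasoning about the electrical current a nanopore measures, not only about the basecalled letters. As a rough illustration only, here is a minimal sketch, under stated assumptions, of translating a nucleotide string into a string of quantized current levels with a toy 3-mer pore model. The per-base weights, positional weights, k=3, equal-width binning, and the functions `toy_pore_model`, `quantize`, and `to_current_levels` are all hypothetical choices made for this example; they are not HQAlign's pore model, parameters, or code.

```python
from itertools import product

# HYPOTHETICAL per-base contributions (arbitrary units). Real pore models are
# measured empirically and are not additive like this.
BASE_WEIGHT = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}
# HYPOTHETICAL positional weights: the middle base of the k-mer in the pore
# is assumed to influence the current most.
POS_WEIGHT = (0.5, 1.0, 0.5)
K = len(POS_WEIGHT)


def toy_pore_model(k: int = K) -> dict[str, float]:
    """Map every k-mer to a made-up expected current level (toy model)."""
    model = {}
    for kmer in map("".join, product("ACGT", repeat=k)):
        model[kmer] = sum(w * BASE_WEIGHT[b] for w, b in zip(POS_WEIGHT, kmer))
    return model


def quantize(level: float, lo: float, hi: float, q: int) -> int:
    """Assign a current level to one of q equal-width bins over [lo, hi]."""
    if hi == lo:
        return 0
    idx = int((level - lo) / (hi - lo) * q)
    return min(idx, q - 1)  # the maximum level falls into the top bin


def to_current_levels(seq: str, model: dict[str, float], q: int) -> list[int]:
    """Translate a nucleotide string into a quantized current-level string,
    one symbol per overlapping k-mer."""
    lo, hi = min(model.values()), max(model.values())
    k = len(next(iter(model)))
    return [quantize(model[seq[i:i + k]], lo, hi, q)
            for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    model = toy_pore_model()
    print(to_current_levels("ACGTAC", model, q=4))  # [1, 2, 2, 1]
    print(to_current_levels("ACGTAG", model, q=4))  # [1, 2, 2, 1]
```

Because many k-mers share a quantized level, substituting C with G at the last base of ACGTAC leaves the translated string unchanged in this toy model. How HQAlign actually combines current-level information with nucleotide-level alignment inside minimap2 is described in the paper and is not reproduced here.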