Fast alignment of reads to a variation graph with application to SNP detection
Maurilio Monsu, Matteo Comin
Journal of Integrative Bioinformatics, published 2021-11-16
DOI: https://doi.org/10.1515/jib-2021-0032
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8709736/pdf/
Citations: 3
Abstract
Sequencing technologies have provided the basis of most modern genome sequencing studies due to their high base-level accuracy and relatively low cost. One of the most demanding steps is mapping reads to the human reference genome. The reliance on a single human reference genome could introduce substantial biases in downstream analyses. Pangenomic graph reference representations offer an attractive approach for storing genetic variations. Moreover, it is possible to include known variants in the reference in order to make read mapping, variant calling, and genotyping variant-aware. Only recently has a framework for variation graphs, vg [Garrison E, Adam MN, Siren J, et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 2018;36:875-9], improved variation-aware alignment and variant calling in general. The major bottleneck of vg is the high cost of mapping reads to a variation graph. In this paper we study the problem of SNP calling on a variation graph and present a fast read alignment tool named VG SNP-Aware. VG SNP-Aware is able to align reads exactly to a variation graph and detect SNPs based on these aligned reads. The results show that VG SNP-Aware can efficiently map reads to a variation graph with a speedup of 40× with respect to vg and similar accuracy on SNP detection.
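To make the idea of exact read alignment on a variation graph concrete, the following is a minimal Python sketch. It is not the VG SNP-Aware algorithm, whose details the abstract does not give: it is a brute-force search on a toy graph, and the names `NODES`, `EDGES`, `exact_matches`, and `allele_support` are hypothetical. The toy graph encodes the sequence ACG[T/G]TAGC, where nodes 2 and 3 are the two alleles of one SNP.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Toy variation graph (illustrative only): ACG -> (T | G) -> TAGC
NODES: Dict[int, str] = {1: "ACG", 2: "T", 3: "G", 4: "TAGC"}
EDGES: Dict[int, List[int]] = {1: [2, 3], 2: [4], 3: [4], 4: []}


def exact_matches(
    read: str, nodes: Dict[int, str], edges: Dict[int, List[int]]
) -> List[Tuple[int, int, List[int]]]:
    """Return (start_node, start_offset, node_path) for each exact occurrence."""
    hits = []
    for start_node, seq in nodes.items():
        for start_off in range(len(seq)):
            # Each stack entry: current node, offset in it, read position, path so far
            stack = [(start_node, start_off, 0, [start_node])]
            while stack:
                node, off, pos, path = stack.pop()
                label = nodes[node]
                # Compare as much of the read as fits in the current node
                n = min(len(label) - off, len(read) - pos)
                if label[off:off + n] != read[pos:pos + n]:
                    continue  # mismatch: exact matching allows no edits
                pos += n
                if pos == len(read):
                    hits.append((start_node, start_off, path))
                    continue
                # Read continues past this node: try every successor
                for nxt in edges.get(node, []):
                    stack.append((nxt, 0, pos, path + [nxt]))
    return hits


def allele_support(
    reads: List[str],
    nodes: Dict[int, str],
    edges: Dict[int, List[int]],
    allele_nodes: set,
) -> Counter:
    """Count uniquely mapped reads whose path passes through each allele node."""
    support: Counter = Counter()
    for read in reads:
        hits = exact_matches(read, nodes, edges)
        if len(hits) != 1:
            continue  # assumption: skip unmapped and ambiguously mapped reads
        for node in hits[0][2]:
            if node in allele_nodes:
                support[node] += 1
    return support
```

Calling `allele_support(["CGGTA", "GGTAG", "CGTTA", "TAGC"], NODES, EDGES, {2, 3})` counts two reads through node 3 (allele G) and one through node 2 (allele T); the read `TAGC` maps entirely within node 4 and supports neither allele. A practical tool would replace the brute-force scan with an index and apply a statistical model to the allele counts, neither of which is shown here.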