Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data

Human Genome Variation (IF 1.0; JCR Q4, Genetics & Heredity). Pub Date: 2024-04-17. DOI: 10.1038/s41439-024-00276-x
Shunichi Kosugi, Chikashi Terao

Abstract

Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.
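The recall and precision figures discussed above, and the suggestion to combine multiple detection algorithms, can be illustrated with a minimal Python sketch. It is not the authors' evaluation framework, which also incorporates manual visual inspection. It assumes each variant is reduced to a hypothetical `(chrom, pos, ref, alt)` key and matched exactly against a truth set; real SV comparisons usually need breakpoint tolerance, which is omitted here. The function names and the toy data are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def precision_recall(calls: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Return (precision, recall) using exact-match comparison to a truth set."""
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def merge_call_sets(call_sets: Iterable[Set[Variant]], min_support: int = 1) -> Set[Variant]:
    """Keep variants reported by at least `min_support` call sets.

    min_support=1 gives the union; min_support=len(call_sets) gives the
    intersection.
    """
    support = Counter(v for calls in call_sets for v in calls)
    return {v for v, n in support.items() if n >= min_support}


# Toy data (made up): one SNV, one insertion longer than 10 bp, one deletion.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "CTTTTTTTTTTTT"),
    ("chr2", 50, "GAT", "G"),
}
caller_a = {("chr1", 100, "A", "G"), ("chr2", 50, "GAT", "G"), ("chr3", 10, "T", "C")}
caller_b = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "CTTTTTTTTTTTT")}

union = merge_call_sets([caller_a, caller_b], min_support=1)
consensus = merge_call_sets([caller_a, caller_b], min_support=2)

for name, calls in [("A", caller_a), ("B", caller_b), ("union", union), ("consensus", consensus)]:
    p, r = precision_recall(calls, truth)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this toy case the union recovers every truth variant at the cost of one false positive, while the two-caller consensus is fully precise but misses two of the three variants. This is the usual trade-off when merging call sets.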
