VISTA: an integrated framework for structural variant discovery

IF 5.3 | Zone 2 (Materials Science) | Q2 (Materials Science, Multidisciplinary) | ACS Applied Nano Materials | Pub Date: 2024-09-19 | DOI: 10.1093/bib/bbae462
Varuni Sarwal, Seungmo Lee, Jianzhi Yang, Sriram Sankararaman, Mark Chaisson, Eleazar Eskin, Serghei Mangul

Abstract

Structural variation (SV) refers to insertions, deletions, inversions, and duplications in human genomes. SVs are present in approximately 1.5% of the human genome. Still, this small subset of genetic variation has been implicated in the pathogenesis of psoriasis, Crohn’s disease and other autoimmune disorders, autism spectrum and other neurodevelopmental disorders, and schizophrenia. Because identifying structural variants is an important problem in genetics, several specialized computational techniques have been developed to detect them directly from sequencing data, and advances in whole-genome sequencing (WGS) technologies have brought a plethora of SV detection methods. However, dissecting SVs from WGS data remains a challenge: the majority of SV detection methods are prone to a high false-positive rate, and no existing method can precisely detect the full range of SVs present in a sample. Previous studies have shown that none of the existing SV callers can maintain high accuracy across various SV lengths and genomic coverages. Here, we report an integrated structural variant calling framework, Variant Identification and Structural Variant Analysis (VISTA), that leverages the results of individual callers using a novel and robust filtering and merging algorithm. Existing consensus-based tools ignore variant length and coverage; VISTA overcomes this limitation by executing various combinations of top-performing callers, chosen according to variant length and genomic coverage, to generate SV events with high accuracy. We evaluated the performance of VISTA on comprehensive gold-standard datasets across varying organisms and coverage. We benchmarked VISTA using the Genome-in-a-Bottle gold standard SV set, haplotype-resolved de novo assemblies from the Human Pangenome Reference Consortium, and an in-house polymerase chain reaction (PCR)-validated mouse gold standard set. VISTA maintained the highest F1 score among top consensus-based tools when measured against a comprehensive gold standard across both mouse and human genomes. VISTA also has an optimized mode, in which the calls can be optimized for precision or recall. VISTA-optimized can attain 100% precision and the highest sensitivity among the variant callers compared. In conclusion, VISTA represents a significant advancement in structural variant calling, offering a robust and accurate framework that outperforms existing consensus-based tools and sets a new standard for SV detection in genomic research.
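
The abstract describes the approach only at a high level, so the following is a minimal Python sketch of the general idea rather than VISTA's actual algorithm. It assumes a hypothetical lookup table (`TRUSTED`) that names which callers to keep for each combination of variant-length bin and coverage bin, merges the surviving calls using a 50% reciprocal-overlap rule, and scores the result against a truth set with precision, recall, and F1. The caller names, bin edges, and overlap threshold are all placeholders chosen for illustration; none of them come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL" or "DUP"

    @property
    def length(self) -> int:
        return self.end - self.start


# Hypothetical table: callers to trust per (length bin, coverage bin).
# Caller names and bin edges are placeholders, not values from the paper.
TRUSTED = {
    ("short", "low"): {"caller_a"},
    ("short", "high"): {"caller_a", "caller_b"},
    ("long", "low"): {"caller_b"},
    ("long", "high"): {"caller_b", "caller_c"},
}


def length_bin(sv: SV) -> str:
    return "short" if sv.length < 1000 else "long"


def coverage_bin(coverage: float) -> str:
    return "low" if coverage < 10 else "high"


def same_event(a: SV, b: SV, min_overlap: float = 0.5) -> bool:
    """True if both calls share chromosome and type and overlap reciprocally."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    shared = min(a.end, b.end) - max(a.start, b.start)
    # Requiring the shared span to cover min_overlap of the longer call
    # implies it also covers at least that fraction of the shorter one.
    return shared > 0 and shared >= min_overlap * max(a.length, b.length)


def integrate(calls_by_caller: dict, coverage: float) -> list:
    """Filter each caller's output via the lookup table, then merge duplicates."""
    cov = coverage_bin(coverage)
    merged = []
    for caller, calls in calls_by_caller.items():
        for sv in calls:
            if caller not in TRUSTED[(length_bin(sv), cov)]:
                continue  # caller is not trusted for this length/coverage bin
            if not any(same_event(sv, kept) for kept in merged):
                merged.append(sv)
    return merged


def precision_recall_f1(calls: list, truth: list) -> tuple:
    """Score calls against a truth set using the same matching rule."""
    matched_calls = sum(any(same_event(c, t) for t in truth) for c in calls)
    matched_truth = sum(any(same_event(t, c) for c in calls) for t in truth)
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    calls_by_caller = {
        "caller_a": [SV("chr1", 100, 400, "DEL"), SV("chr1", 5000, 9000, "DEL")],
        "caller_b": [SV("chr1", 120, 410, "DEL"), SV("chr1", 5100, 9100, "DEL")],
        "caller_c": [SV("chr2", 1000, 6000, "DEL")],
    }
    truth = [SV("chr1", 110, 405, "DEL"), SV("chr1", 5000, 9000, "DEL")]
    merged = integrate(calls_by_caller, coverage=30)
    print(merged)
    print(precision_recall_f1(merged, truth))
```

In this toy setup, a caller's output is discarded wherever the table does not list it for that length and coverage bin, which is one plausible reading of "executing various combinations of top-performing callers based on variant length and genomic coverage". The span-based matching shown here only suits events with a non-zero reference span, such as deletions; insertions would need a breakpoint-distance rule instead.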