An assessment of bioinformatics tools for the detection of human endogenous retroviral insertions in short-read genome sequencing data.

Frontiers in Bioinformatics (IF 2.8, JCR Q2, Mathematical & Computational Biology). Published: 2023-02-08; eCollection date: 2022-01-01. DOI: 10.3389/fbinf.2022.1062328
Harry Bowles, Renata Kabiljo, Ahmad Al Khleifat, Ashley Jones, John P Quinn, Richard J B Dobson, Chad M Swanson, Ammar Al-Chalabi, Alfredo Iacoangeli
Abstract

There is growing interest in the study of human endogenous retroviruses (HERVs), given the substantial body of evidence implicating them in many human diseases. Although their genomic characterization presents numerous technical challenges, next-generation sequencing (NGS) has shown potential for detecting HERV insertions and their polymorphisms in humans, and a number of computational tools now exist to detect them in short-read NGS data. Designing optimal analysis pipelines requires an independent evaluation of these tools. We evaluated the performance of a set of such tools using a variety of experimental designs and datasets, including 50 human short-read whole-genome sequencing samples, matched long- and short-read sequencing data, and simulated short-read NGS data. Our results show great variability in tool performance across datasets and suggest that different tools may suit different study designs. Nevertheless, specialized tools designed to detect only HERVs consistently outperformed generalist tools that detect a wider range of transposable elements. We suggest that, if sufficient computing resources are available, using multiple HERV detection tools to obtain a consensus set of insertion loci may be ideal. Furthermore, given that the false discovery rate varied between 8% and 55% across tools and datasets, we recommend wet-lab validation of predicted insertions when DNA samples are available.
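The two practical recommendations above, building a consensus set of insertion loci from several HERV callers and judging each caller by its false discovery rate (FDR = FP / (FP + TP)), can be illustrated with a minimal sketch. It assumes each tool's output has already been reduced to (chromosome, position) pairs. The function names, tool names, 100 bp merge window and two-tool support threshold are hypothetical choices for illustration only; this is not the pipeline used in the study.

```python
from collections import defaultdict
from statistics import median_low

# Hypothetical record type: one insertion call as (chromosome, position).
Call = tuple[str, int]


def _emit(chrom: str, cluster: list[tuple[int, str]], min_tools: int,
          out: list[Call]) -> None:
    """Keep a cluster only if enough distinct tools support it."""
    tools = {tool for _, tool in cluster}
    if len(tools) >= min_tools:
        positions = [pos for pos, _ in cluster]
        out.append((chrom, median_low(positions)))


def consensus_loci(calls_by_tool: dict[str, list[Call]],
                   window: int = 100, min_tools: int = 2) -> list[Call]:
    """Merge calls from several tools into consensus loci.

    Calls on the same chromosome are sorted and chained (single linkage):
    a call joins the current cluster if it lies within `window` bp of the
    previous call. A cluster is reported at its low median position when
    at least `min_tools` distinct tools contributed to it.
    """
    by_chrom: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for tool, calls in calls_by_tool.items():
        for chrom, pos in calls:
            by_chrom[chrom].append((pos, tool))

    consensus: list[Call] = []
    for chrom in sorted(by_chrom):
        points = sorted(by_chrom[chrom])
        cluster = [points[0]]
        for point in points[1:]:
            if point[0] - cluster[-1][0] <= window:
                cluster.append(point)
            else:
                _emit(chrom, cluster, min_tools, consensus)
                cluster = [point]
        _emit(chrom, cluster, min_tools, consensus)
    return consensus


def false_discovery_rate(predicted: list[Call], truth: list[Call],
                         window: int = 100) -> float:
    """Fraction of predicted loci with no true insertion on the same
    chromosome within `window` bp, i.e. FP / (FP + TP)."""
    if not predicted:
        return 0.0
    false_pos = sum(
        1 for chrom, pos in predicted
        if not any(c == chrom and abs(p - pos) <= window for c, p in truth)
    )
    return false_pos / len(predicted)


if __name__ == "__main__":
    # Toy calls from three hypothetical tools.
    calls = {
        "toolA": [("chr1", 1000), ("chr2", 5000)],
        "toolB": [("chr1", 1040), ("chr3", 700)],
        "toolC": [("chr1", 990), ("chr2", 5050)],
    }
    truth = [("chr1", 1010), ("chr2", 5020)]  # e.g. wet-lab validated loci
    print(consensus_loci(calls))  # [('chr1', 1000), ('chr2', 5000)]
    print(false_discovery_rate(calls["toolB"], truth))  # 0.5
```

Raising `min_tools` trades sensitivity for precision: in the toy data, requiring all three tools keeps only the chr1 locus, while the singleton chr3 call from one tool is dropped at any threshold above one.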
