UPS-indel: A better approach for finding indel redundancy

M. S. Hasan, Xiaowei Wu, L. Watson, Zhiyi Li, Liqing Zhang
IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), published 2016-10-01
DOI: 10.1109/ICCABS.2016.7802793 (https://doi.org/10.1109/ICCABS.2016.7802793)
Citations: 1

Abstract

Indels, the insertions and deletions of base pairs in an organism's sequence, are a very common form of genetic variation in the human genome. Because they contribute to genetic diversity and human disease, indels are an important topic in genome research. With progress in Next Generation Sequencing (NGS), many indel calling tools have been developed, and various databases store their results for future research. Indels that differ in allele sequence and position can still be biologically equivalent when they lead to the same altered sequence. Storing such equivalent indels as distinct database entries causes data redundancy. Previous research showed that about 10% of the human indels stored in dbSNP are redundant because there is no unified system for identifying and representing equivalent indels. In this paper we describe UPS-indel, a utility tool that creates a universal positioning system for indels, so that equivalent indels can be identified by simply comparing the coordinates the system generates. Applying UPS-indel, we identify nearly 15% redundant indels in dbSNP (version 142) across all human chromosomes, higher than previously reported. UPS-indel is written in C++ and is freely available at http://bench.cs.vt.edu/ups-indel.
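
The notion of equivalence described in the abstract can be illustrated with a small sketch. It is not the UPS-indel algorithm, and the abstract does not specify UPS-indel's coordinate scheme. The function names (`apply_indel`, `indel_key`), the 0-based position convention, and the brute-force search are all assumptions made for illustration. The idea is that an indel can be described by the full range of positions at which an indel of the same type and length produces the same altered sequence, so two equivalent indels receive the same key.

```python
def apply_indel(ref, pos, kind, seq):
    """Return the altered sequence after applying one indel to ref.

    Hypothetical convention: pos is 0-based. An insertion places seq
    before ref[pos]; a deletion removes seq, which must match ref at pos.
    """
    if kind == "ins":
        return ref[:pos] + seq + ref[pos:]
    if kind == "del":
        if ref[pos:pos + len(seq)] != seq:
            raise ValueError("deleted allele does not match the reference")
        return ref[:pos] + ref[pos + len(seq):]
    raise ValueError("kind must be 'ins' or 'del'")


def indel_key(ref, pos, kind, seq):
    """Illustrative position-range key: equivalent indels share one key.

    Brute force: try every position and keep those where an indel of the
    same kind and length yields the same altered sequence.
    """
    altered = apply_indel(ref, pos, kind, seq)
    k = len(seq)
    if kind == "del":
        hits = [p for p in range(len(ref) - k + 1)
                if ref[:p] + ref[p + k:] == altered]
        left = hits[0]
        allele = ref[left:left + k]      # deleted bases at leftmost position
    else:
        hits = [p for p in range(len(ref) + 1)
                if altered[:p] == ref[:p] and altered[p + k:] == ref[p:]]
        left = hits[0]
        allele = altered[left:left + k]  # inserted bases at leftmost position
    return (kind, left, hits[-1], allele)


ref = "TCACACAG"
# Deleting "CA" at position 1 and "AC" at position 2 give the same sequence.
print(apply_indel(ref, 1, "del", "CA"), apply_indel(ref, 2, "del", "AC"))
print(indel_key(ref, 1, "del", "CA") == indel_key(ref, 2, "del", "AC"))
```

In this toy example both deletions turn `TCACACAG` into `TCACAG`, so a database keyed on the position range would store them once. The brute-force scan is quadratic and only suitable for short sequences; a practical tool would instead shift the indel left and right within the surrounding repeat.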