Acyclic Identification of Aptamer from Over-Represented Libraries Using Hash Functions

Yiou Xiao, K. Mehrotra, C. Mohan, P. Borer, D. Allis
Published in: 2013 39th Annual Northeast Bioengineering Conference, 2013-04-05. DOI: 10.1109/NEBEC.2013.2
Citations: 0

Abstract

Fast sequencing technology has caused genomic databases to grow rapidly in recent years, and bioinformatics researchers need faster, more accurate tools to analyze these very large data sets. In aptamer search, the goal is to find DNA sequences that are over-represented relative to random background libraries on the same chip. Hash functions are widely used in substring comparison, sequence alignment, and clustering tools. We have developed a lightweight tool that uses hash functions to reduce the size of the genomic data and conducts k-neighbor searches on the centroid sequence, which greatly improves search efficiency compared with the existing tool. Furthermore, calculating k-neighbor hash values decreases the overhead of searching for mutants. On a data set of 1 million sequences, the program accurately counted the frequency of the human alpha-thrombin sequence and found mutant versions of the target sequence in less than 40 seconds, whereas the existing method takes 8280 seconds (over two hours).
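
The abstract does not specify the hash function or the exact definition of a k-neighbor, so the following is only a minimal sketch under stated assumptions: Python's built-in dictionary hashing stands in for the tool's hash function, a k-neighbor is taken to mean a sequence within Hamming distance k of the centroid (substitutions only), and the names `count_sequences`, `k_neighbors`, and `find_target_and_mutants` are hypothetical. It illustrates the general idea of collapsing duplicate reads into a hash table once, then looking up the centroid and each of its enumerated neighbors instead of comparing the centroid against every read.

```python
from collections import Counter
from itertools import combinations, product

ALPHABET = "ACGT"  # assumption: reads contain only these four bases


def count_sequences(reads):
    """Collapse duplicate reads into a hash table mapping sequence -> count."""
    return Counter(reads)


def k_neighbors(center, k):
    """Yield center and every sequence within Hamming distance k of it.

    Assumption: only substitutions are considered (no insertions/deletions).
    """
    yield center
    for d in range(1, k + 1):
        # Choose which d positions to mutate.
        for positions in combinations(range(len(center)), d):
            # At each chosen position, try the three bases that differ.
            choices = [[b for b in ALPHABET if b != center[p]] for p in positions]
            for subs in product(*choices):
                mutated = list(center)
                for p, b in zip(positions, subs):
                    mutated[p] = b
                yield "".join(mutated)


def find_target_and_mutants(reads, center, k):
    """Return counts of the centroid and its k-neighbors that occur in reads."""
    counts = count_sequences(reads)
    # Each lookup is a hash-table probe, independent of the number of reads.
    return {s: counts[s] for s in k_neighbors(center, k) if s in counts}


if __name__ == "__main__":
    reads = ["ACGT", "ACGT", "ACGA", "TTTT", "ACGT", "CCGT"]
    print(find_target_and_mutants(reads, "ACGT", 1))
```

In this sketch the cost of the mutant search depends on the number of enumerated neighbors rather than on the number of reads, which is one plausible reading of how precomputing k-neighbor hash values could reduce overhead; the actual tool may work differently.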