BLAST Tree: Fast Filtering for Genomic Sequence Classification

Stuart King, Yanni Sun, James R. Cole, S. Pramanik
2010 IEEE International Conference on BioInformatics and BioEngineering, published 2010-05-31
DOI: 10.1109/BIBE.2010.74 (https://doi.org/10.1109/BIBE.2010.74)
Citations: 2

Abstract

With the advent of next-generation sequencing and culture-independent methods, enormous amounts of metagenomic data are now accumulating from microbial communities. These data sets are large, hard to assemble, and may encode rare or novel proteins, which poses new computational challenges for protein homology search. This paper presents a novel protein homology search algorithm that combines the salient features of pairwise sequence alignment programs such as BLAST and protein-family-based tools such as HMMER. It is optimized for protein annotation in metagenomic data sets because (1) it is fast, (2) it can classify short protein fragments encoded by individual sequence reads, and (3) it can find homologs of novel or rare protein families when there are not enough member sequences to build a probabilistic model. Our algorithm builds a new indexing data structure called BlastTree, which can index a large sequence family database thanks to effective compression techniques. In addition, BlastTree fully exploits sequence family membership information to improve homology search sensitivity. When the BlastTree search algorithm is incorporated into HMMER, the search runs in a fraction of the time with comparable quality.
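
The abstract does not describe BlastTree's internals, so the following is only a rough sketch of the general idea of a family-aware prefilter, not the authors' data structure. It assumes a plain exact k-mer index (no tree, no compression, no BLAST-style neighborhood words) that maps each word to the families whose members contain it, then shortlists the families a query should be passed on to for a more expensive search. All names (`build_family_index`, `candidate_families`, `min_hits`) and the toy sequences are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping length-k word of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_family_index(families, k=3):
    """Map each k-mer to the set of family names whose members contain it.

    Assumption: exact k-mer matching stands in for whatever seeding
    scheme the real index uses.
    """
    index = defaultdict(set)
    for name, members in families.items():
        for seq in members:
            for word in kmers(seq, k):
                index[word].add(name)
    return index


def candidate_families(query, index, k=3, min_hits=2):
    """Return (family, shared k-mer count) pairs, best first.

    Only families sharing at least min_hits distinct k-mers with the
    query survive the filter; ties are broken by family name.
    """
    hits = defaultdict(int)
    for word in set(kmers(query, k)):
        for name in index.get(word, ()):
            hits[name] += 1
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return [(name, count) for name, count in ranked if count >= min_hits]


# Toy family database (made-up sequences, for illustration only).
families = {
    "famA": ["MKVLAAGIVG", "MKVLSAGIVG"],
    "famB": ["QWERTYPLKH"],
}
index = build_family_index(families)
shortlist = candidate_families("KVLAAG", index)
print(shortlist)
```

In a pipeline of this shape, only the shortlisted families would be handed to the slower family-level search, which is where the time saving of a filter comes from. Because hits are pooled per family rather than per sequence, a short read can match words spread across different members of the same family.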