通过挖掘17个古细菌和细菌基因组的未对齐1D序列，建立1D和3D基序字典。

Proceedings. International Conference on Intelligent Systems for Molecular Biology Pub Date : 1999-01-01

I Rigoutsos, Y Gao, A Floratos, L Parida

{"title":"通过挖掘17个古细菌和细菌基因组的未对齐1D序列，建立1D和3D基序字典。","authors":"I Rigoutsos, Y Gao, A Floratos, L Parida","doi":"","DOIUrl":null,"url":null,"abstract":"We have used the Teiresias algorithm to carry out unsupervised pattern discovery in a database containing the unaligned ORFs from the 17 publicly available complete archaeal and bacterial genomes and build a 1D dictionary of motifs. These motifs which we refer to as seqlets account for and cover 97.88% of this genomic input at the level of amino acid positions. Each of the seqlets in this 1D dictionary was located among the sequences in Release 38.0 of the Protein Data Bank and the structural fragments corresponding to each seqlet's instances were identified and aligned in three dimensions: those of the seqlets that resulted in RMSD errors below a pre-selected threshold of 2.5 Angstroms were entered in a 3D dictionary of structurally conserved seqlets. These two dictionaries can be thought of as cross-indices that facilitate the tackling of tasks such as automated functional annotation of genomic sequences, local homology identification, local structure characterization, comparative genomics, etc.","PeriodicalId":79420,"journal":{"name":"Proceedings. International Conference on Intelligent Systems for Molecular Biology","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"1999-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Building dictionaries of 1D and 3D motifs by mining the Unaligned 1D sequences of 17 archaeal and bacterial genomes.\",\"authors\":\"I Rigoutsos, Y Gao, A Floratos, L Parida\",\"doi\":\"\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We have used the Teiresias algorithm to carry out unsupervised pattern discovery in a database containing the unaligned ORFs from the 17 publicly available complete archaeal and bacterial genomes and build a 1D dictionary of motifs. These motifs which we refer to as seqlets account for and cover 97.88% of this genomic input at the level of amino acid positions. Each of the seqlets in this 1D dictionary was located among the sequences in Release 38.0 of the Protein Data Bank and the structural fragments corresponding to each seqlet's instances were identified and aligned in three dimensions: those of the seqlets that resulted in RMSD errors below a pre-selected threshold of 2.5 Angstroms were entered in a 3D dictionary of structurally conserved seqlets. These two dictionaries can be thought of as cross-indices that facilitate the tackling of tasks such as automated functional annotation of genomic sequences, local homology identification, local structure characterization, comparative genomics, etc.\",\"PeriodicalId\":79420,\"journal\":{\"name\":\"Proceedings. International Conference on Intelligent Systems for Molecular Biology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1999-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. International Conference on Intelligent Systems for Molecular Biology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. International Conference on Intelligent Systems for Molecular Biology","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

我们使用Teiresias算法在数据库中进行无监督模式发现，该数据库包含来自17个公开的完整古细菌和细菌基因组的未对齐orf，并构建了一个一维基序字典。这些基序，我们称之为小序列，在氨基酸位置水平上占97.88%的基因组输入。该1D字典中的每一个小片段都位于蛋白质数据库38.0版本的序列中，每个小片段实例对应的结构片段被识别并在三维空间上对齐，导致RMSD误差低于预先选择的阈值2.5埃的小片段被输入到结构保守小片段的三维字典中。这两个词典可以被认为是交叉索引，有助于处理诸如基因组序列的自动功能注释、局部同源性鉴定、局部结构表征、比较基因组学等任务。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Building dictionaries of 1D and 3D motifs by mining the Unaligned 1D sequences of 17 archaeal and bacterial genomes.

We have used the Teiresias algorithm to carry out unsupervised pattern discovery in a database containing the unaligned ORFs from the 17 publicly available complete archaeal and bacterial genomes and build a 1D dictionary of motifs. These motifs which we refer to as seqlets account for and cover 97.88% of this genomic input at the level of amino acid positions. Each of the seqlets in this 1D dictionary was located among the sequences in Release 38.0 of the Protein Data Bank and the structural fragments corresponding to each seqlet's instances were identified and aligned in three dimensions: those of the seqlets that resulted in RMSD errors below a pre-selected threshold of 2.5 Angstroms were entered in a 3D dictionary of structurally conserved seqlets. These two dictionaries can be thought of as cross-indices that facilitate the tackling of tasks such as automated functional annotation of genomic sequences, local homology identification, local structure characterization, comparative genomics, etc.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings. International Conference on Intelligent Systems for Molecular Biology

自引率

0.00%

发文量