
Proceedings. International Conference on Intelligent Systems for Molecular Biology: latest articles

A data base of minimally frustrated alpha helical segments extracted from proteins according to an entropy criterion.
R Casadio, M Compiani, P Fariselli, P L Martelli

A data base of minimally frustrated alpha helical segments is defined by filtering a set of 822 non-redundant proteins, which contain 4783 alpha helical structures. The data base is defined using a neural network-based alpha helix predictor, whose outputs are rated according to an entropy criterion. A comparison with the currently available experimental results indicates that a subset of the data base contains the experimentally detected initiation sites of protein folding, as well as protein fragments that fold into stable isolated alpha helices. This suggests using the data base (and/or the predictor) to highlight patterns that govern the stability of alpha helices in proteins and the helical behavior of isolated protein fragments.
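
The abstract above does not say how the entropy criterion is computed. As a minimal sketch, assuming the predictor emits a probability distribution per residue (for example helix versus non-helix) and that low Shannon entropy marks a confident prediction, a segment could be rated by its mean per-residue entropy. The function names and the two-class output are illustrative assumptions, not the authors' definition.

```python
from math import log2


def shannon_entropy(probabilities):
    """Shannon entropy in bits of one output distribution (assumed to sum to 1)."""
    return sum(-p * log2(p) for p in probabilities if p > 0)


def mean_segment_entropy(per_residue_outputs):
    """Average entropy over the residues of a segment.

    Hypothetical rating: lower values mean the (assumed) predictor is
    more certain along the whole segment.
    """
    entropies = [shannon_entropy(dist) for dist in per_residue_outputs]
    return sum(entropies) / len(entropies)


# Toy outputs (helix, non-helix) for a confident and an uncertain segment.
confident_segment = [(1.0, 0.0), (1.0, 0.0)]
uncertain_segment = [(0.5, 0.5), (0.5, 0.5)]
print(mean_segment_entropy(confident_segment))
print(mean_segment_entropy(uncertain_segment))
```

Under these assumptions, segments would be kept when their mean entropy falls below some cutoff; the abstract gives no cutoff value, so none is suggested here.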

Published: 1999-01-01. Publication type: Journal Article. Citations: 0
Position-specific annotation of protein function based on multiple homologs.
M A Andrade

In this work I present an algorithm for deriving protein functional annotations that are position-specific. The input is based on the results of a sequence similarity search of the query sequence against a sequence database. Strings of words are extracted from the descriptions of the proteins, and the correlation between proteins having the same descriptors and the amino acid conservation is used to compute a score that indicates which descriptor is likely to better describe the function of each particular residue. Analysis of the score curves and comparison of different functions allow easy detection of the parts of the sequence associated with different functions. Different levels of functional specificity can be compared, making it possible to choose the one that best suits the function of the protein. Immediate applications of this algorithm are support for (automated) methods of protein functional annotation and database coherence checking.

Published: 1999-01-01. Publication type: Journal Article. Citations: 0
An exact method for finding short motifs in sequences, with application to the ribosome binding site problem.
M Tompa

This is an investigation of methods for finding short motifs that only occur in a fraction of the input sequences. Unlike local search techniques that may not reach a global optimum, the method proposed here is guaranteed to produce the motifs with greatest z-scores. This method is illustrated for the Ribosome Binding Site Problem, which is to identify the short mRNA 5' untranslated sequence that is recognized by the ribosome during initiation of protein synthesis. Experiments were performed to solve this problem for each of fourteen sequenced prokaryotes, by applying the method to the full complement of genes from each. One of the interesting results of this experimentation is evidence that the recognized sequence of the thermophilic archaea A. fulgidus, M. jannaschii, M. thermoautotrophicum, and P. horikoshii may be somewhat different than the well known Shine-Dalgarno sequence.
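
The abstract above ranks motifs by z-score but does not spell out the statistic. As a rough sketch, assuming each sequence independently contains the motif with a known background probability (a simple binomial model chosen here for illustration, not necessarily the background model used in the paper), the z-score of the number of sequences containing a motif can be computed as follows. The name `occurrence_z_score` and the parameter `p_background` are hypothetical.

```python
from math import sqrt


def occurrence_z_score(sequences, motif, p_background):
    """z-score of how many sequences contain `motif`.

    Assumption: under the background model, each sequence contains the
    motif independently with probability `p_background` (0 < p < 1).
    """
    if not 0.0 < p_background < 1.0:
        raise ValueError("p_background must lie strictly between 0 and 1")
    n = len(sequences)
    observed = sum(motif in seq for seq in sequences)
    expected = n * p_background
    std_dev = sqrt(n * p_background * (1.0 - p_background))
    return (observed - expected) / std_dev


# Toy upstream regions; the background probability is made up for illustration.
upstream = ["AAGGAGG", "CCAGGAT", "TTTTTTT", "GGAGGAA"]
print(occurrence_z_score(upstream, "AGGA", p_background=0.5))
```

An exact method in the sense of the abstract would evaluate every candidate motif of the chosen length and report those with the greatest score, rather than relying on local search.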

Published: 1999-01-01. Publication type: Journal Article. Citations: 0
Fidelity probes for DNA arrays.
E Hubbell, P A Pevzner

One current approach to quality control in DNA array manufacturing is to synthesize a small set of test probes that detect variation in the manufacturing process. These fidelity probes consist of identical copies of the same probe, but they are deliberately manufactured using different steps of the manufacturing process. A known target is hybridized to these probes, and the hybridization results indicate the quality of the manufacturing process. It is desirable not only to detect variations but also to analyze them, indicating in which process step the manufacture changed. We describe a combinatorial approach that constructs a small set of fidelity probes that not only detect variations but also point out the manufacturing step in which a variation has occurred. This algorithm is currently being used in mass production of DNA arrays at Affymetrix.

Published: 1999-01-01. Publication type: Journal Article. Citations: 0
Using sequence motifs for enhanced neural network prediction of protein distance constraints.
J Gorodkin, O Lund, C A Andersen, S Brunak

Correlations between sequence separation (in residues) and distance (in Angstrom) of any pair of amino acids in polypeptide chains are investigated. For each sequence separation we define a distance threshold. For pairs of amino acids where the distance between C alpha atoms is smaller than the threshold, a characteristic sequence (logo) motif is found. The motifs change as the sequence separation increases: for small separations they consist of one peak located between the two residues, then additional peaks appear at these residues, and finally the center peak smears out for very large separations. We also find correlations between the residues in the center of the motif. These and other statistical analyses are used to design neural networks with enhanced performance compared to earlier work. Importantly, the statistical analysis explains why neural networks perform better than simple statistical data-driven approaches such as pair probability density functions. The statistical results also explain characteristics of the network performance for increasing sequence separation. The improvement of the new network design is significant in the sequence separation range of 10-30 residues. Finally, we find that the performance curve for increasing sequence separation is directly correlated to the corresponding information content. A WWW server, distanceP, is available at http://www.cbs.dtu.dk/services/distanceP/.
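
The pair selection described in the abstract above (residue pairs at a fixed sequence separation whose C alpha atoms lie closer than a threshold) can be sketched as follows. This is only an illustration: the coordinates, the function name `close_pairs`, and the 5.0 Angstrom threshold are assumptions, and the abstract does not give the thresholds actually used.

```python
from math import dist


def close_pairs(ca_coords, separation, threshold):
    """Return index pairs (i, i + separation) whose C-alpha atoms are
    closer than `threshold` (same length unit as the coordinates)."""
    pairs = []
    for i in range(len(ca_coords) - separation):
        j = i + separation
        if dist(ca_coords[i], ca_coords[j]) < threshold:
            pairs.append((i, j))
    return pairs


# Toy C-alpha trace in Angstrom: residues 0-2 are extended, residue 3 folds back.
toy_trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]
print(close_pairs(toy_trace, separation=2, threshold=5.0))
```

The sequence windows around the selected pairs would then be the input for the logo-style statistics and the neural networks mentioned in the abstract.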

Published: 1999-01-01. Publication type: Journal Article. Citations: 0
Database screening for HIV protease ligands: the influence of binding-site conformation and representation on ligand selectivity.
V Schnecke, L A Kuhn

Screening for potential ligands and docking them into the binding sites of proteins is one of the main tasks in computer-aided drug design. Despite the progress in computational power, it remains infeasible to model all the factors involved in molecular recognition, especially when screening databases of more than 100,000 compounds. While ligand flexibility is considered in most approaches, the model of the binding site is rather simplistic, with neither solvation nor induced complementarity usually taken into consideration. We present results for screening different databases for HIV-1 protease ligands with our tool Slide, and investigate the extent to which binding-site conformation, solvation, and template representation generate bias. The results suggest a strategy for selecting the optimal binding-site conformation in cases where more than one independent structure is available, and for selecting a representation of that binding site that yields reproducible results and identifies known ligands.

Published: 1999-01-01. Publication type: Journal Article. Citations: 0
An algorithm combining discrete and continuous methods for optical mapping.
R M Karp, I Pe'er, R Shamir

Optical mapping is a novel technique for generating the restriction map of a DNA molecule by observing many single, partially digested, copies of it, using fluorescence microscopy. The real-life problem is complicated by numerous factors: false positive and false negative cut observations, inaccurate location measurements, unknown orientations and faulty molecules. We present an algorithm for solving the real-life problem. The algorithm combines continuous optimization and combinatorial algorithms, applied to a non-uniform discretization of the data. We present encouraging results on real experimental data.

Published: 1999-01-01. Publication type: Journal Article. Citations: 0
Spatio-temporal registration of the expression patterns of Drosophila segmentation genes.
E M Myasnikova, D Kosman, J Reinitz, M G Samsonova

The application of image registration techniques resulted in the construction of an integrated atlas of Drosophila segmentation gene expression in both space and time. The registration method was based on a quadratic spline approximation with flexible knots. A classifier was developed that automatically assigns an embryo to one of the temporal classes according to its gene expression pattern.

Published: 1999-01-01. Publication type: Journal Article. Citations: 0
Building dictionaries of 1D and 3D motifs by mining the Unaligned 1D sequences of 17 archaeal and bacterial genomes.
I Rigoutsos, Y Gao, A Floratos, L Parida

We have used the Teiresias algorithm to carry out unsupervised pattern discovery in a database containing the unaligned ORFs from the 17 publicly available complete archaeal and bacterial genomes and to build a 1D dictionary of motifs. These motifs, which we refer to as seqlets, account for and cover 97.88% of this genomic input at the level of amino acid positions. Each of the seqlets in this 1D dictionary was located among the sequences in Release 38.0 of the Protein Data Bank, and the structural fragments corresponding to each seqlet's instances were identified and aligned in three dimensions. Those seqlets that resulted in RMSD errors below a pre-selected threshold of 2.5 Angstroms were entered in a 3D dictionary of structurally conserved seqlets. These two dictionaries can be thought of as cross-indices that facilitate tasks such as automated functional annotation of genomic sequences, local homology identification, local structure characterization, and comparative genomics.

Published: 1999-01-01. Publication type: Journal Article. Citations: 0
A phylogenetic approach to RNA structure prediction.
V R Akmaev, S T Kelley, G D Stormo

Methods based on the Mutual Information statistic (MI methods) predict structure by looking for statistical correlations between sequence positions in a set of aligned sequences. Although MI methods are often quite effective, these methods ignore the underlying phylogenetic relationships of the sequences they analyze. Thus, they cannot distinguish between correlations due to structural interactions, and spurious correlations resulting from phylogenetic history. In this paper, we introduce a method analogous to MI that incorporates phylogenetic information. We show that this method accurately recovers the structures of well-known RNA molecules. We also demonstrate, with both real and simulated data, that this phylogenetically-based method outperforms standard MI methods, and improves the ability to distinguish interacting from non-interacting positions in RNA. This method is flexible, and may be applied to the prediction of protein structure given the appropriate evolutionary model. Because this method incorporates phylogenetic data, it also has the potential to be improved with the addition of more accurate phylogenetic information, although we show that even approximate phylogenies are helpful.
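
For orientation, the standard mutual information statistic that the abstract above uses as its baseline can be sketched as below. This is the plain MI between two alignment columns, which by construction ignores phylogeny; it is not the phylogenetically informed method the paper introduces. The toy alignment and the function name are illustrative assumptions.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j):
    """Mutual information in bits between columns i and j of an alignment
    given as a list of equal-length strings."""
    n = len(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    left_counts = Counter(seq[i] for seq in alignment)
    right_counts = Counter(seq[j] for seq in alignment)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = left_counts[a] / n
        p_b = right_counts[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment: columns 0 and 3 covary like a base pair, columns 1 and 2 are invariant.
toy_alignment = ["GAAC", "CAAG", "AAAU", "UAAA"]
print(column_mutual_information(toy_alignment, 0, 3))
print(column_mutual_information(toy_alignment, 1, 2))
```

High MI between two columns suggests covariation, which may reflect base pairing but, as the abstract notes, may also be a by-product of shared ancestry among the aligned sequences.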

Published: 1999-01-01. Publication type: Journal Article. Citations: 0