Proceedings. International Conference on Intelligent Systems for Molecular Biology: latest articles

The ribosome scanning model for translation initiation: implications for gene prediction and full-length cDNA detection.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1998-01-01

P Agarwal, V Bafna

Biological signals, such as the start of protein translation in eukaryotic mRNA, are stretches of nucleotides recognized by cellular machinery. There are a variety of techniques for modeling and identifying them. Most of these techniques either assume that the base pairs at each position of the signal are independently distributed, or they allow for limited dependencies among different positions. In previous work, we provided a statistical model that generalizes earlier methods and captures all significant high-order dependencies among different base positions. In this paper, we use a set of experimentally verified translation initiation (TI) sites (provided by Amos Bairoch) from eukaryotic sequences to train a range of methods, and then compare these methods. None of the methods is effective in predicting TI sites. We take advantage of the ribosome scanning model (Cigan et al., 1988) to significantly improve the prediction accuracy for full-length mRNAs. The ribosome scanning model suggests scanning from the 5' end of the capped mRNA and initiating translation at the first AUG in good context. This reduces the search space dramatically and accounts for its effectiveness. The success of this approach illustrates how biological ideas can illuminate and help solve challenging problems in computational biology.
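The scanning rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' statistical model: the test for "good context" used here (a purine at position -3 or a G at position +4 relative to the A of AUG, a common simplification of the Kozak consensus) and the function name `first_aug_in_good_context` are assumptions made for illustration.

```python
from typing import Optional


def first_aug_in_good_context(mrna: str) -> Optional[int]:
    """Scan from the 5' end and return the 0-based index of the first AUG
    whose flanking bases pass a simplified context test, or None.

    Assumption (not from the paper): context is "good" if the base at -3
    is a purine (A or G) or the base at +4 is G.
    """
    # Accept DNA or RNA input by normalising to upper-case RNA.
    seq = mrna.upper().replace("T", "U")
    pos = seq.find("AUG")
    while pos != -1:
        purine_at_minus3 = pos >= 3 and seq[pos - 3] in "AG"
        g_at_plus4 = pos + 3 < len(seq) and seq[pos + 3] == "G"
        if purine_at_minus3 or g_at_plus4:
            return pos
        # Weak context: keep scanning downstream, as a ribosome would.
        pos = seq.find("AUG", pos + 1)
    return None


if __name__ == "__main__":
    # The first AUG (index 3) has C at -3 and C at +4, so it is skipped.
    print(first_aug_in_good_context("CCUAUGCCCACCAUGGCG"))
```

Because the scan stops at the first acceptable AUG, only a handful of candidate sites are ever examined per full-length mRNA, which is the reduction in search space the abstract refers to.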

Volume 6, pages 2-7.

The LabFlow system for workflow management in large scale biology research laboratories.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1998-01-01

N Goodman, S Rozen, L D Stein

LabFlow is a workflow management system designed for large-scale biology research laboratories. It provides a workflow model in which objects flow from task to task under programmatic control. The model supports parallelism, meaning that an object can flow down several paths simultaneously, and sub-workflows, which can be invoked subroutine-style from a task. The system allocates tasks to Unix processes to achieve requisite levels of multiprocessing. The system uses the LabBase data management system to store workflow state and laboratory results. LabFlow provides a Perl5 object-oriented framework for defining workflows, and an engine for executing them. The software is freely available.

TAMBIS--Transparent Access to Multiple Bioinformatics Information Sources.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1998-01-01

P G Baker, A Brass, S Bechhofer, C Goble, N Paton, R Stevens

The TAMBIS project aims to provide transparent access to disparate biological databases and analysis tools, enabling users to utilize a wide range of resources with the minimum of effort. A prototype system has been developed that includes a knowledge base of biological terminology (the biological Concept Model), a model of the underlying data sources (the Source Model) and a 'knowledge-driven' user interface. Biological concepts are captured in the knowledge base using a description logic called GRAIL. The Concept Model provides the user with the concepts necessary to construct a wide range of multiple-source queries, and the user interface provides a flexible means of constructing and manipulating those queries. The Source Model provides a description of the underlying sources and mappings between terms used in the sources and terms in the biological Concept Model. The Concept Model and Source Model provide a level of indirection that shields the user from source details, providing a high level of source transparency. Source-independent, declarative queries formed from terms in the Concept Model are transformed into a set of source-dependent, executable procedures. Query formulation, translation and execution are demonstrated using a working example.

Volume 6, pages 25-34.

Phylogenetic inference in protein superfamilies: analysis of SH2 domains.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1998-01-01

K Sjölander

This work focuses on the inference of evolutionary relationships in protein superfamilies, and the use of these relationships to identify key positions in the structure, to infer attributes on the basis of evolutionary distance, and to identify potential errors in sequence annotations. Relative entropy, a distance metric from information theory, is used in combination with Dirichlet mixture priors to estimate a phylogenetic tree for a set of proteins. This method infers key structural or functional positions in the molecule, and guides the tree topology to preserve these important positions within subtrees. Minimum-description-length principles are used to determine a cut of the tree into subtrees, to identify the subfamilies in the data. This method is demonstrated on SH2-domain-containing proteins, resulting in a new subfamily assignment for Src2_drome and a suggested evolutionary relationship between Nck_human and Drk_drome, Sem5_caeel, Grb2_human and Grb2_chick.
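For readers unfamiliar with relative entropy, the following is a minimal sketch of the quantity for two discrete distributions, such as the amino-acid frequencies at one alignment column in two subfamilies. It assumes the distributions are already given; the Dirichlet mixture estimation and the tree-building procedure from the abstract are not reproduced, and the function name is hypothetical. Note that the plain form below is not symmetric in its arguments.

```python
import math
from typing import Sequence


def relative_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """Return D(p || q) = sum_i p_i * log2(p_i / q_i), in bits.

    Terms with p_i == 0 contribute nothing; if q_i == 0 where p_i > 0,
    the divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total


if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.25, 0.75]
    print(relative_entropy(p, q))  # D(p || q)
    print(relative_entropy(q, p))  # D(q || p), a different value
```

Smoothing the observed frequencies with a prior before comparing them, as the abstract describes, also avoids the infinite case that arises when a residue is absent from one column.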

Computational applications of DNA structural scales.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1998-01-01

P Baldi, Y Chauvin, S Brunak, J Gorodkin, A G Pedersen

We study, from a computational standpoint, several different physical scales associated with structural features of DNA sequences, including dinucleotide scales such as base stacking energy and propeller twist, and trinucleotide scales such as bendability and nucleosome positioning. We show that these scales provide an alternative or complementary compact representation of DNA sequences. As an example, we construct a strand-invariant representation of DNA sequences. The scales can also be used to analyze and discover new DNA structural patterns, especially in combination with hidden Markov models (HMMs). The scales are applied to HMMs of human promoter sequences, revealing a number of significant differences between regions upstream and downstream of the transcriptional start point. Finally, we show, with some qualifications, that such scales are by and large independent, and therefore complement each other.

A statistical theory of sequence alignment with gaps.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1998-01-01

D Drasdo, T Hwa, M Lässig

A statistical theory of local alignment algorithms with gaps is presented. Both the linear and logarithmic phases, as well as the phase transition separating the two phases, are described in a quantitative way. Markov sequences without mutual correlations are shown to have scale-invariant alignment statistics. Deviations from scale invariance indicate the presence of mutual correlations detectable by alignment algorithms. Conditions are obtained for the optimal detection of a class of mutual sequence correlations.
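As background, the kind of algorithm whose score statistics are being analysed can be sketched as a Smith-Waterman local alignment with a linear gap cost. This is a generic textbook version, not the authors' code; the scoring parameters (`match`, `mismatch`, `gap`) are illustrative assumptions. In the alignment-statistics literature, the linear and logarithmic phases are usually described as regimes in which the optimal score for unrelated sequences grows linearly or logarithmically with sequence length, depending on such parameters.

```python
def local_alignment_score(
    a: str,
    b: str,
    match: float = 1.0,
    mismatch: float = -1.0,
    gap: float = -2.0,
) -> float:
    """Return the optimal Smith-Waterman local alignment score of a and b.

    Uses a linear gap cost (each gapped position costs `gap`) and keeps
    only two rows of the dynamic-programming table.
    """
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for i in range(1, len(a) + 1):
        curr = [0.0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0.0,              # start a new local alignment here
                prev[j - 1] + s,  # extend with a (mis)match
                prev[j] + gap,    # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # A mild gap cost lets the alignment bridge the extra T in the second
    # sequence; a harsh one makes the best gapless segment win instead.
    print(local_alignment_score("ACGTACGT", "ACGTTACGT"))
    print(local_alignment_score("ACGTACGT", "ACGTTACGT", gap=-5.0))
```

Varying `gap` relative to `match` and `mismatch` and recording the score for random sequence pairs of increasing length is one simple way to observe the two growth regimes empirically.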

GeneExpress: a computer system for description, analysis, and recognition of regulatory sequences in eukaryotic genome.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1998-01-01

N A Kolchanov, M P Ponomarenko, A E Kel, Kondrakhin YuV, A S Frolov, F A Kolpakov, T N Goryachkovsky, O V Kel, E A Ananko, E V Ignatieva, O A Podkolodnaya, V N Babenko, I L Stepanenko, A G Romashchenko, T I Merkulova, D G Vorobiev, S V Lavryushev, Ponomarenko YuV, A V Kochetov, G B Kolesov, V V Solovyev, L Milanesi, N L Podkolodny, E Wingender, T Heinemeyer

The GeneExpress system has been designed to integrate description, analysis, and recognition of eukaryotic regulatory sequences. The system includes 5 basic units: (1) GeneNet contains an object-oriented database for accumulating data on gene networks and signal transduction pathways, and a Java-based viewer for exploring and visualizing the GeneNet information; (2) Transcription Regulation combines the database on transcription regulatory regions of eukaryotic genes (TRRD) and TRRD Viewer; (3) Transcription Factor Binding Site Recognition contains a compilation of transcription factor binding sites (TFBSC) and programs for their analysis and recognition; (4) mRNA Translation is designed for analysis of structural and contextual features of mRNA 5'UTRs and prediction of their translation efficiency; and (5) ACTIVITY is the module for analysis and site activity prediction of a given nucleotide sequence. Integration of the databases in GeneExpress is based on the Sequence Retrieval System (SRS) created at the European Bioinformatics Institute.

Volume 6, pages 95-104.

Segment-based scores for pairwise and multiple sequence alignments.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1998-01-01

B Morgenstern, W R Atchley, K Hahn, A Dress

In this paper, we discuss a novel scoring scheme for sequence alignments. The score of an alignment is defined as the sum of so-called weights of aligned segment pairs. A simple modification of the weight function used by the original version of the DIALIGN alignment program turns out to have a crucial advantage: it can be applied to both global and local alignment problems without the need to specify a threshold parameter.

A surface measure for probabilistic structural computations.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1998-01-01

J P Schmidt, C C Chen, J L Cooper, R B Altman

Computing three-dimensional structures from sparse experimental constraints requires a method for combining heterogeneous sources of information, such as distances, angles, and measures of total volume, shape, and surface. For some types of information, such as distances between atoms, numerous methods are available for computing structures that satisfy the provided constraints. It is more difficult, however, to use information about the degree to which an atom is on the surface or buried as a useful constraint during structure computations. Surface measures have been used as accept/reject criteria for previously computed structures, but this is not an efficient strategy. In this paper, we investigate the efficacy of applying a surface measure in the computation of molecular structure, using a method of probabilistic least-squares computations, which facilitates the introduction of multiple, noisy, heterogeneous data sources. For this purpose, we introduce a simple, purely geometrical measure of surface proximity called maximal conic view (MCV). MCV is efficiently computable and differentiable, and is hence well suited to driving a structural optimization method based, in part, on surface data. As an initial validation, we show that MCV correlates well with known measures for total exposed surface area. We use this measure in our experiments to show that information about surface proximity (derived from theory or experiment, for example) can be added to a set of distance measurements to increase significantly the quality of the computed structure. In particular, when 30 to 50 percent of all possible short-range distances are provided, the addition of surface information improves the quality of the computed structure (as measured by RMS fit) by as much as 80 percent. Our results demonstrate that knowledge of which atoms are on the surface and which are buried can be used as a powerful constraint in estimating molecular structure.

Volume 6, pages 148-56.

Identification of divergent functions in homologous proteins by induction over conserved modules.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1998-01-01

I Shah, L Hunter

Homologous proteins do not necessarily exhibit identical biochemical function. Despite this fact, local or global sequence similarity is widely used as an indication of functional identity. Of the 1327 Enzyme Commission defined functional classes with more than one annotated example in the sequence databases, similarity scores alone are inadequate in 251 (19%) of the cases. We test the hypothesis that conserved domains, as defined in the ProDom database, can be used to discriminate between alternative functions for homologous proteins in these cases. Using machine learning methods, we were able to induce correct discriminators for more than half of these 251 challenging functional classes. These results show that the combination of modular representations of proteins with sequence similarity improves the ability to infer function from sequence over similarity scores alone.
