Proceedings. International Conference on Intelligent Systems for Molecular Biology: Latest Articles

BioSim--a new qualitative simulation environment for molecular biology.
K R Heidtke, S Schulze-Kremer

Traditionally, biochemical systems are modelled using kinetics and differential equations in a quantitative simulator. However, for many biological processes detailed quantitative information is not available, only qualitative or fuzzy statements about the nature of interactions. In a previous paper we have shown the applicability of qualitative reasoning methods for molecular biological regulatory processes. Now, we present a newly developed simulation environment, BioSim, that is written in Prolog using constraint logic programming techniques. The simulator combines the basic ideas of two main approaches to qualitative reasoning and integrates the contents of a molecular biology knowledge base, EcoCyc. We show that qualitative reasoning can be combined with automatic transformation of contents of genomic databases into simulation models to give an interactive modelling system that reasons about the relations and interactions of biological entities. This is demonstrated on the glycolytic pathway.
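
The qualitative style of reasoning mentioned in the abstract above can be illustrated with a sign algebra, in which a quantity is only known to be increasing, decreasing, steady, or ambiguous. The following is a minimal Python sketch under that assumption; it is not BioSim's Prolog implementation, and the names `q_add`, `q_neg` and `net_trend` are hypothetical.

```python
# Qualitative values (an assumed encoding, not taken from BioSim):
# '+' increasing, '-' decreasing, '0' steady, '?' ambiguous.

def q_add(a, b):
    """Qualitative sum of two signs."""
    if a == '0':
        return b
    if b == '0':
        return a
    if a == b:
        return a
    return '?'  # opposite signs, or exactly one ambiguous operand


def q_neg(a):
    """Qualitative negation of a sign."""
    return {'+': '-', '-': '+', '0': '0', '?': '?'}[a]


def net_trend(producing, consuming):
    """Trend of a metabolite given the signs of the fluxes that produce and consume it."""
    total = '0'
    for sign in producing:
        total = q_add(total, sign)
    for sign in consuming:
        total = q_add(total, q_neg(sign))
    return total
```

With one active producing flux and no active consumption, `net_trend(['+'], ['0'])` gives `'+'`; when production and consumption are both active, the result is `'?'`, which is the kind of ambiguity a qualitative simulator has to branch on or resolve with extra constraints.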

Published: 1998-01-01. Type: Journal Article. Semantic Scholar paper ID: 20696077.
Citations: 0
Modeling protein homopolymeric repeats: possible polyglutamine structural motifs for Huntington's disease.
R H Lathrop, M Casale, D J Tobias, J L Marsh, L M Thompson

We describe a prototype system (Poly-X) for assisting an expert user in modeling protein repeats. Poly-X reduces the large number of degrees of freedom required to specify a protein motif in complete atomic detail. The result is a small number of parameters that are easily understood by, and under the direct control of, a domain expert. The system was applied to the polyglutamine (poly-Q) repeat in the first exon of huntingtin, the gene implicated in Huntington's disease. We present four poly-Q structural motifs: two poly-Q beta-sheet motifs (parallel and antiparallel) that constitute plausible alternatives to a similar previously published poly-Q beta-sheet motif, and two novel poly-Q helix motifs (alpha-helix and pi-helix). To our knowledge, helical forms of polyglutamine have not been proposed before. The motifs suggest that there may be several plausible aggregation structures for the intranuclear inclusion bodies which have been found in diseased neurons, and may help in the effort to understand the structural basis for Huntington's disease.

Published: 1998-01-01. Type: Journal Article. Semantic Scholar paper ID: 20696079.
Citations: 0
Advanced query mechanisms for biological databases.
I M Chen, A S Kosky, V M Markowitz, E Szeto, T Topaloglou

Existing query interfaces for biological databases are based either on fixed forms or on textual query languages. Users of a fixed form-based query interface are limited to performing pre-defined queries that provide a fixed view of the underlying database, while users of a free-text query language-based interface have to understand the underlying data models, specific query languages and application schemas in order to formulate queries. Further, operations on application-specific complex data (e.g., DNA sequences, proteins), which are usually provided by a variety of software packages with their own format requirements and peculiarities, are neither available as part of biological query interfaces nor integrated with them. In this paper, we describe generic tools that provide powerful and flexible support for interactively exploring biological databases in a uniform and consistent way, that is, via common data models, formats, and notations, in the framework of the Object-Protocol Model (OPM). These tools include (i) a Java graphical query construction tool with support for automatic generation of Web query forms that can either be used for further specifying conditions or be saved and customized; (ii) query processors for interpreting and executing queries that may involve complex application-specific objects, and that could span multiple heterogeneous databases and file systems; and (iii) utilities for automatic generation of HTML pages containing query results, which can be browsed using a Web browser. These tools avoid the restrictions imposed by traditional fixed-form query interfaces, while providing users with simple and intuitive facilities for formulating ad-hoc queries across heterogeneous databases, without the need to understand the underlying data models and query languages.

Published: 1998-01-01. Type: Journal Article. Semantic Scholar paper ID: 20695515.
Citations: 0
Sequence assembly validation by multiple restriction digest fragment coverage analysis.
E C Rouchka, D J States

DNA sequence analysis depends on the accurate assembly of fragment reads for the determination of a consensus sequence. This report examines the possibility of analyzing multiple, independent restriction digests as a method for testing the fidelity of sequence assembly. A dynamic programming algorithm to determine the maximum-likelihood alignment of error-prone electrophoretic mobility data to the expected fragment mobilities, given the consensus sequence and restriction enzymes, is derived and used to assess the likelihood of detecting rearrangements in genomic sequencing projects. The method is shown to reliably detect errors in sequence fragment assembly without the necessity of making reference to an overlying physical map. An HTML form-based interface is available at http://www.ibc.wustl.edu/services/validate.html.
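
The idea of checking an assembly against restriction digests can be sketched in Python as follows. This is an illustration only: it computes the expected fragment lengths for a linear consensus sequence and compares them with observed lengths using a simple relative tolerance, which stands in for (and is much cruder than) the maximum-likelihood dynamic programming alignment of mobility data described in the abstract. The function names, the `cut_offset` convention and the tolerance value are assumptions.

```python
def digest(sequence, site, cut_offset):
    """Fragment lengths after cutting a linear sequence at every occurrence of `site`.

    `cut_offset` is the cut position within the recognition site
    (for example 1 for EcoRI, which cuts G^AATTC).
    """
    cuts = []
    start = 0
    while True:
        i = sequence.find(site, start)
        if i == -1:
            break
        cuts.append(i + cut_offset)
        start = i + 1  # allow overlapping site occurrences
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def consistent(expected, observed, rel_tol=0.05):
    """True if the sorted fragment lengths agree pairwise within a relative tolerance."""
    if len(expected) != len(observed):
        return False
    return all(abs(e - o) <= rel_tol * e
               for e, o in zip(sorted(expected), sorted(observed)))


# Toy consensus sequence with two EcoRI sites (GAATTC).
consensus = "AAGAATTCTTTTGAATTCCC"
expected = digest(consensus, "GAATTC", 1)
```

A misassembly that merges or rearranges regions changes the multiset of expected fragment lengths, so `consistent(expected, observed)` returning `False` for one or more independent digests flags the assembly for inspection.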

Published: 1998-01-01. Type: Journal Article. Semantic Scholar paper ID: 20696543.
Citations: 0
IMGT/LIGM-DB: a systematized approach for ImMunoGeneTics database coherence and data distribution improvement.
V Giudicelli, D Chaume, M P Lefranc

IMGT, the international ImMunoGeneTics database (http://imgt.cnusc.fr:8104), created by Marie-Paule Lefranc, Montpellier, France, is an integrated database specializing in antigen receptors and MHC of all vertebrate species. IMGT includes LIGM-DB, developed for immunoglobulins and T-cell receptors. LIGM-DB distributes high-quality data, with important added value from the LIGM expert annotations. The accuracy of LIGM-DB immunogenetics data rests on the standardization of biological knowledge related to keywords, annotation labels and gene identification. Managing such data, which result from biological research, requires a highly flexible implementation that can quickly reflect up-to-date results and integrate new knowledge. We developed a systematized approach and defined LIGM-DB systems that manage and carry out the major tasks for the database survey. In this paper, we focus on the coherence system, which has become crucial for maintaining data quality as the database grows and as biological knowledge continues to improve, and on the distribution system, which makes LIGM-DB data easy to access, download and reuse. Efforts have been made to improve the data distribution procedures and adapt them to current bioinformatics needs. Recently, we have developed an API that allows Java programmers to remotely access and integrate LIGM-DB data in other computer environments.

Published: 1998-01-01. Type: Journal Article. Semantic Scholar paper ID: 20696074.
Citations: 0
Prediction of signal peptides and signal anchors by a hidden Markov model.
H Nielsen, A Krogh

A hidden Markov model of signal peptides has been developed. It contains submodels for the N-terminal part, the hydrophobic region, and the region around the cleavage site. For known signal peptides, the model can be used to assign objective boundaries between these three regions. Applied to our data, the length distributions for the three regions are significantly different from expectations. For instance, the assigned hydrophobic region is between 8 and 12 residues long in almost all eukaryotic signal peptides. This analysis also makes obvious the difference between eukaryotes, Gram-positive bacteria, and Gram-negative bacteria. The model can be used to predict the location of the cleavage site, which it finds correctly in nearly 70% of signal peptides in a cross-validated test--almost the same accuracy as the best previous method. One of the problems for existing prediction methods is the poor discrimination between signal peptides and uncleaved signal anchors, but this is substantially improved by the hidden Markov model when expanding it with a very simple signal anchor model.
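
To make the three-region idea concrete, here is a toy left-to-right hidden Markov model in Python with one state per region (`n`, `h`, `c`) and Viterbi decoding to assign region boundaries. All probabilities, the three residue classes and the function names are invented for illustration; the model in the paper has far richer submodels and trained parameters.

```python
import math

STATES = ["n", "h", "c"]  # N-terminal, hydrophobic, cleavage-site region
START = {"n": 1.0, "h": 0.0, "c": 0.0}
# Left-to-right transitions: a region can only be followed by itself or the next one.
TRANS = {
    "n": {"n": 0.7, "h": 0.3, "c": 0.0},
    "h": {"n": 0.0, "h": 0.8, "c": 0.2},
    "c": {"n": 0.0, "h": 0.0, "c": 1.0},
}
# Emission probabilities over coarse residue classes (assumed values).
EMIT = {
    "n": {"charged": 0.5, "hydrophobic": 0.2, "other": 0.3},
    "h": {"charged": 0.05, "hydrophobic": 0.8, "other": 0.15},
    "c": {"charged": 0.1, "hydrophobic": 0.3, "other": 0.6},
}


def residue_class(aa):
    """Map an amino acid letter to a coarse class (a simplifying assumption)."""
    if aa in "KRDE":
        return "charged"
    if aa in "AVLIMFWC":
        return "hydrophobic"
    return "other"


def safe_log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq):
    """Most probable region label ('n', 'h' or 'c') for each residue of seq."""
    if not seq:
        return ""
    obs = [residue_class(a) for a in seq]
    score = [{s: safe_log(START[s]) + safe_log(EMIT[s][obs[0]]) for s in STATES}]
    back = []
    for o in obs[1:]:
        prev = score[-1]
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + safe_log(TRANS[r][s]))
            cur[s] = prev[best] + safe_log(TRANS[best][s]) + safe_log(EMIT[s][o])
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))
```

Calling `viterbi` on a peptide returns one label per residue, so the boundaries between the three regions can be read off where the label changes. Region length distributions like those discussed in the abstract would be obtained by collecting such boundaries over many known signal peptides.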

Published: 1998-01-01. Type: Journal Article. Semantic Scholar paper ID: 20696541.
Citations: 0
A computational system for modelling flexible protein-protein and protein-DNA docking.
M J Sternberg, P Aloy, H A Gabb, R M Jackson, G Moont, E Querol, F X Aviles

A computational system is described that predicts the structure of protein/protein and protein/DNA complexes starting from unbound coordinate sets. The approach is (i) a global search with rigid-body docking for complexes with shape complementarity and favourable electrostatics; (ii) use of distance constraints from experimental (or predicted) knowledge of critical residues; (iii) use of pair potential to screen docked complexes and (iv) refinement and further screening by protein-side chain optimisation and interfacial energy minimisation. The system has been applied to model ten protein/protein and eight protein-repressor/DNA (steps i to iii only) complexes. In general a few complexes, one of which is close to the true structure, can be generated.

Published: 1998-01-01. Type: Journal Article. Semantic Scholar paper ID: 20696548.
Citations: 0
Automated clustering and assembly of large EST collections.
D P Yee, D Conklin

The availability of large EST (Expressed Sequence Tag) databases has led to a revolution in the way new genes are cloned. Difficulties arise, however, due to high error rates and redundancy of raw EST data. For these reasons, one of the first tasks performed by a scientist investigating any EST of interest is to gather contiguous ESTs and assemble them into a larger virtual cDNA. The REX (Recursive EST eXtender) algorithm described in this paper completely automates this process by finding ESTs that can be clustered on the basis of overlapping bases, and then assembling the contigs into a consensus sequence. By combining the clustering and assembly steps, REX can quickly generate assemblies from EST databases that are frequently updated without having to preprocess the data. A consensus assembly method is used to correct miscalled bases and remove indel errors. A unique feature of this method is that it addresses the issues of splice variants and unspliced cDNA data. Since REX is a fast greedy algorithm, it can address the problem of generating a database of assembled sequences from very large collections of EST data. A procedure is described for creating and maintaining an Assembled Consensus EST database (ACE) that is useful for characterizing the large body of data that exists in EST databases.
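
The greedy extension step described above can be sketched in Python as follows. This is a simplified illustration, not the REX algorithm itself: it requires exact suffix/prefix overlaps, ignores sequencing errors, reverse complements, splice variants and consensus building, and the names `suffix_prefix_overlap` and `extend` as well as the `min_overlap` default are assumptions.

```python
def suffix_prefix_overlap(left, right, min_overlap):
    """Length of the longest suffix of `left` equal to a prefix of `right`,
    or 0 if it is shorter than `min_overlap`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def extend(seed, reads, min_overlap=4):
    """Greedily grow a contig from `seed` using exactly overlapping reads."""
    contig = seed
    unused = [r for r in reads if r != seed]
    extended = True
    while extended:
        extended = False
        for r in list(unused):
            if r in contig:  # read already covered by the contig
                unused.remove(r)
                continue
            k = suffix_prefix_overlap(contig, r, min_overlap)
            if k:  # read extends the contig to the right
                contig += r[k:]
                unused.remove(r)
                extended = True
                continue
            k = suffix_prefix_overlap(r, contig, min_overlap)
            if k:  # read extends the contig to the left
                contig = r[:-k] + contig
                unused.remove(r)
                extended = True
    return contig


# Toy reads; the seed is extended to the right, then the left, then the right again.
contig = extend("GGCATTAC", ["TTACGGAT", "ACGTGGCA", "GGATCC"])
```

Because each pass only looks at reads that overlap the current contig ends, the seed's cluster is discovered and assembled in the same loop, which mirrors the abstract's point about combining the clustering and assembly steps.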

Published: 1998-01-01. Type: Journal Article. Semantic Scholar paper ID: 20695285.
Citations: 0
A map of the protein space--an automatic hierarchical classification of all protein sequences.
G Yona, N Linial, N Tishby, M Linial

We investigate the space of all protein sequences. We combine the standard measures of similarity (SW, FASTA, BLAST) to associate with each sequence an exhaustive list of neighboring sequences. These lists induce a (weighted directed) graph whose vertices are the sequences. The weight of an edge connecting two sequences represents their degree of similarity. This graph encodes many of the fundamental properties of the sequence space. We look for clusters of related proteins in this graph. These clusters correspond to strongly connected sets of vertices. Two main ideas underlie our work: i) Interesting homologies among proteins can be deduced by transitivity. ii) Transitivity should be applied restrictively in order to prevent unrelated proteins from clustering together. Our analysis starts from a very conservative classification, based on very significant similarities, that has many classes. Subsequently, classes are merged to include less significant similarities. Merging is performed via a novel two-phase algorithm. First, the algorithm identifies groups of possibly related clusters (based on transitivity and strong connectivity) using local considerations, and merges them. Then, a global test is applied to identify nuclei of strong relationships within these groups of clusters, and the classification is refined accordingly. This process takes place at varying thresholds of statistical significance, where at each step the algorithm is applied to the classes of the previous classification to obtain the next one at the more permissive threshold. Consequently, a hierarchical organization of all proteins is obtained. The resulting classification splits the space of all protein sequences into well-defined groups of proteins. The results show that the automatically induced sets of proteins are closely correlated with natural biological families and superfamilies. The hierarchical organization reveals finer sub-families that make up known families of proteins as well as many interesting relations between protein families. The hierarchical organization proposed may be considered as the first map of the space of all protein sequences. An interactive web site including the results of our analysis has been constructed, and is now accessible through http://www.protomap.cs.huji.ac.il
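
The threshold-by-threshold merging described above can be sketched in Python with a union-find structure. This is a deliberately simplified illustration: it treats the graph as undirected and merges on plain connectivity (single linkage), so it omits the restrictive transitivity tests and the global refinement phase that the abstract describes. The function name, the edge format `(a, b, significance)` with smaller values meaning stronger similarity, and the toy data are assumptions.

```python
def cluster_levels(nodes, edges, thresholds):
    """Return [(threshold, clusters)] from the strictest to the most permissive threshold.

    Clusters at each level are built on top of the previous level's clusters,
    so the levels form a hierarchy.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    levels = []
    remaining = sorted(edges, key=lambda e: e[2])
    i = 0
    for t in sorted(thresholds):
        # Merge clusters joined by any edge at least as significant as t.
        while i < len(remaining) and remaining[i][2] <= t:
            a, b, _ = remaining[i]
            parent[find(a)] = find(b)
            i += 1
        groups = {}
        for n in nodes:
            groups.setdefault(find(n), set()).add(n)
        levels.append((t, sorted(sorted(g) for g in groups.values())))
    return levels


# Toy similarity graph: five sequences, four edges with e-value-like scores.
levels = cluster_levels(
    ["A", "B", "C", "D", "E"],
    [("A", "B", 1e-100), ("B", "C", 1e-20), ("D", "E", 1e-50), ("C", "D", 1e-3)],
    [1e-80, 1e-10, 1e-1],
)
```

In this toy run the strictest level keeps only the A/B pair together, the middle level yields two clusters, and the most permissive level joins everything through the weak C/D edge. That last merge is exactly the kind of chaining that the restrictive use of transitivity in the abstract is meant to prevent.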

A map of the protein space--an automatic hierarchical classification of all protein sequences. G Yona, N Linial, N Tishby, M Linial. Proceedings. International Conference on Intelligent Systems for Molecular Biology, 1998-01-01. Journal Article.
Hierarchical minimization with distance and angle constraints.
J R Gunn

The incorporation of experimentally determined constraints into structure-prediction methods based on energy minimization leads both to improved selectivity with empirical potential functions and to structure determination with far fewer constraints than distance-geometry calculations require. Some methods will be described for using both distance and angle constraints with the hierarchical minimization algorithm. The simulation is based on a combination of Monte Carlo Simulated Annealing and Genetic Algorithm techniques which are integrated into a single framework. The selection cycle of the genetic algorithm is carried out at the same temperature as the mutations, or alternatively the crossover cycle can be considered as a type of Monte Carlo trial move, such that each temperature annealing step corresponds to a new generation. The sequence is divided into segments, and the mutation step consists of replacing an entire segment with a choice from a pre-selected list. This list is in turn constructed from a list of smaller segments, and the number of overall conformations can thus be pruned at each level of selection. Results will be shown for test cases using a small number of flexible distance constraints as an additional term in the potential, and for restrictions placed on backbone dihedral angles as an additional screening criterion for constructing trial moves.
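
The segment-replacement move, the distance-constraint term, and the angle screening described above can be sketched in Python under strong simplifying assumptions. This is an illustration only, not the author's implementation: the chain is a toy 2D model with unit bond lengths, the energy contains only a flat-bottomed harmonic restraint (the empirical potential is omitted), the genetic-algorithm selection and crossover cycle is left out, and all function names and parameters are hypothetical.

```python
import math
import random


def flat_bottom_penalty(distance, lower, upper, k=1.0):
    """Flexible distance restraint: zero inside [lower, upper], harmonic outside."""
    if distance < lower:
        return k * (lower - distance) ** 2
    if distance > upper:
        return k * (distance - upper) ** 2
    return 0.0


def chain_coords(angles):
    """Toy 2D chain: unit bonds, each angle is a turn relative to the previous bond."""
    x, y, heading = 0.0, 0.0, 0.0
    coords = [(x, y)]
    for angle in angles:
        heading += angle
        x += math.cos(heading)
        y += math.sin(heading)
        coords.append((x, y))
    return coords


def energy(segments, restraints):
    """Restraint energy only; a real potential would add an empirical term here."""
    points = chain_coords([a for segment in segments for a in segment])
    return sum(
        flat_bottom_penalty(math.dist(points[i], points[j]), lower, upper)
        for i, j, lower, upper in restraints
    )


def screen(candidates, allowed):
    """Angle screening: keep only segments whose angles all lie in `allowed`."""
    low, high = allowed
    return [seg for seg in candidates if all(low <= a <= high for a in seg)]


def anneal(candidates, n_segments, restraints, temps, steps_per_temp, rng):
    """Metropolis annealing where each trial move swaps in one whole segment."""
    current = [rng.choice(candidates) for _ in range(n_segments)]
    e_current = energy(current, restraints)
    best, e_best = list(current), e_current
    for temp in temps:
        for _ in range(steps_per_temp):
            trial = list(current)
            trial[rng.randrange(n_segments)] = rng.choice(candidates)
            delta = energy(trial, restraints) - e_current
            # Accept downhill moves always, uphill moves with Boltzmann probability.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, e_current = trial, e_current + delta
                if e_current < e_best:
                    best, e_best = list(current), e_current
    return best, e_best
```

Screening the candidate list before annealing means disallowed angles are never proposed, which is cheaper than penalizing them in the potential after the fact.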

Proceedings. International Conference on Intelligent Systems for Molecular Biology, 1998-01-01. Journal Article.