Proceedings. International Conference on Intelligent Systems for Molecular Biology: Latest Articles

A data base of minimally frustrated alpha helical segments extracted from proteins according to an entropy criterion.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1999-01-01

R Casadio, M Compiani, P Fariselli, P L Martelli

A data base of minimally frustrated alpha helical segments is defined by filtering a set of 822 non-redundant proteins, which contain 4783 alpha helical structures. The data base is built using a neural network-based alpha helix predictor, whose outputs are rated according to an entropy criterion. A comparison with the currently available experimental results indicates that a subset of the data base contains experimentally detected initiation sites of protein folding, as well as protein fragments that fold into stable isolated alpha helices. This suggests using the data base (and/or the predictor) to highlight patterns that govern the stability of alpha helices in proteins and the helical behavior of isolated protein fragments.
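
The abstract does not spell out the entropy criterion, so the following is only a minimal sketch of how predictor outputs might be rated by entropy. It assumes the predictor emits a per-residue helix probability; the function names, the 0.5-bit cutoff, and the minimum segment length are hypothetical choices for illustration, not values from the paper.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def low_entropy_segments(helix_probs, max_entropy=0.5, min_length=4):
    """Return (start, end) index pairs, end exclusive, of runs of residues
    where the (assumed) predictor favors helix and the entropy of its
    two-state output stays at or below max_entropy."""
    segments, start = [], None
    for i, p in enumerate(helix_probs):
        confident = p > 0.5 and shannon_entropy([p, 1.0 - p]) <= max_entropy
        if confident and start is None:
            start = i                      # a confident run begins
        elif not confident and start is not None:
            if i - start >= min_length:    # keep only sufficiently long runs
                segments.append((start, i))
            start = None
    if start is not None and len(helix_probs) - start >= min_length:
        segments.append((start, len(helix_probs)))
    return segments
```

Low entropy here simply means the predictor is far from undecided at that residue; a real filter would likely work on the full multi-state network output rather than a single probability.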

Citations: 0

An exact method for finding short motifs in sequences, with application to the ribosome binding site problem.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1999-01-01

M Tompa

This is an investigation of methods for finding short motifs that only occur in a fraction of the input sequences. Unlike local search techniques that may not reach a global optimum, the method proposed here is guaranteed to produce the motifs with greatest z-scores. This method is illustrated for the Ribosome Binding Site Problem, which is to identify the short mRNA 5' untranslated sequence that is recognized by the ribosome during initiation of protein synthesis. Experiments were performed to solve this problem for each of fourteen sequenced prokaryotes, by applying the method to the full complement of genes from each. One of the interesting results of this experimentation is evidence that the recognized sequence of the thermophilic archaea A. fulgidus, M. jannaschii, M. thermoautotrophicum, and P. horikoshii may be somewhat different from the well known Shine-Dalgarno sequence.
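
To make the idea of ranking motifs by z-score concrete, here is a rough sketch that exhaustively scores every k-mer by how many input sequences contain it. The background model is a deliberate simplification and an assumption of this sketch: positions are treated as independent and self-overlap of the motif is ignored, so this is not the statistic derived in the paper. The function name `kmer_zscores` is hypothetical.

```python
import math
from itertools import product

def kmer_zscores(sequences, k, alphabet="ACGT"):
    """Score each k-mer by (observed - expected) / stddev, where 'observed'
    is the number of sequences containing the k-mer at least once."""
    total = sum(len(s) for s in sequences)
    # Background letter frequencies estimated from the input itself.
    freq = {a: sum(s.count(a) for s in sequences) / total for a in alphabet}
    scores = {}
    for letters in product(alphabet, repeat=k):
        kmer = "".join(letters)
        q = math.prod(freq[a] for a in kmer)   # chance of a match at one position
        expected = variance = 0.0
        for s in sequences:
            positions = max(len(s) - k + 1, 0)
            p = 1.0 - (1.0 - q) ** positions   # chance s contains the k-mer
            expected += p
            variance += p * (1.0 - p)
        observed = sum(kmer in s for s in sequences)
        if variance > 0.0:
            scores[kmer] = (observed - expected) / math.sqrt(variance)
    return scores

# Toy input: three of four sequences carry the planted motif GGAG.
toy = ["ACGGAGTC", "TTGGAGCA", "CAGGAGTT", "ATCGTACG"]
scores = kmer_zscores(toy, k=4)
best = max(scores, key=scores.get)
```

Because every k-mer is enumerated, the top-scoring motif under this simplified model is found exactly rather than by local search, which is the property the abstract emphasizes.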

Citations: 0

Position-specific annotation of protein function based on multiple homologs.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1999-01-01

M A Andrade

In this work I present an algorithm for deriving position-specific protein functional annotations. The input is the result of a sequence similarity search of the query sequence against a sequence database. Strings of words are extracted from the descriptions of the proteins, and the correlation between proteins sharing a descriptor and amino acid conservation is used to compute a score that indicates which descriptor is likely to best describe the function of each residue. Analyzing the score curves and comparing different functions allows easy detection of the parts of the sequence associated with different functions. Different levels of functional specificity can be compared, making it possible to choose the one that best suits the function of the protein. Immediate applications of this algorithm are support for (automated) methods of protein functional annotation and database coherence checks.

Citations: 0

Fidelity probes for DNA arrays.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1999-01-01

E Hubbell, P A Pevzner

One current approach to quality control in DNA array manufacturing is to synthesize a small set of test probes that detect variation in the manufacturing process. These fidelity probes consist of identical copies of the same probe, but they are deliberately manufactured using different steps of the manufacturing process. A known target is hybridized to these probes, and the hybridization results indicate the quality of the manufacturing process. It is desirable not only to detect variations, but also to analyze the variations that occur, indicating in which process step the manufacture changed. We describe a combinatorial approach that constructs a small set of fidelity probes which not only detect variations, but also point out the manufacturing step in which a variation has occurred. This algorithm is currently being used in mass production of DNA arrays at Affymetrix.

Citations: 0

Database screening for HIV protease ligands: the influence of binding-site conformation and representation on ligand selectivity.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1999-01-01

V Schnecke, L A Kuhn

Screening for potential ligands and docking them into the binding sites of proteins is one of the main tasks in computer-aided drug design. Despite the progress in computational power, it remains infeasible to model all the factors involved in molecular recognition, especially when screening databases of more than 100,000 compounds. While ligand flexibility is considered in most approaches, the model of the binding site is rather simplistic, with neither solvation nor induced complementarity usually taken into consideration. We present results for screening different databases for HIV-1 protease ligands with our tool Slide, and investigate the extent to which binding-site conformation, solvation, and template representation generate bias. The results suggest a strategy for selecting the optimal binding-site conformation in cases where more than one independent structure is available, and for selecting a representation of that binding site that yields reproducible results and identifies known ligands.

Citations: 0

An algorithm combining discrete and continuous methods for optical mapping.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1999-01-01

R M Karp, I Pe'er, R Shamir

Optical mapping is a novel technique for generating the restriction map of a DNA molecule by observing many single, partially digested, copies of it, using fluorescence microscopy. The real-life problem is complicated by numerous factors: false positive and false negative cut observations, inaccurate location measurements, unknown orientations and faulty molecules. We present an algorithm for solving the real-life problem. The algorithm combines continuous optimization and combinatorial algorithms, applied to a non-uniform discretization of the data. We present encouraging results on real experimental data.

Citations: 0

Spatio-temporal registration of the expression patterns of Drosophila segmentation genes.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1999-01-01

E M Myasnikova, D Kosman, J Reinitz, M G Samsonova

The application of image registration techniques resulted in the construction of an integrated atlas of Drosophila segmentation gene expression in both space and time. The registration method was based on a quadratic spline approximation with flexible knots. A classifier for automatic attribution of an embryo to one of the temporal classes according to its gene expression pattern was developed.

Citations: 0

Building dictionaries of 1D and 3D motifs by mining the unaligned 1D sequences of 17 archaeal and bacterial genomes.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1999-01-01

I Rigoutsos, Y Gao, A Floratos, L Parida

We have used the Teiresias algorithm to carry out unsupervised pattern discovery in a database containing the unaligned ORFs from the 17 publicly available complete archaeal and bacterial genomes and to build a 1D dictionary of motifs. These motifs, which we refer to as seqlets, account for and cover 97.88% of this genomic input at the level of amino acid positions. Each of the seqlets in this 1D dictionary was located among the sequences in Release 38.0 of the Protein Data Bank, and the structural fragments corresponding to each seqlet's instances were identified and aligned in three dimensions; those seqlets that resulted in RMSD errors below a pre-selected threshold of 2.5 Angstroms were entered in a 3D dictionary of structurally conserved seqlets. These two dictionaries can be thought of as cross-indices that facilitate the tackling of tasks such as automated functional annotation of genomic sequences, local homology identification, local structure characterization, comparative genomics, etc.
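
The RMSD filter can be illustrated with a small sketch. It assumes the fragments for one seqlet have already been superimposed (the superposition step itself is omitted here), and it assumes that a seqlet is accepted when every pair of its fragments falls below the threshold; the abstract does not say how RMSD is aggregated across instances, so that rule and the function names are hypothetical.

```python
import math
from itertools import combinations

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) points that are assumed to be already superimposed."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("fragments must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def is_structurally_conserved(instances, threshold=2.5):
    """True if every pair of fragments has RMSD below threshold (Angstroms)."""
    return all(rmsd(a, b) < threshold for a, b in combinations(instances, 2))

# Toy fragments: b is a shifted by 1 Angstrom, c by 5 Angstroms.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
c = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0)]
```

With these toy coordinates, the pair (a, b) passes the 2.5 Angstrom cutoff while any set that includes c does not.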

Citations: 0

Using sequence motifs for enhanced neural network prediction of protein distance constraints.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1999-01-01

J Gorodkin, O Lund, C A Andersen, S Brunak

Correlations between sequence separation (in residues) and distance (in Angstroms) of any pair of amino acids in polypeptide chains are investigated. For each sequence separation we define a distance threshold. For pairs of amino acids where the distance between C alpha atoms is smaller than the threshold, a characteristic sequence (logo) motif is found. The motifs change as the sequence separation increases: for small separations they consist of one peak located in between the two residues, then additional peaks at these residues appear, and finally the center peak smears out for very large separations. We also find correlations between the residues in the center of the motif. These and other statistical analyses are used to design neural networks with enhanced performance compared to earlier work. Importantly, the statistical analysis explains why neural networks perform better than simple statistical data-driven approaches such as pair probability density functions. The statistical results also explain characteristics of the network performance for increasing sequence separation. The improvement of the new network design is significant in the sequence separation range 10-30 residues. Finally, we find that the performance curve for increasing sequence separation is directly correlated to the corresponding information content. A WWW server, distanceP, is available at http://www.cbs.dtu.dk/services/distanceP/.
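
As an illustration of the labeling step described above, the sketch below marks each residue pair at a fixed sequence separation as "close" or "far" by comparing the C alpha distance with a threshold. The function name and the threshold used in the example are hypothetical; the abstract does not state the per-separation thresholds.

```python
import math

def ca_distance_labels(ca_coords, separation, threshold):
    """For every residue pair (i, i + separation), return (i, j, is_close),
    where is_close is True when the C alpha distance is below threshold.
    ca_coords is a list of (x, y, z) C alpha positions in Angstroms."""
    labels = []
    for i in range(len(ca_coords) - separation):
        j = i + separation
        distance = math.dist(ca_coords[i], ca_coords[j])
        labels.append((i, j, distance < threshold))
    return labels

# Toy chain of four C alpha atoms; the 6.0 Angstrom cutoff is illustrative only.
toy_chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]
labels = ca_distance_labels(toy_chain, separation=2, threshold=6.0)
```

Labels of this kind, paired with the sequence window around residues i and j, are the sort of binary targets a distance-constraint predictor could be trained on.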

Pages : 95-105

Citations: 0

TEXTAL: a pattern recognition system for interpreting electron density maps.

Proceedings. International Conference on Intelligent Systems for Molecular Biology

Pub Date : 1999-01-01

T R Ioerger, T Holton, J A Christopher, J C Sacchettini

X-ray crystallography is the most widely used method for determining the three-dimensional structures of proteins and other macromolecules. One of the most difficult steps in crystallography is interpreting the electron density map to build the final model. This is often done manually by crystallographers and is very time-consuming and error-prone. In this paper, we introduce a new automated system called TEXTAL for interpreting electron density maps using pattern recognition. Given a map to be modeled, TEXTAL divides the map into small regions and then finds regions with a similar pattern of density in a database of maps for proteins whose structures have already been solved. When a match is found, the coordinates of atoms in the region are inferred by analogy. The key to making the database lookup efficient is to extract numeric features that represent the patterns in each region and to compare feature values using a weighted Euclidean distance metric. It is crucial that the features be rotation-invariant, since regions with similar patterns of density can be oriented in any arbitrary way. This pattern-recognition approach can take advantage of data accumulated in large crystallographic databases to effectively learn the association between electron density and molecular structure by example.
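
The database lookup described above can be sketched as a nearest-neighbor search under a weighted Euclidean distance. The feature vectors, weights, region labels, and function names below are hypothetical placeholders; the abstract does not list the actual rotation-invariant features or how their weights are chosen.

```python
import math

def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

def nearest_region(query, database, weights):
    """Return the (region_id, features) entry closest to the query vector.
    database is a list of (region_id, feature_vector) pairs."""
    return min(database, key=lambda entry: weighted_euclidean(query, entry[1], weights))

# Toy database with two regions described by two made-up features each.
database = [("helix-like", [1.0, 0.0]), ("strand-like", [0.0, 1.0])]
query = [0.9, 0.8]
equal_match = nearest_region(query, database, weights=[1.0, 1.0])
skewed_match = nearest_region(query, database, weights=[0.1, 10.0])
```

Changing the weights changes which stored region counts as the best match, which is why the choice of features and weights matters for the quality of the inferred coordinates.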

Pages : 130-7

Citations: 0
