Latest articles from Computer applications in the biosciences : CABIOS


DISTREE: a tool for estimating genetic distances between aligned DNA sequences.

Computer applications in the biosciences : CABIOS

Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.445

J Schäfer, M Schöniger

Motivation: Substitution rates estimated from aligned DNA data can be used as genetic distances to investigate the phylogenetic relationship of those sequences. For this purpose, a Markov model of nucleotide substitution has to be assumed that describes this process most adequately.

Results: A program is presented that estimates substitution rates and their standard errors for a variety of Markov models. The model introduced by Hasegawa et al. (J. Mol. Evol., 22, 160-174, 1985) is the only one for which distances and standard deviations need to be calculated numerically, since analytical formulae cannot be derived. Each model is implemented in two different variants: (i) assuming rate homogeneity or (ii) starting from Gamma-distributed substitution rates across sequence sites. The estimation of heterogeneous substitution rates is based on a method suggested by Tamura and Nei (Mol. Biol. Evol., 10, 512-526, 1993). All required parameters are estimated from sequence data, hence the user is not asked to supply any additional input. One goal of the program is to support the user when choosing a particular model that describes most adequately the evolution of the given data set. For this purpose, a more detailed analysis of this model fit is provided. Phylogenetic trees reconstructed from the inferred distances using the neighbor-joining algorithm are also available.
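As a rough illustration of the two variants described above (rate homogeneity versus Gamma-distributed rates across sites), the sketch below computes a distance under the Jukes-Cantor model. That model choice is an assumption made here because it has a simple closed form; the abstract does not list which models DISTREE implements, and the function names and the `gamma_shape` parameter are hypothetical, not part of the program.

```python
import math
from typing import Optional


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, skipping gaps and ambiguity codes."""
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p: float, gamma_shape: Optional[float] = None) -> float:
    """Jukes-Cantor distance from the observed proportion of differences p.

    With gamma_shape=None, rates are assumed equal across sites.
    Otherwise rates are assumed Gamma-distributed with that shape parameter.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the distance to be defined")
    base = 1.0 - 4.0 * p / 3.0
    if gamma_shape is None:
        return -0.75 * math.log(base)
    return 0.75 * gamma_shape * (base ** (-1.0 / gamma_shape) - 1.0)


p = p_distance("ACGTACGT", "ACGTACGA")  # one difference in eight sites
d_homogeneous = jc69_distance(p)
d_gamma = jc69_distance(p, gamma_shape=0.5)
```

For the same observed difference, the Gamma-corrected distance is larger than the homogeneous one, because rate variation hides more multiple substitutions at fast sites.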


Volume 13, issue 4, pages 445-51.

Citations: 4

Predicting RNA H-type pseudoknots with the massively parallel genetic algorithm.

Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.459

B A Shapiro, J C Wu

Motivation: Using the genetic algorithm (GA) for RNA folding on a massively parallel supercomputer, MasPar MP-2 with 16,384 processors, we successfully predicted the existence of H-type pseudoknots in several sequences.

Results: The GA is applied to folding the tRNA-like 3' end of turnip yellow mosaic virus (TYMV) RNA sequence with 82 nucleotides, the 3' UTRs of satellite tobacco necrosis virus (STNV)-2 RNA sequence with 619 nucleotides and STNV-1 RNA sequence with 622 nucleotides, and the bacteriophage T2, T4 and T6 gene 32 mRNA sequences with 946, 1340 and 946 nucleotides, respectively. The GA's results match the phylogenetically supported tertiary structures of these sequences.


Citations: 44

Linguistic approaches to biological sequences.

Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.333

D B Searls

Biologists have long made use of linguistic metaphors in describing and naming cellular processes involving nucleic acid and protein sequences. Indeed, it is very natural to view the genetic 'text' and its sequential transliterations in these terms. However, a metaphor is not a tool, and it is necessary to ask whether the techniques used in analyzing other kinds of languages, such as human and computer languages, can in fact be of any use in tackling problems in molecular biology. This paper reviews the work of the author and others in applying the methods of computational linguistics to biological sequences.


Citations: 97

Introduction of a distance cut-off into structural alignment by the double dynamic programming algorithm.

Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.387

H Toh

Two approximations were introduced into the double dynamic programming algorithm in order to reduce the computational time for structural alignment. The first is the so-called distance cut-off, which approximates the structural environment of each residue by its local environment. In this approximation, a sphere with a given radius is placed at the center of the side chain of each residue. The local environment of a residue consists only of the residues whose side chain centers lie within the sphere, and is expressed as the set of center-to-center distances from the side chain of the residue to those of all the other constituent residues. Residues outside the sphere are excluded from the local environment. The second approximation, referred to here as the delta N cut-off, builds on the distance cut-off. If two local environments are similar to each other, the numbers of residues constituting them are expected to be similar. The delta N cut-off is based on this idea: if the difference between the numbers of constituent residues of two local environments is greater than a given threshold value, delta N, the evaluation of the similarity between the local environments is skipped. The two approximations dramatically reduced the computational time for structural alignment by the double dynamic programming algorithm. However, they also decreased the accuracy of the alignment. To improve the accuracy while retaining the approximations, a program with a two-step alignment algorithm was constructed. First, a rough alignment was constructed with the approximations. Then, the epsilon-suboptimal region for the alignment was determined. Finally, the double dynamic programming algorithm with full structural environments was applied to the residue pairs within the epsilon-suboptimal region to produce an improved alignment.
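The two cut-offs can be sketched as a pre-filter that decides which residue pairs are worth scoring at all. This is a minimal illustration under assumptions: the function names, the use of plain 3D coordinates for side chain centers, and the example values of `radius` and `delta_n` are hypothetical, and the similarity scoring of the double dynamic programming algorithm itself is not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def local_environment(centers: Sequence[Point], i: int, radius: float) -> List[float]:
    """Distance cut-off: distances from residue i to every other residue
    whose side chain center lies inside the sphere of the given radius."""
    distances = []
    for j, center in enumerate(centers):
        if j == i:
            continue
        d = math.dist(centers[i], center)
        if d <= radius:
            distances.append(d)
    return sorted(distances)


def passes_delta_n(env_a: List[float], env_b: List[float], delta_n: int) -> bool:
    """Delta N cut-off: keep a pair only if the environment sizes are close."""
    return abs(len(env_a) - len(env_b)) <= delta_n


def candidate_pairs(
    centers_a: Sequence[Point],
    centers_b: Sequence[Point],
    radius: float,
    delta_n: int,
) -> List[Tuple[int, int]]:
    """Residue pairs (i, j) that survive both cut-offs and would be scored."""
    envs_a = [local_environment(centers_a, i, radius) for i in range(len(centers_a))]
    envs_b = [local_environment(centers_b, j, radius) for j in range(len(centers_b))]
    return [
        (i, j)
        for i, env_a in enumerate(envs_a)
        for j, env_b in enumerate(envs_b)
        if passes_delta_n(env_a, env_b, delta_n)
    ]


# Toy coordinates: structure A has a small cluster and one isolated residue.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 0.0), (50.0, 0.0, 0.0)]
strict = candidate_pairs(structure_a, structure_b, radius=1.5, delta_n=0)
loose = candidate_pairs(structure_a, structure_b, radius=1.5, delta_n=1)
```

A smaller `delta_n` skips more pairs and so saves more time, at the cost of possibly discarding pairs that a full comparison would have aligned, which is the accuracy trade-off the abstract describes.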


Volume 13, issue 4, pages 387-96.

Citations: 14

A computer program for the analysis of protein complex formation.

Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.439

S Lay, D Bray

Motivation: We needed an efficient way to explore the binding reactions leading to protein complexes of known composition and structure.

Results: A new program is described that allows the user to define a set of protein elements and to link these elements into an oligomeric 'ball-and-stick' assembly in a graphical interface. Once the structure of the oligomer has been defined, the program then employs a novel algorithm to deduce the binding reactions and intermediate complexes needed to make the oligomer from its starting protein components. The program also finds the equilibrium state of the system, using either default starting concentrations and Kd values or data supplied by the user.
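To make the equilibrium step concrete, the sketch below solves the simplest possible case, a single binding reaction A + B ⇌ AB with a given Kd and total concentrations. It is only an illustration of what "finding the equilibrium state" means for one reaction; the abstract does not describe the algorithm the program uses for larger oligomers, and the function name and units here are assumptions.

```python
import math
from typing import Tuple


def dimer_equilibrium(total_a: float, total_b: float, kd: float) -> Tuple[float, float, float]:
    """Equilibrium concentrations (free A, free B, complex AB) for A + B <-> AB.

    Kd = [A][B] / [AB]; all quantities must share the same concentration unit.
    The complex concentration is the smaller root of
    x^2 - (total_a + total_b + kd) * x + total_a * total_b = 0.
    """
    if kd <= 0 or total_a < 0 or total_b < 0:
        raise ValueError("kd must be positive and totals non-negative")
    s = total_a + total_b + kd
    complex_ab = (s - math.sqrt(s * s - 4.0 * total_a * total_b)) / 2.0
    return total_a - complex_ab, total_b - complex_ab, complex_ab


free_a, free_b, bound = dimer_equilibrium(total_a=1.0, total_b=1.0, kd=1.0)
```

With several coupled binding reactions there is generally no closed form, so a numerical solver would be needed instead of the quadratic formula used here.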


Citations: 6

The reversible Hill equation: how to incorporate cooperative enzymes into metabolic models.

Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.377

J H Hofmeyr, A Cornish-Bowden

Motivation: Realistic simulation of the kinetic properties of metabolic pathways requires rate equations to be expressed in reversible form, because substrate and product elasticities are drastically different in reversible and irreversible reactions. This presents no special problem for reactions that follow reversible Michaelis-Menten kinetics, but for enzymes showing cooperative kinetics the full reversible rate equations are extremely complicated, and anyway in virtually all cases the full equations are unknown because sufficiently complete kinetic studies have not been carried out. There is a need, therefore, for approximate reversible equations that allow convenient simulation without violating thermodynamic constraints.

Results: We show how the irreversible Hill equation can be generalized to a reversible form, including effects of modifiers. The proposed equation leads to behaviour virtually indistinguishable from that predicted by a kinetic form of the Adair equation, despite the fact that the latter is a far more complicated equation. By contrast, a reversible form of the Monod-Wyman-Changeux equation that has sometimes been used leads to predictions for the effects of modifiers at high substrate concentration that differ qualitatively from those given by the Adair equation.
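The abstract does not reproduce the equation, so the sketch below uses the form in which the reversible Hill equation is commonly quoted for one substrate and one product without modifiers: v = Vf·α·(1 − Γ/Keq)·(α + π)^(h−1) / (1 + (α + π)^h), with α = s/S0.5, π = p/P0.5 and Γ = p/s. Treat this as an assumption to be checked against the paper; the function and parameter names are hypothetical.

```python
def reversible_hill_rate(
    s: float,
    p: float,
    v_f: float,
    s_half: float,
    p_half: float,
    k_eq: float,
    h: float,
) -> float:
    """Net rate of a cooperative, reversible one-substrate, one-product reaction.

    s, p     substrate and product concentrations
    v_f      limiting forward rate
    s_half   half-saturating substrate concentration (S0.5)
    p_half   half-saturating product concentration (P0.5)
    k_eq     equilibrium constant
    h        Hill coefficient
    """
    alpha = s / s_half
    pi = p / p_half
    total = alpha + pi
    if total == 0.0:
        return 0.0
    # alpha * (1 - Gamma / Keq) with Gamma = p / s, written to avoid dividing by s.
    driving = alpha - p / (k_eq * s_half)
    return v_f * driving * total ** (h - 1.0) / (1.0 + total ** h)


forward_only = reversible_hill_rate(s=2.0, p=0.0, v_f=1.0, s_half=1.0, p_half=1.0, k_eq=10.0, h=2.0)
at_equilibrium = reversible_hill_rate(s=1.0, p=10.0, v_f=1.0, s_half=1.0, p_half=1.0, k_eq=10.0, h=2.0)
beyond_equilibrium = reversible_hill_rate(s=1.0, p=20.0, v_f=1.0, s_half=1.0, p_half=1.0, k_eq=10.0, h=2.0)
```

With no product the expression reduces to the ordinary irreversible Hill equation, the rate is zero when p/s equals Keq, and it becomes negative beyond that point, which is the thermodynamic consistency the abstract asks for.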


Volume 13, issue 4, pages 377-85.

Citations: 140

Meta-MEME: motif-based hidden Markov models of protein families.

Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.397

W N Grundy, T L Bailey, C P Elkan, M E Baker

Motivation: Modeling families of related biological sequences using Hidden Markov models (HMMs), although increasingly widespread, faces at least one major problem: because of the complexity of these mathematical models, they require a relatively large training set in order to accurately recognize a given family. For families in which there are few known sequences, a standard linear HMM contains too many parameters to be trained adequately.

Results: This work attempts to solve that problem by generating smaller HMMs which precisely model only the conserved regions of the family. These HMMs are constructed from motif models generated by the EM algorithm using the MEME software. Because motif-based HMMs have relatively few parameters, they can be trained using smaller data sets. Studies of short chain alcohol dehydrogenases and 4Fe-4S ferredoxins support the claim that motif-based HMMs exhibit increased sensitivity and selectivity in database searches, especially when training sets contain few sequences.


Volume 13, issue 4, pages 397-406.

Citations: 219

Prediction of protein secondary structure using the 3D-1D compatibility algorithm.

Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.415

M Ito, Y Matsuo, K Nishikawa

A new method for the prediction of protein secondary structure is proposed, which relies entirely on the global aspect of a protein. The prediction scheme is as follows. A structural library is first scanned with a query sequence by the previously developed 3D-1D compatibility method. All the structures examined are sorted by compatibility score and the top 50 in the list are picked out. Then, all the known secondary structures of the 50 proteins are globally aligned against the query sequence, according to the 3D-1D alignments. Prediction of either alpha helix, beta strand or coil is made by taking the majority among the observations at each residue site. Besides 325 proteins in the structural library, 77 proteins were selected from the latest release of the Brookhaven Protein Data Bank, and they were divided into three data sets. Data set 1 was used as a training set for which several adjustable parameters in the method were optimized. Then, the final form of the method was applied to a testing set (data set 2) which contained proteins of chain length ≤ 400 residues. The average prediction accuracy was as high as 69% in the three-state assessment of alpha, beta and coil. On the other hand, data set 3 contains only those proteins of length > 400 residues, for which the present method would not work properly because of the size effect inherent in the 3D-1D compatibility method. The proteins in data set 3 were, therefore, subdivided into constituent domains (data set 4) before being fed into the prediction program. The prediction accuracy for data set 4 was 66% on average, a few percent lower than that for data set 2. Possible causes for this discrepancy are discussed.
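The final voting step of this scheme can be sketched as follows. The sketch assumes the top-scoring library structures have already been aligned to the query, so that each one contributes a string of states (`H` helix, `E` strand, `C` coil, `-` where it has no aligned residue). The state alphabet, the gap symbol, and the rule of falling back to coil on ties or empty columns are assumptions made for illustration; the abstract does not say how such cases are handled.

```python
from collections import Counter
from typing import Sequence

STATES = "HEC"  # helix, strand, coil


def majority_vote(aligned_states: Sequence[str], fallback: str = "C") -> str:
    """Predict one state per query residue from aligned secondary structures.

    Each string in aligned_states has the query's length; '-' marks positions
    where that library structure has no aligned residue.
    """
    if not aligned_states:
        raise ValueError("need at least one aligned structure")
    length = len(aligned_states[0])
    if any(len(s) != length for s in aligned_states):
        raise ValueError("all aligned strings must have the query length")

    prediction = []
    for pos in range(length):
        counts = Counter(s[pos] for s in aligned_states if s[pos] in STATES)
        if not counts:
            prediction.append(fallback)  # no observation at this site
            continue
        best = max(counts.values())
        winners = [state for state in STATES if counts[state] == best]
        prediction.append(winners[0] if len(winners) == 1 else fallback)
    return "".join(prediction)


observations = ["HHEC-", "HHEC-", "HEEC-", "CEEHC"]
predicted = majority_vote(observations)
```

In the method described above the vote would run over 50 aligned structures rather than the four toy strings used here.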


Volume 13, issue 4, pages 415-24.

Citations: 49

A simple method for sizing large fragments of bacterial DNA separated by PFGE.

Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.485

E Lorenz, S Leeton, R J Owen

Pulsed field gel electrophoresis (PFGE) of genomic DNA is a highly discriminatory molecular profiling technique for the epidemiological investigation of bacterial strains causing infections in human populations. It has been applied to study an increasing number of pathogenic bacterial species (Tenover et al., 1995) and is particularly useful for Campylobacter jejuni (Owen et al., 1995). The interpretation and comparison of PFGE profiles is simplified if the size of each individual DNA fragment is known. Various computer programs have been developed for sizing DNA fragments in conventional electrophoretic agarose gels. These are generally applicable to fragments in the range 100 bp to 30 kbp but do not perform well on PFGE fragments in the range 40 to 900 kbp. This is because DNA migration in PFGE differs from that in conventional gels and a different relationship between mobility and size is required. The aim of the present study was to develop and evaluate a method for sizing DNA fragments in the range 40 to 900 kbp. We describe how to use the software package Microsoft Excel 5.0 to build a set of files designed to size DNA fragments for both conventional gels and PFGE.

Volume 13, issue 4, pages 485-6.

Citations: 9

A method for identifying splice sites and translational start sites in eukaryotic mRNA.

Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.365

S L Salzberg

This paper describes a new method for determining the consensus sequences that signal the start of translation and the boundaries between exons and introns (donor and acceptor sites) in eukaryotic mRNA. The method takes into account the dependencies between adjacent bases, in contrast to the usual technique of considering each position independently. When coupled with a dynamic program to compute the most likely sequence, new consensus sequences emerge. The consensus sequence information is summarized in conditional probability matrices which, when used to locate signals in uncharacterized genomic DNA, have greater sensitivity and specificity than conventional matrices. Species-specific versions of these matrices are especially effective at distinguishing true and false sites.
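The idea of a conditional probability matrix can be sketched with a first-order model in which the probability of each base depends on the base immediately before it, followed by a small dynamic program that recovers the most likely sequence. This is an illustrative reading of the abstract, not the paper's implementation: the pseudocount, the toy training sites, and all function names are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"
Model = Tuple[Dict[str, float], List[Dict[str, Dict[str, float]]]]


def train(sites: Sequence[str], pseudocount: float = 1.0) -> Model:
    """Estimate P(base at 0) and P(base at i | base at i-1) from aligned sites."""
    length = len(sites[0])
    first = {b: pseudocount for b in BASES}
    cond = [
        {prev: {b: pseudocount for b in BASES} for prev in BASES}
        for _ in range(length - 1)
    ]
    for site in sites:
        first[site[0]] += 1
        for i in range(1, length):
            cond[i - 1][site[i - 1]][site[i]] += 1
    total = sum(first.values())
    first = {b: c / total for b, c in first.items()}
    for table in cond:
        for prev in BASES:
            row_total = sum(table[prev].values())
            table[prev] = {b: c / row_total for b, c in table[prev].items()}
    return first, cond


def log_score(model: Model, seq: str) -> float:
    """Log probability of a candidate site under the conditional model."""
    first, cond = model
    score = math.log(first[seq[0]])
    for i in range(1, len(seq)):
        score += math.log(cond[i - 1][seq[i - 1]][seq[i]])
    return score


def most_likely_sequence(model: Model) -> str:
    """Dynamic program over positions: best path ending in each base."""
    first, cond = model
    best = {b: (math.log(first[b]), b) for b in BASES}
    for table in cond:
        best = {
            b: max(
                (best[prev][0] + math.log(table[prev][b]), best[prev][1] + b)
                for prev in BASES
            )
            for b in BASES
        }
    return max(best.values())[1]


toy_sites = ["AAGGT", "CAGGT", "AAGGT", "GAGGT"]  # made-up example data
model = train(toy_sites)
consensus = most_likely_sequence(model)
```

Because each position is conditioned on its neighbour, the most likely sequence found this way can differ from the consensus obtained by taking the most frequent base at each position independently, which is the contrast the abstract draws with conventional matrices.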


Citations: 139
