Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.445
J Schäfer, M Schöniger
Motivation: Substitution rates estimated from aligned DNA data can be used as genetic distances to investigate the phylogenetic relationship of those sequences. For this purpose, a Markov model of nucleotide substitution has to be assumed that describes this process most adequately.
Results: A program is presented that estimates substitution rates and their standard errors for a variety of Markov models. The model introduced by Hasegawa et al. (J. Mol. Evol., 22, 160-174, 1985) is the only one for which distances and standard deviations need to be calculated numerically, since analytical formulae cannot be derived. Each model is implemented in two different variants: (i) assuming rate homogeneity or (ii) starting from Gamma-distributed substitution rates across sequence sites. The estimation of heterogeneous substitution rates is based on a method suggested by Tamura and Nei (Mol. Biol. Evol., 10, 512-526, 1993). All required parameters are estimated from sequence data, hence the user is not asked to supply any additional input. One goal of the program is to support the user when choosing a particular model that describes most adequately the evolution of the given data set. For this purpose, a more detailed analysis of this model fit is provided. Phylogenetic trees reconstructed from the inferred distances using the neighbor-joining algorithm are also available.
Title: DISTREE: a tool for estimating genetic distances between aligned DNA sequences.
Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 4, pp. 445-51 (Journal Article).
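The kind of calculation summarized in this abstract can be illustrated with a minimal sketch. The abstract does not list which Markov models the program implements, so the Jukes-Cantor model used here is an assumption chosen because its distance has a simple closed form. The function name `jc69_distance` and the parameter `gamma_shape` are hypothetical, and the standard errors come from a first-order (delta method) approximation rather than from the program itself.

```python
import math


def jc69_distance(seq_a, seq_b, gamma_shape=None):
    """Return (distance, standard_error) for two aligned DNA sequences.

    Assumes the Jukes-Cantor model (an illustrative choice, not taken from
    the abstract). If gamma_shape is given, rates across sites are assumed
    to follow a Gamma distribution with that shape parameter.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where both sequences carry an unambiguous base.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / n      # observed proportion of differences
    w = 1.0 - 4.0 * p / 3.0
    if w <= 0:
        raise ValueError("sequences too divergent for this model")
    if gamma_shape is None:
        # Rate homogeneity: d = -(3/4) ln(1 - 4p/3)
        d = -0.75 * math.log(w)
        var = p * (1.0 - p) / (n * w * w)
    else:
        # Gamma-distributed rates: d = (3/4) a [(1 - 4p/3)^(-1/a) - 1]
        a = gamma_shape
        d = 0.75 * a * (w ** (-1.0 / a) - 1.0)
        var = p * (1.0 - p) / n * w ** (-2.0 * (1.0 / a + 1.0))
    return d, math.sqrt(var)
```

With one difference in ten sites, the homogeneous estimate is about 0.107 substitutions per site; the Gamma variant gives a larger value because it attributes more hidden substitutions to fast-evolving sites.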
Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.459
B A Shapiro, J C Wu
Motivation: Using the genetic algorithm (GA) for RNA folding on a massively parallel supercomputer, MasPar MP-2 with 16,384 processors, we successfully predicted the existence of H-type pseudoknots in several sequences.
Results: The GA is applied to folding the tRNA-like 3' end of the turnip yellow mosaic virus (TYMV) RNA sequence with 82 nucleotides, the 3' UTRs of the satellite tobacco necrosis virus (STNV)-2 RNA sequence with 619 nucleotides and the STNV-1 RNA sequence with 622 nucleotides, and the bacteriophage T2, T4 and T6 gene 32 mRNA sequences with 946, 1340 and 946 nucleotides, respectively. The GA's results match the phylogenetically supported tertiary structures of these sequences.
Title: Predicting RNA H-type pseudoknots with the massively parallel genetic algorithm.
Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 4, pp. 459-71 (Journal Article).
Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.333
D B Searls
Biologists have long made use of linguistic metaphors in describing and naming cellular processes involving nucleic acid and protein sequences. Indeed, it is very natural to view the genetic 'text' and its sequential transliterations in these terms. However, a metaphor is not a tool, and it is necessary to ask whether the techniques used in analyzing other kinds of languages, such as human and computer languages, can in fact be of any use in tackling problems in molecular biology. This paper reviews the work of the author and others in applying the methods of computational linguistics to biological sequences.
Title: Linguistic approaches to biological sequences.
Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 4, pp. 333-44 (Journal Article).
Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.387
H Toh
Two approximations were introduced into the double dynamic programming algorithm in order to reduce the computational time for structural alignment. The first is the so-called distance cut-off, which approximates the structural environment of each residue by its local environment. In this approximation, a sphere with a given radius is placed at the center of the side chain of each residue. The local environment of a residue consists only of the residues whose side chain centers lie within the sphere, and is expressed as a set of center-to-center distances from the side chain of the residue to those of all the other constituent residues. Residues outside the sphere are excluded from the local environment. The second approximation, referred to here as the delta N cut-off, builds on the distance cut-off. If two local environments are similar to each other, the numbers of residues constituting the environments are expected to be similar. The delta N cut-off is based on this idea: if the difference between the numbers of constituent residues of two local environments is greater than a given threshold value, delta N, the evaluation of the similarity between the local environments is skipped. The introduction of the two approximations dramatically reduced the computational time for structural alignment by the double dynamic programming algorithm. However, the approximations also decreased the accuracy of the alignment. To improve the accuracy while retaining the approximations, a program with a two-step alignment algorithm was constructed. First, a rough alignment was constructed with the approximations. Then, the epsilon-suboptimal region for the alignment was determined. Finally, the double dynamic programming algorithm with full structural environments was applied to the residue pairs within the epsilon-suboptimal region to produce an improved alignment.
Title: Introduction of a distance cut-off into structural alignment by the double dynamic programming algorithm.
Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 4, pp. 387-96 (Journal Article).
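The two cut-offs described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: side chain centers are taken as plain 3D coordinate tuples, and the function names `local_environment` and `should_compare` are hypothetical. The similarity measure between two environments and the dynamic programming itself are not specified in the abstract and are not shown.

```python
import math


def local_environment(centers, i, radius):
    """Distances from residue i to every other side chain center within radius.

    centers: list of (x, y, z) tuples, one per residue (assumed input format).
    Residues outside the sphere are left out, as in the distance cut-off.
    """
    distances = []
    for j, center in enumerate(centers):
        if j == i:
            continue
        d = math.dist(centers[i], center)
        if d <= radius:
            distances.append(d)
    return sorted(distances)


def should_compare(env_a, env_b, delta_n):
    """Delta N cut-off: skip the comparison when the residue counts differ
    by more than the threshold delta_n."""
    return abs(len(env_a) - len(env_b)) <= delta_n
```

In use, `should_compare` would act as a cheap filter in front of the expensive environment comparison, so that only residue pairs with similarly populated spheres are scored.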
Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.439
S Lay, D Bray
Motivation: We needed an efficient way to explore the binding reactions leading to protein complexes of known composition and structure.
Results: A new program is described that allows the user to define a set of protein elements and to link these elements into an oligomeric 'ball-and-stick' assembly in a graphical interface. Once the structure of the oligomer has been defined, the program then employs a novel algorithm to deduce the binding reactions and intermediate complexes needed to make the oligomer from its starting protein components. The program also finds the equilibrium state of the system, using either default starting concentrations and Kd values or data supplied by the user.
Title: A computer program for the analysis of protein complex formation.
Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 4, pp. 439-44 (Journal Article).
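To make the idea of an equilibrium state defined by starting concentrations and Kd values concrete, here is a minimal sketch for the simplest possible case, a single binding reaction A + B ⇌ AB. It is not the algorithm of the program described above, which deduces and solves a whole network of reactions; the function name `dimer_equilibrium` and the example values are hypothetical.

```python
import math


def dimer_equilibrium(a_total, b_total, kd):
    """Equilibrium concentrations for A + B <-> AB with dissociation constant kd.

    Solves Kd = [A][B]/[AB] together with mass conservation, which gives a
    quadratic in [AB]; the smaller root is the physically meaningful one.
    All concentrations and kd must use the same units.
    """
    s = a_total + b_total + kd
    ab = (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0
    return {"A": a_total - ab, "B": b_total - ab, "AB": ab}
```

For example, with both components at 1 µM and Kd = 1 µM, about 38% of each component ends up in the complex.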
Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.377
J H Hofmeyr, A Cornish-Bowden
Motivation: Realistic simulation of the kinetic properties of metabolic pathways requires rate equations to be expressed in reversible form, because substrate and product elasticities are drastically different in reversible and irreversible reactions. This presents no special problem for reactions that follow reversible Michaelis-Menten kinetics, but for enzymes showing cooperative kinetics the full reversible rate equations are extremely complicated, and anyway in virtually all cases the full equations are unknown because sufficiently complete kinetic studies have not been carried out. There is a need, therefore, for approximate reversible equations that allow convenient simulation without violating thermodynamic constraints.
Results: We show how the irreversible Hill equation can be generalized to a reversible form, including effects of modifiers. The proposed equation leads to behaviour virtually indistinguishable from that predicted by a kinetic form of the Adair equation, despite the fact that the latter is a far more complicated equation. By contrast, a reversible form of the Monod-Wyman-Changeux equation that has sometimes been used leads to predictions for the effects of modifiers at high substrate concentration that differ qualitatively from those given by the Adair equation.
Title: The reversible Hill equation: how to incorporate cooperative enzymes into metabolic models.
Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 4, pp. 377-85 (Journal Article).
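The abstract does not print the equation itself. The sketch below uses the form in which the reversible Hill equation is commonly quoted for one substrate and one product, without modifier terms: v = Vf·α·(1 − Γ/Keq)·(α + π)^(h−1) / (1 + (α + π)^h), with α = s/s0.5, π = p/p0.5 and Γ = p/s. Treat this form, the function name `reversible_hill` and the parameter names as assumptions to be checked against the paper.

```python
def reversible_hill(s, p, vf, s_half, p_half, keq, h):
    """Net rate of a cooperative, reversible one-substrate/one-product reaction.

    s, p            substrate and product concentrations
    vf              limiting forward rate
    s_half, p_half  half-saturation concentrations of substrate and product
    keq             equilibrium constant
    h               Hill coefficient
    Modifier (effector) terms are omitted in this sketch.
    """
    alpha = s / s_half
    pi = p / p_half
    total = alpha + pi
    if total == 0.0:
        return 0.0  # no substrate and no product: no net flux
    # alpha * (1 - Gamma/Keq), written without dividing by s
    driving = (s - p / keq) / s_half
    return vf * driving * total ** (h - 1.0) / (1.0 + total ** h)
```

Two limiting cases serve as sanity checks: with p = 0 the expression reduces to the irreversible Hill equation, and when p/s equals Keq the net rate is zero, so the thermodynamic constraint is respected.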
Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.397
W N Grundy, T L Bailey, C P Elkan, M E Baker
Motivation: Modeling families of related biological sequences using Hidden Markov models (HMMs), although increasingly widespread, faces at least one major problem: because of the complexity of these mathematical models, they require a relatively large training set in order to accurately recognize a given family. For families in which there are few known sequences, a standard linear HMM contains too many parameters to be trained adequately.
Results: This work attempts to solve that problem by generating smaller HMMs which precisely model only the conserved regions of the family. These HMMs are constructed from motif models generated by the EM algorithm using the MEME software. Because motif-based HMMs have relatively few parameters, they can be trained using smaller data sets. Studies of short chain alcohol dehydrogenases and 4Fe-4S ferredoxins support the claim that motif-based HMMs exhibit increased sensitivity and selectivity in database searches, especially when training sets contain few sequences.
Title: Meta-MEME: motif-based hidden Markov models of protein families.
Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 4, pp. 397-406 (Journal Article).
Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.415
M Ito, Y Matsuo, K Nishikawa
A new method for the prediction of protein secondary structure is proposed, which relies totally on the global aspect of a protein. The prediction scheme is as follows. A structural library is first scanned with a query sequence by the previously developed 3D-1D compatibility method. All the structures examined are sorted by compatibility score and the top 50 in the list are picked out. Then, all the known secondary structures of the 50 proteins are globally aligned against the query sequence, according to the 3D-1D alignments. Prediction of either alpha helix, beta strand or coil is made by taking the majority among the observations at each residue site. Besides 325 proteins in the structural library, 77 proteins were selected from the latest release of the Brookhaven Protein Data Bank, and they were divided into three data sets. Data set 1 was used as a training set for which several adjustable parameters in the method were optimized. Then, the final form of the method was applied to a testing set (data set 2) which contained proteins of chain length ≤ 400 residues. The average prediction accuracy was as high as 69% in the three-state assessment of alpha, beta and coil. On the other hand, data set 3 contains only those proteins of length > 400 residues, for which the present method would not work properly because of the size effect inherent in the 3D-1D compatibility method. The proteins in data set 3 were, therefore, subdivided into constituent domains (data set 4) before being fed into the prediction program. The prediction accuracy for data set 4 was 66% on average, a few percent lower than that for data set 2. Possible causes for this discrepancy are discussed.
Title: Prediction of protein secondary structure using the 3D-1D compatibility algorithm.
Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 4, pp. 415-24 (Journal Article).
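The final voting step of this scheme (taking the majority among the observations at each residue site) can be sketched as follows. The 3D-1D scan and the alignments are not reproduced; the input is assumed to be the secondary-structure strings of the selected library proteins, already aligned to the query, with `H`, `E` and `C` for helix, strand and coil and `-` for gaps. The function name `majority_vote`, this encoding and the coil default for unobserved sites are all illustrative assumptions.

```python
from collections import Counter


def majority_vote(aligned_states, default="C"):
    """Predict one state per query position by majority among aligned templates.

    aligned_states: equal-length strings over 'H', 'E', 'C' and '-' (gap),
    one per template protein, aligned to the query sequence.
    Positions with no observation fall back to `default`; ties go to the
    state that was encountered first.
    """
    length = len(aligned_states[0])
    if any(len(row) != length for row in aligned_states):
        raise ValueError("all aligned strings must have the query length")
    prediction = []
    for pos in range(length):
        votes = Counter(row[pos] for row in aligned_states if row[pos] in "HEC")
        prediction.append(votes.most_common(1)[0][0] if votes else default)
    return "".join(prediction)
```

With the templates `HHEC-`, `HHEE-` and `HCEC-`, the prediction is `HHECC`: the last position has no observation and takes the default.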
Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.485
E Lorenz, S Leeton, R J Owen
Pulsed field gel electrophoresis (PFGE) of genomic DNA is a highly discriminatory molecular profiling technique for the epidemiological investigation of bacterial strains causing infections in human populations. It has been applied to study an increasing number of pathogenic bacterial species (Tenover et al., 1995) and is particularly useful for Campylobacter jejuni (Owen et al., 1995). The interpretation and comparison of PFGE profiles is simplified if the size of each individual DNA fragment is known. Various computer programs have been developed for sizing DNA fragments in conventional electrophoretic agarose gels. These are generally applicable to fragments in the range 100 bp to 30 kbp but do not perform well on PFGE fragments in the range 40 to 900 kbp. This is because DNA migration in PFGE differs from that in conventional gels and a different relationship between mobility and size is required. The aim of the present study was to develop and evaluate a method for sizing DNA fragments in the range 40 to 900 kbp. We describe how to use the software package Microsoft Excel 5.0 to build a set of files designed to size DNA fragments for both conventional gels and PFGE.
Title: A simple method for sizing large fragments of bacterial DNA separated by PFGE.
Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 4, pp. 485-6 (Journal Article).
Pub Date : 1997-08-01 DOI: 10.1093/bioinformatics/13.4.365
S L Salzberg
This paper describes a new method for determining the consensus sequences that signal the start of translation and the boundaries between exons and introns (donor and acceptor sites) in eukaryotic mRNA. The method takes into account the dependencies between adjacent bases, in contrast to the usual technique of considering each position independently. When coupled with a dynamic program to compute the most likely sequence, new consensus sequences emerge. The consensus sequence information is summarized in conditional probability matrices which, when used to locate signals in uncharacterized genomic DNA, have greater sensitivity and specificity than conventional matrices. Species-specific versions of these matrices are especially effective at distinguishing true and false sites.
Title: A method for identifying splice sites and translational start sites in eukaryotic mRNA.
Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 4, pp. 365-76 (Journal Article).
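The idea of a conditional probability matrix that captures dependencies between adjacent bases, together with a dynamic program for the most likely sequence, can be sketched as below. This is a generic first-order illustration, not the paper's implementation: the training sites are made-up examples, the add-one pseudocount is an assumption, and the names `train_conditional_matrix`, `log_score` and `most_likely_sequence` are hypothetical.

```python
import math

BASES = "ACGT"


def train_conditional_matrix(sites, pseudocount=1.0):
    """Estimate P(first base) and P(base at i | base at i-1) from aligned sites."""
    first = {b: pseudocount for b in BASES}
    for site in sites:
        first[site[0]] += 1
    total = sum(first.values())
    first = {b: c / total for b, c in first.items()}
    cond = []  # cond[i-1][prev][cur] = P(cur at position i | prev at position i-1)
    for i in range(1, len(sites[0])):
        counts = {prev: {b: pseudocount for b in BASES} for prev in BASES}
        for site in sites:
            counts[site[i - 1]][site[i]] += 1
        cond.append({prev: {b: c / sum(row.values()) for b, c in row.items()}
                     for prev, row in counts.items()})
    return first, cond


def log_score(site, model):
    """Log-probability of a candidate site under the conditional model."""
    first, cond = model
    score = math.log(first[site[0]])
    for i in range(1, len(site)):
        score += math.log(cond[i - 1][site[i - 1]][site[i]])
    return score


def most_likely_sequence(model):
    """Dynamic program over positions: best log-probability path ending in each base."""
    first, cond = model
    best = {b: (math.log(first[b]), b) for b in BASES}
    for table in cond:
        best = {b: max((lp + math.log(table[prev][b]), seq + b)
                       for prev, (lp, seq) in best.items())
                for b in BASES}
    return max(best.values())[1]
```

Trained on the toy donor-like sites `GTAAGT`, `GTGAGT`, `GTAAGT` and `GTAAGA`, the model scores `GTAAGT` well above an unrelated hexamer, and the dynamic program recovers `GTAAGT` as the most likely sequence.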