Pub Date: 1997-04-01; DOI: 10.1093/bioinformatics/13.2.205
I Belyi, P A Pevzner
Sequencing by hybridization (SBH) is a promising alternative approach to DNA sequencing and mutation detection. Analysis of the resolving power of SBH involves rather difficult combinatorial and probabilistic problems, and sometimes computer simulation is the only way to estimate the parameters and limitations of SBH experiments. This paper describes a software package, DNA-SPECTRUM, that allows one to analyze the resolving power and parameters of SBH. We also introduce a technique for visualizing multiple SBH reconstructions and describe applications of DNA-SPECTRUM to estimating various SBH parameters. DNA-SPECTRUM is available at http://www-hto.usc.edu/software/sbh/index.html.
Title: Software for DNA sequencing by hybridization. Journal: Computer applications in the biosciences : CABIOS 13(2):205-10.
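The central object in SBH is the spectrum: the set of all length-k probes (k-mers) that hybridize to the target. The minimal Python sketch below is not taken from DNA-SPECTRUM. It computes a k-mer spectrum as a sorted list and illustrates one well-known limit on resolving power: two sequences that differ by swapping the segments between repeats of a (k-1)-mer share the same spectrum. The function name `spectrum` and the example sequences are illustrative assumptions.

```python
def spectrum(seq: str, k: int) -> list[str]:
    """Return the sorted list of all length-k substrings (the k-mer spectrum)."""
    return sorted(seq[i:i + k] for i in range(len(seq) - k + 1))


# Two different sequences obtained by swapping the single-base segments
# between repeats of the 2-mer "TG": A TG C TG G TG A vs A TG G TG C TG A.
s1 = "ATGCTGGTGA"
s2 = "ATGGTGCTGA"

if __name__ == "__main__":
    k = 3
    print(spectrum(s1, k))  # ['ATG', 'CTG', 'GCT', 'GGT', 'GTG', 'TGA', 'TGC', 'TGG']
    print(spectrum(s1, k) == spectrum(s2, k))  # True: the 3-mer spectrum cannot tell them apart
```

Repeated (k-1)-mers are what create such ambiguity, which is why estimating resolving power for realistic sequences tends to call for simulation.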
Pub Date: 1997-04-01; DOI: 10.1093/bioinformatics/13.2.159
R Varon, F Garcia-Sevilla, M Garcia-Moreno, F Garcia-Canovas, R Peyro, R G Duggleby
Motivation: The derivation of steady-state equations is frequently carried out in enzyme kinetic studies. Done manually, it is tedious and error-prone. The currently available computer programs that can accept reaction mechanisms of some complexity handle only the strict steady-state approach.
Results: Here we present REFERASS, a computer program with a short computation time and user-friendly input and output file formats. It can derive the strict steady-state equations and/or the equations corresponding to the usual assumption that one or more of the reversible steps are in rapid equilibrium. The program handles enzyme-catalysed reactions with mechanisms involving up to 255 enzyme species connected by up to 255 reaction steps, subject to the limits imposed by available memory and disk space.
Title: Computer program for the equations describing the steady state of enzyme reactions. Journal: Computer applications in the biosciences : CABIOS 13(2):159-67.
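To make the two assumptions handled by REFERASS concrete, here is the textbook single-substrate case worked by hand. This is not REFERASS output. It uses the mechanism E + S ⇌ ES → E + P with rate constants k_1, k_{-1}, k_2 and applies both the strict steady-state and the rapid-equilibrium assumption to the same mechanism.

```latex
\begin{aligned}
&\text{Mechanism: } E + S \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} ES \xrightarrow{k_2} E + P,
\qquad [E]_0 = [E] + [ES], \qquad v = k_2[ES] \\[4pt]
&\text{Strict steady state: } \frac{d[ES]}{dt} = k_1[E][S] - (k_{-1}+k_2)[ES] = 0
\;\Rightarrow\; v = \frac{k_2[E]_0[S]}{K_m + [S]}, \quad K_m = \frac{k_{-1}+k_2}{k_1} \\[4pt]
&\text{Rapid equilibrium: } k_1[E][S] = k_{-1}[ES]
\;\Rightarrow\; v = \frac{k_2[E]_0[S]}{K_s + [S]}, \quad K_s = \frac{k_{-1}}{k_1}
\end{aligned}
```

The two rate laws coincide when k_2 is much smaller than k_{-1}. For mechanisms with many enzyme species, this bookkeeping grows quickly, and it is the kind of derivation REFERASS is described as automating.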
Pub Date: 1997-04-01; DOI: 10.1093/bioinformatics/13.2.201
L Cocea
Motivation: The Human cDNA Database (HCD) at the Institute for Genomic Research (TIGR) is the most complete, non-redundant and structured collection of human expressed DNA sequences available to date. Sequences and other data can be retrieved by users who have opened an account at HCD/TIGR. An HCD search involves composing and sending queries one by one, which becomes time-consuming when many queries must be sent. Moreover, processing the results afterwards takes a large amount of time.
Results: The HCDSearch system described here automatically composes the queries from information provided in a text file and sends them by e-mail. It also greatly accelerates the processing of results, generating lists of HCD numbers and library identifiers in a format that makes them easy to examine. The programs run on Unix platforms.
Title: Search the Human cDNA Database at TIGR. Journal: Computer applications in the biosciences : CABIOS 13(2):201-4.
Pub Date: 1997-04-01; DOI: 10.1093/bioinformatics/13.2.183
Y Tsukamoto, K Takiguchi, K Satou, E Furuichi, T Takagi, S Kuhara
A deductive database system, PACADE (Protein Atomic Coordinate Analyzer with Deductive Engine), has been developed for protein structure analysis. With this system, super-secondary structures described by logical, declarative rules can be retrieved effectively. Comparing local structures in different proteins is a necessary means of protein structure analysis, so a function to search for similar structures has been added to PACADE. We describe the results of searches for identical topological structures and for three-dimensionally similar ones; a PACADE user can choose between these two levels of similarity by changing parameters. This function enables the inference system to retrieve similar structures according to restraints on variables defined by the user. Similar super-secondary structures among proteins can thus be searched for automatically, which is useful for protein structure analysis. The retrieved similar super-secondary structures can serve as criteria for protein spatial alignment.
Title: Application of a deductive database system to search for topological and similar three-dimensional structures in protein. Journal: Computer applications in the biosciences : CABIOS 13(2):183-90.
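PACADE expresses motifs as logical rules evaluated by a deductive engine. The Python sketch below only mimics the simpler, topological level of such a query as a plain function. It shows how a user-set parameter (here a maximum loop length) changes what counts as a match. The `SSE` encoding, `find_motif`, and the example protein are hypothetical, and PACADE's three-dimensional similarity search is not covered.

```python
from dataclasses import dataclass


@dataclass
class SSE:
    """A secondary-structure element (hypothetical encoding)."""
    kind: str   # 'H' = helix, 'E' = strand
    start: int  # first residue number
    end: int    # last residue number (inclusive)


def find_motif(sses, pattern, max_loop):
    """Return indices i where sses[i:i+len(pattern)] has the element types in
    `pattern` and every connecting loop has at most `max_loop` residues."""
    hits = []
    n = len(pattern)
    for i in range(len(sses) - n + 1):
        window = sses[i:i + n]
        if [s.kind for s in window] != list(pattern):
            continue
        # Loop length = residues strictly between consecutive elements.
        loops = [b.start - a.end - 1 for a, b in zip(window, window[1:])]
        if all(0 <= gap <= max_loop for gap in loops):
            hits.append(i)
    return hits


# Hypothetical protein: strand, helix, strand, helix (residue numbers invented).
example_protein = [SSE("E", 1, 6), SSE("H", 9, 20), SSE("E", 24, 29), SSE("H", 40, 50)]
```

A strand-helix-strand query with `find_motif(example_protein, "EHE", max_loop=5)` returns `[0]`. Tightening `max_loop` to 2 rejects the three-residue loop and returns `[]`.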
Pub Date: 1997-04-01; DOI: 10.1093/bioinformatics/13.2.191
C Barrett, R Hughey, K Karplus
Motivation: Statistical sequence comparison techniques, such as hidden Markov models and generalized profiles, calculate the probability that a sequence was generated by a given model. Log-odds scoring evaluates this probability by comparing it with the probability under a null hypothesis. The null hypothesis is usually a simpler statistical model intended to represent the universe of sequences as a whole rather than the group of interest. Such scoring raises two immediate questions: what should the null model be, and what log-odds score threshold should be deemed a match to the model?
Results: This paper analyses these two issues experimentally. Within the context of the Sequence Alignment and Modeling software suite (SAM), we consider a variety of null models and suitable thresholds. Additionally, we consider HMMer's log-odds scoring and SAM's original Z-scoring method. Among the null model choices, a simple looping null model that emits characters according to the geometric mean of the character probabilities in the columns modeled by the hidden Markov model (HMM) performs well or best across all four discrimination experiments.
Title: Scoring hidden Markov models. Journal: Computer applications in the biosciences : CABIOS 13(2):191-9.
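The geometric-mean null model named above can be illustrated on a toy scale. In the Python sketch below, a simplified ungapped profile stands in for a full HMM, with one residue emitted per column and no insert/delete states. The sketch also ignores the null model's loop and termination probabilities, so it is not SAM's scoring code. The null distribution is q(a) proportional to the geometric mean of the column probabilities p_i(a), renormalized. The log-odds score sums log p_i(x_i) - log q(x_i) over the sequence. All probabilities must be positive for the logarithms, and the example columns are invented.

```python
import math


def geometric_mean_null(columns, alphabet):
    """Null-model residue distribution: q(a) proportional to the geometric mean
    of p_i(a) over all model columns, renormalized to sum to 1."""
    raw = {
        a: math.exp(sum(math.log(col[a]) for col in columns) / len(columns))
        for a in alphabet
    }
    total = sum(raw.values())
    return {a: v / total for a, v in raw.items()}


def log_odds(seq, columns, null):
    """Natural-log odds of an ungapped, one-residue-per-column match of seq
    to the model columns versus the null distribution."""
    if len(seq) != len(columns):
        raise ValueError("this sketch scores only full-length ungapped matches")
    return sum(math.log(col[x]) - math.log(null[x]) for x, col in zip(seq, columns))


# Toy 4-column DNA "model": each column strongly prefers a different base.
ALPHABET = "ACGT"
columns = [{b: (0.7 if b == top else 0.1) for b in ALPHABET} for top in ALPHABET]
null = geometric_mean_null(columns, ALPHABET)
```

In this toy, each base is preferred by exactly one column, so the null model comes out uniform. `log_odds("ACGT", columns, null)` is positive and `log_odds("TTTT", columns, null)` is negative. Choosing the threshold above which a score counts as a match is the experimental question the paper addresses.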
Pub Date: 1997-04-01; DOI: 10.1093/bioinformatics/13.2.123
J H Woods, H M Sauro
Motivation: Metabolic Control Analysis is one of many disciplines that make use of scaled derivatives. In particular, 'elasticities' are used to quantify the effect of an effector or substrate concentration on an enzyme rate under locally specified conditions. Normally, an algebraic expression for the elasticity of an enzyme is obtained by differentiating its rate law, multiplying by the effector concentration and dividing by the rate law itself. This causes considerable expression expansion, and when the results are subsequently simplified, it is often at the expense of biological comprehensibility.
Results: We present a novel algorithm which not only circumvents the expression expansion, but preserves an elegant separation of the components in enzyme behaviour. Easily implemented, and producing gains in both performance and numerical precision, the algorithm is potentially applicable to a number of existing packages. It also greatly assists the manual derivation and evaluation of elasticities, allowing the elasticity of even quite complex enzyme systems to be written by inspection.
Title: Elasticities in Metabolic Control Analysis: algebraic derivation of simplified expressions. Journal: Computer applications in the biosciences : CABIOS 13(2):123-30.
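The abstract does not spell out the algorithm, so the Python sketch below only illustrates a standard identity in the same spirit. An elasticity is a log-derivative, ε = ∂ln v / ∂ln S. A rate law written as a ratio v = N/D therefore has ε_v = ε_N - ε_D, which keeps the numerator and denominator contributions separate. The sketch checks this against a finite-difference estimate for Michaelis-Menten kinetics. The constants `VMAX` and `KM` and the function names are hypothetical.

```python
import math

VMAX, KM = 10.0, 2.0  # hypothetical Michaelis-Menten constants


def mm_rate(s):
    """Michaelis-Menten rate law v = N/D with N = VMAX*s and D = KM + s."""
    return VMAX * s / (KM + s)


def elasticity(rate, s, h=1e-6):
    """Numerical elasticity d ln v / d ln s via a central difference in log space."""
    up = math.log(rate(s * math.exp(h)))
    down = math.log(rate(s * math.exp(-h)))
    return (up - down) / (2 * h)


def mm_elasticity_split(s):
    """Elasticity as (elasticity of numerator) - (elasticity of denominator)."""
    eps_numerator = 1.0              # N = VMAX*s is directly proportional to s
    eps_denominator = s / (KM + s)   # d ln(KM + s) / d ln s
    return eps_numerator - eps_denominator  # equals KM / (KM + s)
```

At s = KM, the split form gives exactly 0.5, and the numerical estimate agrees to within finite-difference error. Neither term requires expanding the full derivative of v.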
Pub Date: 1997-04-01; DOI: 10.1093/bioinformatics/13.2.115
A G Bachinsky, A A Yarigin, E H Guseva, V A Kulichkov, L P Nizolenko
A method and a software tool for developing protein family patterns have been designed. These patterns are intended for identifying local similarities between arbitrary amino acid sequences and proteins of the SWISS-PROT bank. The method is based on the physical, chemical and structural properties of amino acids. It assembles a 'best set' of elements (a pattern) for a given group of aligned related proteins. These elements discriminate between proteins of a family and representatives of other families or random sequences. The method combines the advantages of BLOCKS (automatic generation of multiple elements for protein groups), PROSITE (simplicity of element presentation) and matrices/profiles (different distinctions between amino acids for different positions of aligned sequences). Using our method, a data bank of protein family patterns, PROF_PAT, has been produced. This data bank is based on the 27,752 amino acid sequences of SWISS-PROT bank release 24. The characteristics of the patterns for 743 groups of related proteins are described. The results of comparing PROF_PAT patterns with the proteins of the SWISS-PROT bank are discussed.
Title: A bank of protein family patterns for rapid identification of possible functions of amino acid sequences. Journal: Computer applications in the biosciences : CABIOS 13(2):115-22.
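The abstract does not describe PROF_PAT's own element format. The PROSITE pattern syntax it is compared with, however, is simple enough to sketch. The hypothetical Python helper below is not part of PROF_PAT. It translates the common PROSITE elements into a regular expression: single residues, `x`, `[..]`, `{..}`, repetition counts, and the `<` and `>` terminal anchors. It does not handle rarer cases such as `>` inside brackets.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern (e.g. 'C-x(2,4)-C') into a Python regex."""
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split an optional repetition count such as (3) or (2,4) off the element.
        core, lo, hi = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            core = "."                           # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"       # any residue except those listed
        # Single residues and [..] sets are already valid regex syntax.
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)
```

For example, `prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC].")` yields `C.{2,4}C.{3}[LIVMFYWC]`, which can be passed to `re.search` on a protein sequence string.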
Pub Date: 1997-04-01; DOI: 10.1093/bioinformatics/13.2.175
T M Smith, C Abajian, L Hood
Motivation: Genome-scale DNA sequencing is a multistep process in which large numbers of small template clones are propagated, purified, sequenced and analyzed on acrylamide gels. A significant challenge to these projects is the scale at which data handling must be done. Hence, large-scale sequencing facilities will benefit from tracking template DNA information (purification methods, reaction and electrophoresis conditions) systematically. The lack of software tools that support automated sample entry and automatic data storage, retrieval and analysis is a major hindrance to recording and using laboratory workflow information to monitor the overall quality of data production.
Results: The UNIX file system has been used to prototype automation of the flow of data from the ABI sequencer to a data repository. Data are automatically processed by a central Perl program, Hopper. Hopper runs a series of programs that analyze data quality (read length estimate, fraction of indeterminate bases, and number of contaminating and repetitive sequences), assemble shotgun sequence data, and generate simple reports describing the results.
Title: Hopper: software for automating data tracking and flow in DNA sequencing. Journal: Computer applications in the biosciences : CABIOS 13(2):175-82.
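Hopper itself is a Perl program, and the abstract does not give its quality rules. As a hedged illustration only, the Python sketch below computes two of the listed metrics for individual reads and formats a simple summary report. It uses the raw called length as a stand-in for a read-length estimate and treats any character other than A, C, G or T as indeterminate. The names `ReadQuality`, `assess_read` and `report` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadQuality:
    name: str
    length: int
    indeterminate_fraction: float


def assess_read(name: str, bases: str) -> ReadQuality:
    """Compute read length and the fraction of bases that are not A, C, G or T."""
    bases = bases.upper()
    n_indeterminate = sum(1 for b in bases if b not in "ACGT")
    fraction = n_indeterminate / len(bases) if bases else 0.0
    return ReadQuality(name, len(bases), fraction)


def report(reads) -> str:
    """Return a tab-separated summary with one line per read."""
    lines = ["name\tlength\tindeterminate"]
    for r in reads:
        lines.append(f"{r.name}\t{r.length}\t{r.indeterminate_fraction:.2f}")
    return "\n".join(lines)
```

In a pipeline like the one described, a driver would call such checks on each new file from the sequencer before passing the reads on to assembly.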
Pub Date: 1997-04-01; DOI: 10.1093/bioinformatics/13.2.137
E Glémet, J J Codani
Motivation: This paper presents LASSAP, a new software package for sequence comparison. LASSAP is a programmable, high-performance system designed to overcome the limitations of current sequence comparison programs and meet the needs of large-scale analysis. LASSAP provides an API (Application Programming Interface) that allows any generic pairwise algorithm to be integrated.
Results: Whatever pairwise algorithm is used in LASSAP, it shares numerous enhancements with all the other algorithms, such as: (i) intra- and inter-databank comparisons; (ii) computational requests (selections and computations are performed on the fly); (iii) frame translations of queries and databanks; (iv) structured results allowing easy and powerful post-analysis; (v) performance improvements through parallelization and the driving of specialized hardware. LASSAP currently implements all major sequence comparison algorithms (Fasta, Blast, Smith/Waterman) as well as other string matching and pattern matching algorithms. LASSAP is both an integrated software package for end-users and a framework for integrating and combining new algorithms. LASSAP is used in various projects, such as building PRODOM, the exhaustive comparison of yeast sequences, and the subfragment matching problem of TREMBL.
Title: LASSAP, a LArge Scale Sequence compArison Package. Journal: Computer applications in the biosciences : CABIOS 13(2):137-43.
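A key idea in LASSAP is that any pairwise algorithm plugged into its API gains shared features such as intra-databank comparison. The Python sketch below imitates that separation on a toy scale and is not LASSAP's API. `intra_bank` is a hypothetical driver that accepts any scoring function. `smith_waterman` is a textbook score-only local alignment with an assumed linear gap penalty and illustrative scoring values.

```python
from collections.abc import Callable
from itertools import combinations


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty, score only)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so an alignment can restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def intra_bank(bank: dict[str, str],
               score: Callable[[str, str], int]) -> dict[tuple[str, str], int]:
    """All-against-all comparison within one bank, with any pairwise scorer plugged in."""
    return {(x, y): score(bank[x], bank[y]) for x, y in combinations(sorted(bank), 2)}
```

Because `intra_bank` only depends on the `score(a, b)` signature, a different pairwise algorithm can be swapped in without touching the driver. This is the kind of decoupling the abstract attributes to LASSAP's API.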
Pub Date: 1997-04-01; DOI: 10.1093/bioinformatics/13.2.151
J Blazewicz, J Kaczmarek, M Kasprzak, W T Markiewicz, J Weglarz
Motivation: Reconstructing the original DNA sequence in the sequencing by hybridization (SBH) approach requires computational support because of the large number of possible combinations. There is a lack of algorithms that admit false-negative data and, in addition, return all possible solutions.
Results: In this paper, a new method of sequencing is proposed. An algorithm based on this idea has been implemented and tested for the general case in which some data are missing, as in a real experiment. Authentic DNA sequences were used for testing. A parallel version of the algorithm has also been implemented and tested. The quality of the reconstruction is satisfactory for oligonucleotide libraries with probe lengths between 8 and 12 and for sequences 100, 200 and 300 bp long. A way to decrease the computation time further is also suggested.
Title: Sequential and parallel algorithms for DNA sequencing. Journal: Computer applications in the biosciences : CABIOS 13(2):151-8.
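The abstract does not detail the authors' algorithm, so the Python sketch below shows only the error-free core of the problem. It is a backtracking search that extends a sequence one base at a time, accepting a base only if the resulting k-mer is a probe not yet used. It returns every sequence of the target length that uses all probes. The paper's method also handles false negatives, which would require allowing extensions with shorter overlaps; that is not attempted here. The function name `reconstructions` and the example spectrum are assumptions.

```python
def reconstructions(probes: set[str], n: int) -> set[str]:
    """All sequences of length n whose k-mers are exactly the given probes,
    each probe used once (error-free spectrum, no repeated k-mers)."""
    if not probes:
        return set()
    k = len(next(iter(probes)))
    if k < 2:
        raise ValueError("probes must be at least 2 bases long")
    results = set()

    def extend(seq: str, used: frozenset) -> None:
        if len(seq) == n:
            if len(used) == len(probes):   # every probe must appear in the sequence
                results.add(seq)
            return
        overlap = seq[-(k - 1):]           # last k-1 bases must start the next probe
        for base in "ACGT":
            probe = overlap + base
            if probe in probes and probe not in used:
                extend(seq + base, used | {probe})

    for start in probes:
        extend(start, frozenset({start}))
    return results
```

For the spectrum {GCA, CAT, ATC, TCA, CAA, AAC, ACA, CAG} and n = 10, the search returns both GCATCAACAG and GCAACATCAG. This is a small case where returning all solutions, rather than the first one found, matters.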