
Latest articles: Computer applications in the biosciences : CABIOS

Computer program for the equations describing the steady state of enzyme reactions.
Pub Date : 1997-04-01 DOI: 10.1093/bioinformatics/13.2.159
R Varon, F Garcia-Sevilla, M Garcia-Moreno, F Garcia-Canovas, R Peyro, R G Duggleby

Motivation: Steady-state equations are frequently derived in enzyme kinetic studies. Done manually, the derivation is tedious and prone to human error. The currently available computer programs that can accept reaction mechanisms of some complexity are restricted to the strict steady-state approach.

Results: Here we present REFERASS, a computer program with a short computation time and a user-friendly format for its input and output files. It can derive the strict steady-state equations and/or the equations corresponding to the usual assumption that one or more of the reversible steps are in rapid equilibrium. The program handles enzyme-catalysed reactions whose mechanisms involve up to 255 enzyme species connected by up to 255 reaction steps, subject to the limits imposed by the available memory and disk space.
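
The difference between the strict steady-state and the rapid-equilibrium treatments can be illustrated numerically. The sketch below is not REFERASS, which derives symbolic rate equations; it is a minimal numerical stand-in, assuming a mechanism given as first-order steps between enzyme species (substrate binding folded into a pseudo-first-order constant k1·[S]). It solves the species balance with exact fractions, replacing one redundant balance equation by enzyme conservation, and applies it to the Michaelis-Menten mechanism. The function names and rate constants are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Step = Tuple[str, str, Fraction]  # (from species, to species, pseudo-first-order rate constant)


def solve_linear(a: List[List[Fraction]], b: List[Fraction]) -> List[Fraction]:
    """Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def steady_state(species: List[str], steps: List[Step], e_total: Fraction) -> Dict[str, Fraction]:
    """Steady-state concentration of every enzyme species in a first-order step network."""
    idx = {s: i for i, s in enumerate(species)}
    n = len(species)
    a = [[Fraction(0)] * n for _ in range(n)]
    for src, dst, k in steps:
        a[idx[src]][idx[src]] -= k  # flux leaving src
        a[idx[dst]][idx[src]] += k  # the same flux arriving at dst
    # The balance equations are linearly dependent; swap one for enzyme conservation.
    a[-1] = [Fraction(1)] * n
    b = [Fraction(0)] * (n - 1) + [e_total]
    return dict(zip(species, solve_linear(a, b)))


# Michaelis-Menten mechanism E + S <=> ES -> E + P with illustrative rate constants.
k1, km1, k2, s, e0 = Fraction(10), Fraction(5), Fraction(1), Fraction(2), Fraction(1)
strict = steady_state(["E", "ES"], [("E", "ES", k1 * s), ("ES", "E", km1), ("ES", "E", k2)], e0)
v_strict = k2 * strict["ES"]  # equals k2*e0*s/(Km + s) with Km = (km1 + k2)/k1
# Rapid equilibrium: the binding step alone sets the E/ES ratio (k2 left out of the balance).
rapid = steady_state(["E", "ES"], [("E", "ES", k1 * s), ("ES", "E", km1)], e0)
v_rapid = k2 * rapid["ES"]  # equals k2*e0*s/(Ks + s) with Ks = km1/k1
print(v_strict, v_rapid)
```

Both rates use the same solver; only the steps entering the balance change, which is the essence of the rapid-equilibrium simplification for this one mechanism.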

Citations: 22
Search the Human cDNA Database at TIGR.
Pub Date : 1997-04-01 DOI: 10.1093/bioinformatics/13.2.201
L Cocea

Motivation: The Human cDNA Database (HCD) at the Institute for Genomic Research (TIGR) is the most complete, non-redundant and structured collection of human expressed DNA sequences available to date. Users who have opened an account at HCD/TIGR can retrieve sequences and other data. An HCD search involves composing and sending queries one by one, which becomes time-consuming when many queries must be sent. Moreover, processing the results afterwards takes a large amount of time.

Results: The HCDSearch system described here automatically composes and sends the queries by e-mail using information provided in a text file; it also greatly accelerates the processing of results, generating lists of HCD numbers and library identifiers in a format that renders them very easy to examine. The programs run on Unix platforms.
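
The query-composition step can be sketched with the Python standard library. This is not HCDSearch itself: the server address, sender and one-query-per-line file format below are hypothetical placeholders, since the abstract does not describe the HCD/TIGR query syntax. The sketch builds one message per query line and leaves sending to the caller.

```python
from email.message import EmailMessage
from typing import List

# Hypothetical placeholders: the real HCD/TIGR server address and query syntax are not described here.
HCD_SERVER = "hcd-queries@example.org"
SENDER = "researcher@example.org"


def compose_queries(query_file_text: str, server: str = HCD_SERVER,
                    sender: str = SENDER) -> List[EmailMessage]:
    """Build one e-mail per non-blank, non-comment line of a query file."""
    messages = []
    for line_no, line in enumerate(query_file_text.splitlines(), start=1):
        query = line.strip()
        if not query or query.startswith("#"):
            continue  # skip blank lines and comments
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = server
        msg["Subject"] = f"HCD query (line {line_no})"
        msg.set_content(query)
        messages.append(msg)
    return messages


msgs = compose_queries("# my queries\nquery one\n\nquery two\n")
print(len(msgs), msgs[0]["Subject"])
```

Each message could then be sent with `smtplib.SMTP(host).send_message(msg)`; keeping composition separate from sending makes the batch easy to inspect before anything leaves the machine.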

Citations: 0
Application of a deductive database system to search for topological and similar three-dimensional structures in protein.
Pub Date : 1997-04-01 DOI: 10.1093/bioinformatics/13.2.183
Y Tsukamoto, K Takiguchi, K Satou, E Furuichi, T Takagi, S Kuhara

A deductive database system, PACADE (Protein Atomic Coordinate Analyzer with Deductive Engine), has been developed for protein structure analysis. With this system, super-secondary structures described by logical and declarative rules can be retrieved effectively. For protein structure analysis, comparing local structures in different proteins is a necessary means. A function to search for similar structures has therefore been added to the PACADE system. We describe here the results of searches for identical topological structures and for three-dimensionally similar ones. A user of PACADE can select between these two levels of similarity by changing parameters. This function enables the inference system to retrieve similar structures according to the restraints on variables defined by the user. Similar super-secondary structures among proteins can be searched for automatically, which is useful for protein structure analysis. The retrieved similar super-secondary structures can serve as criteria for protein spatial alignment.
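
To make "super-secondary structures described by logical rules" concrete, here is a minimal Python stand-in for one such rule: a beta-alpha-beta unit defined as strand, helix, strand in sequence with short connecting loops. PACADE uses a deductive engine, not Python; the element representation, the loop-length restraint and its default value are illustrative assumptions. The rule checks sequence order only, roughly the topological level of similarity; a three-dimensional level would add restraints on atomic coordinates.

```python
from typing import List, NamedTuple, Tuple


class SSE(NamedTuple):
    kind: str   # "E" = strand, "H" = helix
    start: int  # first residue number
    end: int    # last residue number


def loop_len(a: SSE, b: SSE) -> int:
    """Number of residues between two consecutive elements."""
    return b.start - a.end - 1


def beta_alpha_beta(sses: List[SSE], max_loop: int = 10) -> List[Tuple[SSE, SSE, SSE]]:
    """Rule: strand, helix, strand, consecutive, each connecting loop <= max_loop residues."""
    hits = []
    for a, b, c in zip(sses, sses[1:], sses[2:]):
        if ((a.kind, b.kind, c.kind) == ("E", "H", "E")
                and loop_len(a, b) <= max_loop
                and loop_len(b, c) <= max_loop):
            hits.append((a, b, c))
    return hits


sses = [SSE("E", 1, 5), SSE("H", 8, 20), SSE("E", 24, 28), SSE("H", 60, 70)]
print(beta_alpha_beta(sses))
```

Changing `max_loop` plays the role of the user-adjustable parameters: tightening it makes the rule stricter and can exclude matches.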

Citations: 9
Scoring hidden Markov models.
Pub Date : 1997-04-01 DOI: 10.1093/bioinformatics/13.2.191
C Barrett, R Hughey, K Karplus

Motivation: Statistical sequence comparison techniques, such as hidden Markov models and generalized profiles, calculate the probability that a sequence was generated by a given model. Log-odds scoring evaluates this probability by comparing it with a null hypothesis, usually a simpler statistical model intended to represent the universe of sequences as a whole rather than the group of interest. Such scoring raises two immediate questions: what the null model should be, and what log-odds score threshold should be deemed a match to the model.

Results: This paper analyses these two issues experimentally. Within the context of the Sequence Alignment and Modeling software suite (SAM), we consider a variety of null models and suitable thresholds. Additionally, we consider HMMer's log-odds scoring and SAM's original Z-scoring method. Among the null model choices, a simple looping null model that emits characters according to the geometric mean of the character probabilities in the columns modeled by the hidden Markov model (HMM) performs well or best across all four discrimination experiments.
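
The geometric-mean null model can be sketched as follows. Assumptions, not taken from the paper: the per-character geometric means are renormalised to sum to 1, scoring is ungapped (one model column per residue, no insert/delete states), and the length and transition terms of the looping null model are omitted. The toy columns and the threshold of 0 nats are illustrative.

```python
import math
from typing import Dict, List

Column = Dict[str, float]  # emission probabilities of one model column


def geometric_mean_null(columns: List[Column], alphabet: str) -> Dict[str, float]:
    """Null emission distribution: per-character geometric mean over columns, renormalised."""
    raw = {c: math.exp(sum(math.log(col[c]) for col in columns) / len(columns)) for c in alphabet}
    total = sum(raw.values())
    return {c: p / total for c, p in raw.items()}


def log_odds(seq: str, columns: List[Column], null: Dict[str, float]) -> float:
    """Ungapped log-odds score in nats: sum_i ln(P_model(x_i) / P_null(x_i))."""
    if len(seq) != len(columns):
        raise ValueError("this ungapped sketch needs len(seq) == number of columns")
    return sum(math.log(col[x] / null[x]) for x, col in zip(seq, columns))


def column(dominant: str) -> Column:
    """Toy column: 0.7 on the dominant base, 0.1 on each other base."""
    return {b: (0.7 if b == dominant else 0.1) for b in "ACGT"}


model = [column("A"), column("C"), column("G")]
null = geometric_mean_null(model, "ACGT")
threshold = 0.0  # illustrative: scores above this are called matches
for seq in ("ACG", "TTT"):
    score = log_odds(seq, model, null)
    print(seq, round(score, 2), "match" if score > threshold else "no match")
```

Because the null model is built from the model's own columns, characters that are common across the whole model (here A, C and G) receive less credit than they would against a uniform background, which is the intended correction for compositional bias.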

Citations: 107
Elasticities in Metabolic Control Analysis: algebraic derivation of simplified expressions.
Pub Date : 1997-04-01 DOI: 10.1093/bioinformatics/13.2.123
J H Woods, H M Sauro

Motivation: Metabolic Control Analysis is one of many disciplines that make use of scaled derivatives. In particular, 'elasticities' are used to quantify the effect of an effector or substrate concentration on an enzyme rate under locally specified conditions. Normally an algebraic expression for the elasticity of an enzyme is obtained by differentiating its rate law, multiplying by the effector concentration and dividing by the rate law itself: this results in considerable expression expansion, and when the results are subsequently simplified it is often at the expense of biological comprehensibility.

Results: We present a novel algorithm which not only circumvents the expression expansion, but preserves an elegant separation of the components in enzyme behaviour. Easily implemented, and producing gains in both performance and numerical precision, the algorithm is potentially applicable to a number of existing packages. It also greatly assists the manual derivation and evaluation of elasticities, allowing the elasticity of even quite complex enzyme systems to be written by inspection.
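
The idea of composing an elasticity without expanding the derivative can be illustrated with the standard log-derivative identities: the elasticity of a product is the sum of the elasticities, that of a quotient is their difference, and that of a sum is the value-weighted mean. This is a sketch of those identities, not necessarily Woods and Sauro's algorithm; the nested-tuple expression format is a hypothetical choice for brevity.

```python
from typing import Dict, Tuple

# Expressions as nested tuples: ("const", v), ("var", name), ("add"|"mul"|"div", left, right).


def value_and_elasticity(expr, env: Dict[str, float], wrt: str) -> Tuple[float, float]:
    """Return (f, d ln f / d ln wrt) by composing scaled derivatives, never expanding df/dx."""
    op = expr[0]
    if op == "const":
        return expr[1], 0.0
    if op == "var":
        return env[expr[1]], (1.0 if expr[1] == wrt else 0.0)
    (fv, fe), (gv, ge) = (value_and_elasticity(e, env, wrt) for e in expr[1:])
    if op == "mul":   # elasticity of a product = sum of elasticities
        return fv * gv, fe + ge
    if op == "div":   # elasticity of a quotient = difference of elasticities
        return fv / gv, fe - ge
    if op == "add":   # elasticity of a sum = value-weighted mean of elasticities
        return fv + gv, (fv * fe + gv * ge) / (fv + gv)
    raise ValueError(f"unknown operator {op!r}")


# Irreversible Michaelis-Menten rate law v = V*S/(Km + S).
mm = ("div", ("mul", ("var", "V"), ("var", "S")), ("add", ("var", "Km"), ("var", "S")))
env = {"V": 10.0, "S": 2.0, "Km": 3.0}
v, eps = value_and_elasticity(mm, env, "S")
print(v, eps)  # eps should agree with the closed form Km/(Km + S)
```

Read by inspection, the same rules give the familiar result directly: the numerator contributes 1, the denominator contributes S/(Km + S), so the elasticity is 1 - S/(Km + S) = Km/(Km + S).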

Citations: 15
A bank of protein family patterns for rapid identification of possible functions of amino acid sequences.
Pub Date : 1997-04-01 DOI: 10.1093/bioinformatics/13.2.115
A G Bachinsky, A A Yarigin, E H Guseva, V A Kulichkov, L P Nizolenko

A method and a software tool for developing patterns of protein families have been designed. These patterns are intended to identify local similarities between arbitrary amino acid sequences and proteins of the SWISS-PROT bank. The method is based on the physical, chemical and structural properties of amino acids. It assembles a 'best set' of elements (a pattern) for a given group of aligned related proteins. These elements discriminate between proteins of a family and representatives of other families or random sequences. The method combines the advantages of BLOCKS (automatic generation of multiple elements for protein groups), PROSITE (simplicity of element presentation) and matrices/profiles (different distinctions between amino acids at different positions of aligned sequences). Using our method, a data bank of protein family patterns, PROF_PAT, has been produced. This data bank is based on the 27,752 amino acid sequences of SWISS-PROT bank release 24. The characteristics of the patterns of 743 related protein groups are described. The results of comparing PROF_PAT patterns with the proteins of the SWISS-PROT bank are discussed.
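
A toy version of property-based pattern elements is sketched below. It is not the PROF_PAT procedure: the residue classes are hypothetical stand-ins for the physical, chemical and structural properties the method uses, and each column simply receives the smallest class covering every residue observed there. A window of a query sequence is reported when enough positions fall in their column's class.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical residue classes standing in for PROF_PAT's physico-chemical and structural properties.
CLASSES: Dict[str, Set[str]] = {
    "hydrophobic": set("AVLIMFWC"),
    "small": set("AGSTC"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "any": set("ACDEFGHIKLMNPQRSTVWY"),
}


def derive_pattern(aligned_block: List[str]) -> List[Set[str]]:
    """For each alignment column, pick the smallest class containing every residue seen there."""
    pattern = []
    for column in zip(*aligned_block):
        observed = set(column)
        candidates = [members for members in CLASSES.values() if observed <= members]
        pattern.append(min(candidates, key=len))  # ValueError if a residue fits no class
    return pattern


def scan(seq: str, pattern: List[Set[str]], min_matches: int) -> List[Tuple[int, int]]:
    """Return (offset, matched positions) for every window meeting min_matches."""
    width = len(pattern)
    hits = []
    for i in range(len(seq) - width + 1):
        matched = sum(res in allowed for res, allowed in zip(seq[i:i + width], pattern))
        if matched >= min_matches:
            hits.append((i, matched))
    return hits


pattern = derive_pattern(["LKD", "IRE", "VKD"])  # toy aligned block of three related proteins
print(scan("GGMKEGG", pattern, min_matches=3))
```

Like a PROSITE pattern, each element is a readable residue set; unlike a fixed consensus, the set differs per position, echoing the position-specific distinctions of profiles.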

Citations: 7
Hopper: software for automating data tracking and flow in DNA sequencing.
Pub Date : 1997-04-01 DOI: 10.1093/bioinformatics/13.2.175
T M Smith, C Abajian, L Hood

Motivation: Genome-scale DNA sequencing is a multistep process in which large numbers of small template clones are propagated, purified, sequenced and analyzed on acrylamide gels. A significant challenge for these projects is the scale at which data handling must be done. Hence, large-scale sequencing facilities will benefit from tracking template DNA information (purification methods, reaction and electrophoresis conditions) systematically. The lack of software tools that support automated sample entry and automatic data storage, retrieval and analysis is a major hindrance to recording laboratory workflow information and using it to monitor the overall quality of data production.

Results: The UNIX file system has been used to prototype automation of the data flow from the ABI sequencer to a data repository. Data are processed automatically by a central Perl program, Hopper, which runs a series of programs that analyze data quality (read length estimate, fraction of indeterminate bases, and number of contaminating and repetitive sequences), assemble shotgun sequence data, and generate simple reports describing the results.
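
Two of the quality measures listed above, a read-length estimate and the fraction of indeterminate bases, can be sketched as follows. This is an illustration, not Hopper's code (which is Perl and whose estimator is not described here): the window size, the threshold of indeterminate calls and the report layout are hypothetical, and the contamination, repeat and assembly steps are omitted.

```python
from dataclasses import dataclass


@dataclass
class ReadQuality:
    name: str
    raw_length: int
    usable_length: int
    indeterminate_fraction: float


def assess_read(name: str, seq: str, window: int = 20, max_n: int = 2) -> ReadQuality:
    """Scan 5'->3' for the first window holding more than max_n indeterminate calls;
    the usable read ends just before the first indeterminate call in that window."""
    seq = seq.upper()
    bad = [base not in "ACGT" for base in seq]
    usable = len(seq)
    for start in range(len(seq) - window + 1):
        chunk = bad[start:start + window]
        if sum(chunk) > max_n:
            usable = start + chunk.index(True)
            break
    fraction = sum(bad) / len(seq) if seq else 0.0
    return ReadQuality(name, len(seq), usable, fraction)


def report_line(q: ReadQuality) -> str:
    """One tab-separated line per read for a simple summary report."""
    return f"{q.name}\t{q.raw_length}\t{q.usable_length}\t{q.indeterminate_fraction:.3f}"


q = assess_read("read_001", "ACGT" * 10 + "NNNN" + "ACGT" * 2)
print(report_line(q))
```

A pipeline in the spirit described would call such a check for every new file the sequencer deposits and append the line to a run report.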

Citations: 18
LASSAP, a LArge Scale Sequence compArison Package.
Pub Date : 1997-04-01 DOI: 10.1093/bioinformatics/13.2.137
E Glémet, J J Codani

Motivation: This paper presents LASSAP, a new software package for sequence comparison. LASSAP is a programmable, high-performance system designed to lift the current limitations of sequence comparison programs so as to meet the needs of large-scale analysis. LASSAP provides an API (Application Programming Interface) that allows any generic pairwise algorithm to be integrated.

Results: Whatever pairwise algorithm is used in LASSAP, it shares numerous enhancements with all the other algorithms, such as: (i) intra- and inter-databank comparisons; (ii) computational requests (selections and computations are performed on the fly); (iii) frame translations of queries and databanks; (iv) structured results allowing easy and powerful post-analysis; (v) performance improvements through parallelization and the driving of specialized hardware. LASSAP currently implements all the major sequence comparison algorithms (Fasta, Blast, Smith/Waterman), as well as other string matching and pattern matching algorithms. LASSAP is both an integrated software package for end-users and a framework that allows new algorithms to be integrated and combined. LASSAP is used in several projects, such as building PRODOM, the exhaustive comparison of yeast sequences, and the subfragment matching problem of TREMBL.
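
The "plug in any pairwise algorithm" idea can be sketched as a driver that accepts any scoring function and provides the shared services (intra- or inter-bank iteration, filtering, structured results) around it. This is a hypothetical Python illustration, not LASSAP's actual API; the `compare` driver, the `Hit` record and the toy identity scorer are all assumptions.

```python
from dataclasses import dataclass
from itertools import combinations, product
from typing import Callable, Dict, List, Optional

# Any function (query, subject) -> score can be plugged in; the interface is hypothetical.
PairwiseScorer = Callable[[str, str], float]


@dataclass(frozen=True)
class Hit:
    query: str
    subject: str
    score: float


def compare(bank_a: Dict[str, str], bank_b: Optional[Dict[str, str]],
            scorer: PairwiseScorer, min_score: float) -> List[Hit]:
    """Intra-bank comparison when bank_b is None, otherwise inter-bank (every a vs every b).
    Hits at or above min_score come back as structured records, best first."""
    pairs = combinations(bank_a.items(), 2) if bank_b is None else product(bank_a.items(), bank_b.items())
    hits = [Hit(qid, sid, scorer(qseq, sseq)) for (qid, qseq), (sid, sseq) in pairs]
    return sorted((h for h in hits if h.score >= min_score), key=lambda h: h.score, reverse=True)


def identity_scorer(a: str, b: str) -> float:
    """Toy plug-in: number of identical positions in an ungapped overlay."""
    return float(sum(x == y for x, y in zip(a, b)))


bank = {"s1": "ACGT", "s2": "ACGA", "s3": "TTTT"}
print(compare(bank, None, identity_scorer, min_score=2.0))
print(compare(bank, {"q": "ACGT"}, identity_scorer, min_score=3.0))
```

Swapping `identity_scorer` for a Smith-Waterman or other scorer changes nothing in the driver, which is the benefit the abstract attributes to a common framework.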

Citations: 80
Sequential and parallel algorithms for DNA sequencing.
Pub Date : 1997-04-01 DOI: 10.1093/bioinformatics/13.2.151
J Blazewicz, J Kaczmarek, M Kasprzak, W T Markiewicz, J Weglarz

Motivation: Reconstructing the original DNA sequence in sequencing by hybridization (SBH) requires computational support because of the large number of possible combinations. There is a lack of algorithms that admit false-negative data and, in addition, return all possible solutions.

Results: In this paper, a new sequencing method is proposed. An algorithm based on this idea, covering the general case in which some data are missing, as in real experiments, has been implemented and tested on authentic DNA sequences. A parallel version of the algorithm has also been implemented and tested. The quality of the reconstruction is satisfactory for libraries of oligonucleotides between 8 and 12 nucleotides long and for sequences 100, 200 and 300 bp long. A way to further decrease the computation time is also suggested.
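
The problem of returning all solutions while tolerating false negatives can be illustrated with a small exhaustive backtracking search. This is not the authors' algorithm. It assumes the first oligonucleotide is known, no l-mer occurs twice in the target, there are no false positives, and at most `max_missing` l-mers of the true sequence are absent from the spectrum. The search is exponential in the worst case; its independent branches are what a parallel version could distribute.

```python
from typing import FrozenSet, List, Set

BASES = "ACGT"


def sbh_reconstruct(spectrum: Set[str], first: str, length: int, max_missing: int = 0) -> List[str]:
    """Enumerate every sequence of the given length that starts with `first`, uses each
    spectrum l-mer at most once, uses all of them, and contains at most `max_missing`
    l-mers absent from the spectrum (false negatives)."""
    l = len(first)
    if l < 2:
        raise ValueError("oligonucleotide length must be at least 2")
    solutions: List[str] = []

    def extend(seq: str, used: FrozenSet[str], missing: int) -> None:
        if len(seq) == length:
            if used == spectrum:
                solutions.append(seq)
            return
        overlap = seq[-(l - 1):]  # last l-1 bases shared with the next l-mer
        for base in BASES:
            lmer = overlap + base
            if lmer in spectrum:
                if lmer not in used:
                    extend(seq + base, used | {lmer}, missing)
            elif missing < max_missing:
                extend(seq + base, used, missing + 1)  # assume a false negative

    if first in spectrum:
        extend(first, frozenset({first}), 0)
    return solutions


full = {"ACG", "CGT", "GTT", "TTG", "TGC", "GCA"}  # 3-mer spectrum of ACGTTGCA
print(sbh_reconstruct(full, "ACG", 8))
print(sbh_reconstruct(full - {"GTT"}, "ACG", 8, max_missing=1))
```

With GTT missing and no error budget the search finds nothing; allowing one false negative recovers the original sequence.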

Citations: 25
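For the sequencing-by-hybridization (SBH) entry above, the reconstruction problem can be pictured as chaining oligonucleotides from the hybridization spectrum by their overlaps. The following is a minimal brute-force sketch under our own assumptions, not the method of Blazewicz et al.: the names `overlap` and `reconstruct` and the `min_overlap` parameter are hypothetical, and only the longest overlap between two oligos is considered. It enumerates every sequence of the target length (mirroring the stated goal of returning all solutions). With oligos of length l, `min_overlap = l - 1` assumes an error-free spectrum; lowering it lets a chain bridge false negatives, i.e. oligos missing from the spectrum.

```python
from typing import FrozenSet, Set


def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def reconstruct(spectrum: Set[str], n: int, min_overlap: int) -> Set[str]:
    """Enumerate every length-n sequence obtainable by chaining distinct oligos
    whose consecutive (longest) overlaps are at least `min_overlap`.

    Hypothetical sketch: exhaustive depth-first search, exponential in general.
    """
    solutions: Set[str] = set()

    def extend(seq: str, last: str, used: FrozenSet[str]) -> None:
        if len(seq) == n:
            solutions.add(seq)
            return
        for oligo in spectrum - used:
            k = overlap(last, oligo)
            if k < min_overlap:
                continue  # too little overlap: not an admissible successor
            candidate = seq + oligo[k:]
            if len(candidate) <= n:  # prune chains that overshoot the target length
                extend(candidate, oligo, used | {oligo})

    # The first oligo of the sequence is unknown, so try every one as a start.
    for start in spectrum:
        extend(start, start, frozenset({start}))
    return solutions
```

Usage, with the spectrum of `ACGTTGCA` for l = 4, before and after dropping one oligo:

```python
full = {"ACGT", "CGTT", "GTTG", "TTGC", "TGCA"}
print(reconstruct(full, 8, min_overlap=3))     # {'ACGTTGCA'}
missing = full - {"TTGC"}                      # one false negative
print(reconstruct(missing, 8, min_overlap=3))  # set()
print(reconstruct(missing, 8, min_overlap=2))  # {'ACGTTGCA'}
```

Requiring full (l − 1) overlaps fails as soon as one oligo is missing, which is why tolerating false-negative data matters in real experiments.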
Using video-oriented instructions to speed up sequence comparison.
Pub Date : 1997-04-01 DOI: 10.1093/bioinformatics/13.2.145
A Wozniak

Motivation: This document presents an implementation of the well-known Smith-Waterman algorithm for comparing protein and nucleic acid sequences using specialized video instructions. These instructions, SIMD-like in design, make it possible to parallelize the algorithm at the instruction level.
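For reference, the scalar Smith-Waterman recurrence that such SIMD-style code speeds up can be sketched as below. This is a minimal score-only sketch, assuming a linear gap penalty and hypothetical scores (match +2, mismatch −1, gap −1); the abstract does not state the scoring scheme or gap model actually used. Each inner-loop update is one "matrix cell", the unit in which throughput is quoted. Vectorized versions evaluate several independent cells per instruction, but how this implementation partitions the matrix is not described in the abstract.

```python
from typing import Tuple


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> Tuple[int, int]:
    """Best local-alignment score of `a` and `b` with a linear gap penalty,
    plus the number of DP matrix cells evaluated (len(a) * len(b))."""
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix; row 0 is all zeros
    best = 0
    cells = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # column 0 stays zero
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 restarts the alignment, which makes it local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
            cells += 1
        prev = curr
    return best, cells
```

For example, `smith_waterman_score("XXACGXX", "ACG")` returns `(6, 21)`: the local alignment picks out `ACG` (three matches at +2) and ignores the flanking `X` characters, after evaluating 7 × 3 cells.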

Results: Benchmarks on an UltraSPARC running at 167 MHz show a speed-up factor of two compared with the same algorithm implemented with integer instructions on the same machine. Performance exceeds 18 million matrix cells per second on a single processor, which, to our knowledge, makes it the fastest implementation of the Smith-Waterman algorithm on a workstation. The accelerated procedure was incorporated into LASSAP (LArge Scale Sequence compArison Package), software developed at INRIA that handles parallelism at a higher level. On a Sun Enterprise 6000 server with 12 processors, a speed of nearly 200 million matrix cells per second was obtained. A 300-amino-acid sequence is scanned against SWISS-PROT R33 (18,531,385 residues) in 29 s. The procedure is not restricted to databank scanning: it applies to all cases handled by LASSAP (intra- and inter-bank comparisons, Z-score computation, etc.).
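As a consistency check on these figures, reading the SWISS-PROT R33 size as 18,531,385 residues (the source prints it as "1,8531,385"), a 29 s scan of a 300-residue query against the whole bank implies a throughput in line with the quoted "nearly 200 million matrix cells per second":

```latex
\frac{300 \times 18\,531\,385\ \text{cells}}{29\ \text{s}}
  = \frac{5\,559\,415\,500\ \text{cells}}{29\ \text{s}}
  \approx 1.92 \times 10^{8}\ \text{cells/s}
```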

Citations: 166
Journal: Computer applications in the biosciences : CABIOS