
Latest articles in Computer applications in the biosciences : CABIOS

ONIX: an interactive PC program for the examination of protein 3D structure from PDB.
Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.111
A S Ivanov, A B Rumjantsev, V S Skvortsov, A I Archakov
The examination of protein three-dimensional (3D) structure and of the ligand binding site are key steps in structure-based computer-aided drug design (Kuntz, 1992). The ligand binding site usually consists of several fragments of the protein sequence, and examining it requires the active participation of both human and computer. This work arose from our need for a Windows-based program for such investigations. ONIX is an interactive program built around the hierarchy of protein structure. Analysis of the molecular surface makes it possible to find all elements of the ligand binding site. ONIX v.1.03 is free software and will be made available by both FTP and e-mail from the EMBL file servers.
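
The abstract does not say how ONIX delimits a binding site beyond "analysis of the molecular surface". As a rough, hypothetical stand-in, the sketch below uses a simpler and commonly used contact criterion: read ATOM/HETATM records from PDB text using the fixed PDB column layout, then report protein residues that have any atom within a distance cutoff of a named ligand (HETATM residue). The function names, the 4.0 Å default cutoff and the ligand name are illustrative assumptions, not ONIX's interface or method.

```python
import math

def parse_atoms(pdb_text):
    """Parse ATOM/HETATM records from PDB text using the fixed-column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atoms.append({
            "record": record,
            "name": line[12:16].strip(),       # atom name, columns 13-16
            "res_name": line[17:20].strip(),   # residue name, columns 18-20
            "chain": line[21],                 # chain identifier, column 22
            "res_seq": int(line[22:26]),       # residue number, columns 23-26
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms

def binding_site_residues(atoms, ligand_name, cutoff=4.0):
    """Protein residues with any atom within `cutoff` angstroms of a ligand atom."""
    ligand_xyz = [a["xyz"] for a in atoms
                  if a["record"] == "HETATM" and a["res_name"] == ligand_name]
    site = set()
    for atom in atoms:
        if atom["record"] != "ATOM":
            continue
        if any(math.dist(atom["xyz"], xyz) <= cutoff for xyz in ligand_xyz):
            site.add((atom["chain"], atom["res_seq"], atom["res_name"]))
    return sorted(site)
```

A pure distance cutoff ignores solvent accessibility, so it is only a crude proxy for the surface-based analysis the abstract describes.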
Citations: 2
Latent sequence periodicity of some oncogenes and DNA-binding protein genes.
Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.37
E V Korotkov, M A Korotkova, J S Tulko

A method for latent periodicity search is developed. We use mutual information to reveal the latent periodicity of mRNA sequences, where latent periodicity means a periodicity with a low level of similarity between any two periods inside the mRNA sequence. The mutual information between an artificial numerical sequence and an mRNA sequence is calculated, with the period length of the artificial sequence varied from 2 to 150. A high level of mutual information between the artificial and mRNA sequences allows us to find any type of latent periodicity in an mRNA sequence. Latent periodicity has been found in many mRNA coding regions. For example, the retinoblastoma gene of the HSRBS clone contains a region with a latent period of 45 bases, and the A-RAF oncogene of the HSARAFIR clone contains a region with a latent period of 84 bases. Integrated sequences for the regions with latent periodicity are determined, and the potential significance of latent periodicity is discussed.
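
The core measurement can be sketched as follows, assuming the artificial numerical sequence is simply the position index modulo the trial period P (1, 2, ..., P, 1, 2, ...). The mutual information then measures how much knowing the phase within a period tells us about the nucleotide at that position. The function names are hypothetical, and the sketch omits whatever significance assessment the authors apply.

```python
import math
from collections import Counter

def periodicity_mutual_information(seq, period):
    """Mutual information (bits) between phase (position mod period) and base."""
    n = len(seq)
    joint = Counter((i % period, base) for i, base in enumerate(seq))
    phase_counts = Counter(i % period for i in range(n))
    base_counts = Counter(seq)
    mi = 0.0
    for (phase, base), count in joint.items():
        p_xy = count / n
        p_x = phase_counts[phase] / n
        p_y = base_counts[base] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi

def scan_periods(seq, min_period=2, max_period=150):
    """Mutual information for every trial period from min_period to max_period."""
    upper = min(max_period, len(seq))
    return {p: periodicity_mutual_information(seq, p)
            for p in range(min_period, upper + 1)}
```

Raw mutual information estimated from finite counts is biased upward as P grows, because the phase-by-base table gets more cells. In practice each value should therefore be compared with values obtained from shuffled copies of the same sequence before a period is called significant.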
Citations: 28
Hexanucleotide frequency database.
Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.107
W Bains
Citations: 2
A tool for aligning very similar DNA sequences.
Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.75
K M Chao, J Zhang, J Ostell, W Miller

Results: We have produced a computer program, named sim3, that solves the following computational problem. Two DNA sequences are given, where the shorter sequence is very similar to some contiguous region of the longer sequence. Sim3 determines such a similar region of the longer sequence, and then computes an optimal set of single-nucleotide changes (i.e. insertions, deletions or substitutions) that will convert the shorter sequence to that region. Thus, the alignment scoring scheme is designed to model sequencing errors, rather than evolutionary processes. The program can align a 100 kb sequence to a 1 megabase sequence in a few seconds on a workstation, provided that there are very few differences between the shorter sequence and some region in the longer sequence. The program has been used to assemble sequence data for the Genomes Division at the National Center for Biotechnology Information.
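
The computational problem stated above (find the region of the longer sequence that the shorter one matches best, counting each single-nucleotide insertion, deletion or substitution as one change) can be written as a plain semi-global edit-distance recurrence. The sketch below is that textbook quadratic-time dynamic program with a free start and end in the longer sequence. It is not sim3's algorithm, which exploits the near-identity of the sequences to run in seconds on 100 kb by 1 Mb inputs. The function name is hypothetical, and only the edit count and end position are returned (a traceback would recover the individual changes).

```python
def best_region_edit_distance(short, long):
    """Return (min_edits, end) for the best match of `short` to a region long[start:end]."""
    m, n = len(short), len(long)
    # Row 0 is all zeros: the match may start anywhere in `long` at no cost.
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        curr = [i] + [0] * n            # aligning i bases of `short` to nothing
        for j in range(1, n + 1):
            curr[j] = min(
                prev[j - 1] + (short[i - 1] != long[j - 1]),  # match or substitution
                prev[j] + 1,                                  # base of `short` deleted
                curr[j - 1] + 1,                              # base of `long` inserted
            )
        prev = curr
    # The match may also end anywhere in `long`: take the best final-row cell.
    end = min(range(n + 1), key=lambda j: prev[j])
    return prev[end], end
```

Because every change costs 1 regardless of type, the scoring models sequencing errors rather than evolutionary substitutions, in line with the design goal stated in the abstract.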
Citations: 29
Post-processing of BLAST results using databases of clustered sequences.
Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.81
G S Miller, R Fuchs

Motivation: When evaluating the results of a sequence similarity search, there are many situations where it can be useful to determine whether sequences appearing in the results share some distinguishing characteristic. Such dependencies between database entries are often not readily identifiable, but can yield important new insights into the biological function of a gene or protein.

Results: We have developed a program called CBLAST that sorts the results of a BLAST sequence similarity search according to sequence membership in user-defined 'clusters' of sequences. To demonstrate the utility of this application, we have constructed two cluster databases. The first describes clusters of nucleotide sequences representing the same gene, as documented in the UNIGENE database, and the second describes clusters of protein sequences which are members of the protein families documented in the PROSITE database. Cluster databases and the CBLAST post-processor provide an efficient mechanism for identifying and exploring relationships and dependencies between new sequences and database entries.
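
The sorting step can be sketched as a grouping of hits by cluster membership, assuming the BLAST output has already been reduced to (subject_id, score) pairs and that the cluster database is a simple mapping from subject identifier to cluster name (both hypothetical representations, not CBLAST's file formats). Hits not found in any cluster are kept as singleton groups named after themselves, and groups are ordered by their best-scoring member.

```python
from collections import defaultdict

def group_hits_by_cluster(hits, cluster_of):
    """Group (subject_id, score) hits by cluster, best-scoring cluster first.

    cluster_of maps subject_id -> cluster name; unclustered hits form
    singleton groups keyed by their own subject_id.
    """
    groups = defaultdict(list)
    for subject_id, score in hits:
        groups[cluster_of.get(subject_id, subject_id)].append((subject_id, score))
    ordered = sorted(groups.items(),
                     key=lambda item: max(score for _, score in item[1]),
                     reverse=True)
    # Within each cluster, list members from highest to lowest score.
    return [(cluster, sorted(members, key=lambda m: m[1], reverse=True))
            for cluster, members in ordered]
```

With a UNIGENE-style mapping, several hits to the same gene then collapse into one group, which makes it easier to see how many distinct genes (or PROSITE families) a query actually hits.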
Citations: 11
Secondary structure computer prediction of the poliovirus 5' non-coding region is improved by a genetic algorithm.
Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.1
K M Currey, B A Shapiro

Comparison of the secondary structure of the 5' non-coding region of poliovirus type 3 RNA derived from the genetic algorithm with the model of Skinner et al. (J. Mol. Biol., 207, 379-392, 1989) shows that the algorithm reproduces many of the confirmed structural elements. The genetic algorithm (Shapiro and Navetta, J. Supercomput., 8, 195-201, 1994) generates a population of all possible stems, then mixes, combines and recombines these stems over multiple iterations on a massively parallel computer, ultimately selecting the fittest structure based on its energy. The genetic algorithm better predicted the secondary structure of the region containing the determinants of neurovirulence, whereas the dynamic programming algorithm (Zuker, Science, 244, 48-52, 1989) required phylogenetic comparative sequence analysis to arrive at the correct conclusion. In addition, artificial mutations were introduced throughout this region of the genome; although structural rearrangements may occur, many structures persisted, suggesting that these structures may have evolved to withstand isolated mutations. The genetic algorithm-derived structure for the 5' non-coding region compares favorably with the previously described biological data and functions and contains all of the 'persistent' structures, suggesting that persistence may also help validate predicted structures.
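
The abstract describes the genetic algorithm only in outline: a population built from all possible stems, repeated recombination, and selection by energy. The sketch below is a toy, serial illustration of that loop under explicit assumptions. Stems are maximal helices of at least three G-C, A-U or G-U pairs. The "energy" is a made-up per-pair score (G-C -3, A-U -2, G-U -1), not nearest-neighbour thermodynamic parameters. Recombination pools the two parents' stems and greedily reassembles a pseudoknot-free structure. It is not the Shapiro and Navetta implementation, and all names and parameters are hypothetical.

```python
import random

# Toy per-pair energies (assumption; not real thermodynamic parameters).
PAIR_ENERGY = {("G", "C"): -3.0, ("C", "G"): -3.0, ("A", "U"): -2.0,
               ("U", "A"): -2.0, ("G", "U"): -1.0, ("U", "G"): -1.0}

def find_stems(seq, min_len=3, min_loop=3):
    """Enumerate maximal helices as (i, j, length): pairs (i+k, j-k) for k < length."""
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if (seq[i], seq[j]) not in PAIR_ENERGY:
                continue
            # Skip pairs that extend an outer helix, so each helix is listed once.
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIR_ENERGY:
                continue
            length = 0
            while ((j - length) - (i + length) > min_loop
                   and (seq[i + length], seq[j - length]) in PAIR_ENERGY):
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems

def stem_energy(seq, stem):
    i, j, length = stem
    return sum(PAIR_ENERGY[(seq[i + k], seq[j - k])] for k in range(length))

def compatible(a, b):
    """True if two stems share no bases and do not form a pseudoknot."""
    ai, aj, al = a
    bi, bj, bl = b
    if aj < bi or bj < ai:                          # side by side
        return True
    if ai + al - 1 < bi and bj < aj - al + 1:       # b nested in a's loop
        return True
    if bi + bl - 1 < ai and aj < bj - bl + 1:       # a nested in b's loop
        return True
    return False

def assemble(candidates, rng):
    """Add stems in random order, keeping each one compatible with those already kept."""
    order = list(candidates)
    rng.shuffle(order)
    chosen = []
    for stem in order:
        if all(compatible(stem, kept) for kept in chosen):
            chosen.append(stem)
    return chosen

def genetic_fold(seq, pop_size=30, generations=40, mutation_rate=0.3, seed=0):
    """Toy GA: keep the lower-energy half, refill by recombining random survivor pairs."""
    rng = random.Random(seed)
    stems = find_stems(seq)
    if not stems:
        return [], 0.0

    def energy(structure):
        return sum(stem_energy(seq, s) for s in structure)

    population = [assemble(stems, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=energy)                 # most negative energy first
        survivors = population[: max(2, pop_size // 2)]
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            pool = set(mother) | set(father)        # recombine parental stems
            if rng.random() < mutation_rate:
                pool.add(rng.choice(stems))         # mutation: inject a random stem
            children.append(assemble(pool, rng))
        population = survivors + children
    best = min(population, key=energy)
    return sorted(best), energy(best)
```

The sketch assumes pop_size is at least 2 so that two parents can always be sampled. Swapping the toy score for a real free-energy model is what would make the selected structure biologically meaningful.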
Citations: 21
ConsInspector 3.0: new library and enhanced functionality.
Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.109
K Frech, P Dietze, T Werner
ConsInspector (Frech et al., 1993) is a program that scans nucleic acid sequences for matches to a pre-compiled library of transcription factor binding sites. The program carries out an extensive examination of binding site candidates: the real sequence is compared with randomly shuffled versions, and the sequence regions surrounding the conserved binding site are included in the analysis (by default, 40 bp upstream and 40 bp downstream of the highly conserved core sequence). This feature distinguishes the program from other methods for identifying transcription factor binding sites, which are restricted to the binding sites themselves: SIGNAL SCAN (Prestridge, 1991, 1996; Prestridge and Stormo, 1993), MATRIX SEARCH (Chen et al., 1995) and MatInspector (Quandt et al., 1995a). Recently, we showed that the quality scores (Q-scores) assigned by ConsInspector correlate to some extent with biological functionality (Quandt et al., 1995b). Release 3.0 of ConsInspector, with enhanced performance and a considerably extended library of consensus profiles, is now available at ftp://ariane.gsf.de/pub/ or http://www.gsf.de/biodv/. The program ConsInd (Frech et al., 1993) was used to compile the library of consensus profiles. The library now encompasses 37 consensus profiles (Release 1.0: 12, Release 2.1: 17) and is separated into four groups (Table I). The extended weight matrices were deduced from experimentally confirmed binding sequences selected from the TRANSFAC database (Wingender et al., 1996) or taken directly from the literature. Most consensus profiles of the original library have been improved by the inclusion of additional sequences, and every consensus profile has been compiled from at least nine sequences (Table I). The analysis of DNA sequences for transcription factor binding sites with ConsInspector has improved since Release 1.0:
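
The two ideas above, scoring candidates against a profile built from known binding sequences and judging the real sequence against shuffled versions of itself, can be illustrated with a standard position weight matrix scan. This is a swapped-in, generic technique: ConsInspector's consensus profiles, flanking-region handling and Q-scores are not reproduced. The function names, the 0.5 pseudocount, the uniform 0.25 background and the trial count are hypothetical choices, and input sequences are assumed to contain only A, C, G and T.

```python
import math
import random

BASES = "ACGT"

def log_odds_matrix(sites, pseudocount=0.5):
    """Log-odds matrix (bits, uniform background) from equal-length aligned sites."""
    matrix = []
    for col in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        matrix.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return matrix

def best_hit(seq, matrix):
    """Return (position, score) of the highest-scoring window in seq."""
    width = len(matrix)
    scores = [sum(matrix[k][seq[p + k]] for k in range(width))
              for p in range(len(seq) - width + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

def shuffle_pvalue(seq, matrix, trials=200, seed=0):
    """Fraction of shuffled copies whose best score reaches the real best score."""
    rng = random.Random(seed)
    _, observed = best_hit(seq, matrix)
    hits = 0
    for _ in range(trials):
        letters = list(seq)
        rng.shuffle(letters)                # preserves base composition
        if best_hit("".join(letters), matrix)[1] >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)        # add-one estimate, never exactly 0
```

Shuffling keeps the base composition of the real sequence, so a low value indicates that the best match is unusually strong for that composition rather than merely a product of, for example, a GC-rich region.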
Citations: 8
Reduced space sequence alignment.
Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.45
J A Grice, R Hughey, D Speck
MOTIVATION Sequence alignment is the problem of finding the optimal character-by-character correspondence between two sequences. It can be readily solved in $O(n^2)$ time and $O(n^2)$ space on a serial machine, or in $O(n)$ time on a parallel machine with $O(n)$ processing elements, each using $O(n)$ space. Hirschberg's divide-and-conquer approach for finding the single best path reduces space use by a factor of $n$ while inducing only a small constant slowdown in the serial version. RESULTS This paper presents a family of methods for computing sequence alignments with reduced memory that are well suited to serial or parallel implementation. Unlike the divide-and-conquer approach, they can be used in the forward-backward (Baum-Welch) training of linear hidden Markov models, and they avoid data-dependent repartitioning, making them easier to parallelize. For an arbitrary integer $L$, the algorithms trade a slowdown by a factor proportional to $L$ for a reduction of the space requirement from $O(n^2)$ to $O(n\sqrt[L]{n})$. The single-best-path member of this algorithm family matches the quadratic time and linear space of the divide-and-conquer algorithm. Experimentally, the $O(n^{1.5})$-space member of the family is 15-40% faster than the $O(n)$-space divide-and-conquer algorithm.
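
The space-for-time trade can be illustrated for the $L = 2$ case with generic checkpointing. During the forward pass, only every $\lfloor\sqrt{m}\rfloor$-th row of the dynamic programming matrix is kept. When the rows are later needed in reverse order, as in a traceback or the backward half of forward-backward training, each block is recomputed from its checkpoint. This holds roughly $\sqrt{m}$ checkpoint rows plus one block at a time and costs about one extra forward pass. The sketch assumes a unit-cost edit-distance recurrence for concreteness and uses hypothetical function names. It shows the general checkpointing idea, not the paper's exact algorithm family.

```python
import math

def next_row(prev, i, a, b):
    """Row i (1-based over a) of the unit-cost global edit-distance matrix."""
    row = [i] + [0] * len(b)
    for j in range(1, len(b) + 1):
        row[j] = min(prev[j - 1] + (a[i - 1] != b[j - 1]),  # match/substitution
                     prev[j] + 1,                           # deletion
                     row[j - 1] + 1)                        # insertion
    return row

def rows_reversed_checkpointed(a, b):
    """Yield DP rows len(a), ..., 0 while storing only ~sqrt(len(a)) checkpoints."""
    m = len(a)
    step = max(1, math.isqrt(m))
    checkpoints = {}
    row = list(range(len(b) + 1))
    for i in range(m + 1):                  # forward pass: keep every step-th row
        if i > 0:
            row = next_row(row, i, a, b)
        if i % step == 0:
            checkpoints[i] = row
    end = m
    for start in sorted(checkpoints, reverse=True):
        block = [checkpoints[start]]        # recompute rows start..end from checkpoint
        for i in range(start + 1, end + 1):
            block.append(next_row(block[-1], i, a, b))
        yield from reversed(block)
        end = start - 1
```

Replacing the single level of checkpoints with $L$ nested levels is what moves along the trade-off towards $O(n\sqrt[L]{n})$ space at a slowdown proportional to $L$.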
Citations: 57
DROSOPOSON: a knowledge base on chromosomal localization of transposable element insertions in Drosophila.
Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.61
C Hoogland, C Biémont

Motivation: Which forces maintain transposable elements (TEs) in genomes and populations is one of the main questions in understanding the dynamics of these elements, but the exact nature of these forces is still a matter of speculation. Testing theoretical models of TE population dynamics requires extensive data on the genomic distributions of various elements. Such data are now accumulating for Drosophila melanogaster, but they are scattered throughout the literature.

Results: The knowledge base DROSOPOSON brings together (1) available data on the chromosomal localization of TE insertions in Drosophila and on features of the polytene chromosomes (DNA content, recombination rate, break-points, etc.) and (2) statistical methods for analysing the distribution of TE insertions along the chromosomes. In this paper, we present the structure of the knowledge base, the data and the statistical methods, which make it possible to test theoretical models of the containment of TE copy number in Drosophila.
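
As one example of the kind of statistical test such a knowledge base supports, the sketch below computes a Pearson chi-square statistic comparing observed TE insertion counts per polytene chromosome division with the counts expected if insertions were distributed in proportion to DNA content. This is an illustrative choice, not necessarily one of DROSOPOSON's methods. The function name and the dictionary inputs are hypothetical. The statistic and degrees of freedom would still have to be referred to a chi-square distribution, for example with a statistics library.

```python
def chi_square_vs_dna_content(insertions, dna_content):
    """Pearson chi-square of insertion counts vs. counts proportional to DNA content.

    insertions: division -> observed number of TE insertions
    dna_content: division -> relative DNA content (divisions absent here are ignored)
    Returns (statistic, degrees_of_freedom).
    """
    total_insertions = sum(insertions.get(d, 0) for d in dna_content)
    total_dna = sum(dna_content.values())
    stat = 0.0
    for division, dna in dna_content.items():
        expected = total_insertions * dna / total_dna
        observed = insertions.get(division, 0)
        stat += (observed - expected) ** 2 / expected
    return stat, len(dna_content) - 1
```

Swapping DNA content for recombination rate as the weighting gives a direct way to ask whether insertions accumulate where recombination is low, one of the hypotheses such data are used to test.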
Citations: 5
Correlating patterns in alignments of polymorphic sequences with experimental assays.
Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.13
G Chelvanayagam, S Easteal

A general algorithm is presented for identifying sets of positions in multiple sequence alignments that best characterize an a priori partitioning, such as one determined by inhibition studies or other experimental techniques. The algorithm explores combinations of polymorphic columns in the alignment and evaluates how well these sites reflect the original input partition. Partitions across the polymorphic columns are derived using a tree-building procedure with conventional amino acid substitution matrices. Identifying the amino acids that govern the biochemical behaviour of a protein with a given substrate or inhibitor can provide insights into the tertiary conformation of the protein. Since such positions are likely to be spatially clustered in the protein fold, they may yield useful distance constraints for substantiating model protein structures. The method is illustrated using data for a set of human mu class glutathione S-transferases. A novel aspect, the prediction of the behaviour of new polymorphic sequences, is also discussed.
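
The search over column combinations can be sketched as follows. For each set of k polymorphic columns, sequences are grouped by the residues they carry at those columns. The induced grouping is then scored against the experimental partition with the Rand index, the fraction of sequence pairs on which the two partitions agree. Two simplifications are assumptions that differ from the method described: grouping uses exact residue identity instead of a tree built with substitution matrices, and the Rand index is an illustrative agreement score. The names are hypothetical, and at least two sequences are assumed.

```python
from itertools import combinations

def polymorphic_columns(alignment):
    """Indices of alignment columns that contain more than one residue."""
    return [c for c in range(len(alignment[0]))
            if len({seq[c] for seq in alignment}) > 1]

def rand_index(labels_a, labels_b):
    """Fraction of item pairs grouped consistently (together/apart) by both labelings."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)

def best_column_set(alignment, partition, k=1):
    """Return (columns, score) for the k-column set that best reproduces partition."""
    best = None
    for cols in combinations(polymorphic_columns(alignment), k):
        induced = ["".join(seq[c] for c in cols) for seq in alignment]
        score = rand_index(induced, partition)
        if best is None or score > best[1]:
            best = (cols, score)
    return best
```

The number of combinations grows quickly with k, so an exhaustive loop like this is only practical for small k or short lists of polymorphic columns.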

提出了一种通用算法,用于识别多序列比对中最能表征先验划分的位置集,如由抑制研究或其他实验技术确定的位置集。该算法探索排列中多态列的组合,并评估这些位置如何很好地反映原始输入分区。跨多态列的分区是使用传统的氨基酸替代矩阵的树构建程序派生的。阐明那些控制蛋白质与给定底物或抑制剂的生化行为的氨基酸可以为理解蛋白质的三级构象提供见解。由于这些位置很可能会在空间上聚集在蛋白质折叠中,这些位置可能会产生有用的距离限制,以证实模型蛋白质结构。该方法以一组人mu类谷胱甘肽s -转移酶的数据为例。还讨论了预测新多态性序列行为的一个新方面。
Citations: 1