Latest articles in Computer applications in the biosciences : CABIOS


Using video-oriented instructions to speed up sequence comparison.

Computer applications in the biosciences : CABIOS

Pub Date : 1997-04-01 DOI: 10.1093/bioinformatics/13.2.145

A Wozniak

Motivation: This document presents an implementation of the well-known Smith-Waterman algorithm for comparing protein and nucleic acid sequences using specialized video instructions. These instructions, SIMD-like in design, make it possible to parallelize the algorithm at the instruction level.
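For reference, the cell recurrence that such an implementation evaluates can be sketched as a plain scalar Smith-Waterman score computation. This is a minimal sketch assuming a linear gap penalty and simple match/mismatch scores (the parameter values are illustrative, not the paper's); real protein comparisons use a substitution matrix and usually affine gaps, and the video-instruction version evaluates several of these cells per instruction instead of one at a time.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of a and b (linear gap penalty, illustrative scores)."""
    # Only the previous row is kept: each cell depends on its upper, left and upper-left neighbours.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap
            left = cur[j - 1] + gap
            # Flooring at 0 lets an alignment restart anywhere, which makes it local.
            cur[j] = max(0, diag, up, left)
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman_score("HEAGAWGHEE", "PAWHEAE"))
```

Each call fills len(a) × len(b) matrix cells, which is the unit in which the benchmark figures below are quoted.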

Results: Benchmarks on an ULTRA SPARC running at 167 MHz show a speed-up factor of two over the same algorithm implemented with integer instructions on the same machine. Performance exceeds 18 million matrix cells per second on a single processor, which is, to our knowledge, the fastest implementation of the Smith-Waterman algorithm on a workstation. The accelerated procedure was introduced in LASSAP (LArge Scale Sequence compArison Package), software developed at INRIA that handles parallelism at a higher level. On a SUN Enterprise 6000 server with 12 processors, a speed of nearly 200 million matrix cells per second has been obtained. A sequence of 300 amino acids is scanned against SWISSPROT R33 (18,531,385 residues) in 29 s. This procedure is not restricted to databank scanning; it applies to all cases handled by LASSAP (intra- and inter-bank comparisons, Z-score computation, etc.).
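A quick arithmetic check shows how the databank-scan timing relates to the quoted throughput. This sketch assumes one matrix cell per (query residue, database residue) pair and reads the database size as 18,531,385 residues; the helper name is hypothetical.

```python
def cells_per_second(query_len: int, db_residues: int, seconds: float) -> float:
    # Comparing a query against a databank fills one matrix cell per residue pair.
    return query_len * db_residues / seconds


rate = cells_per_second(300, 18_531_385, 29)
print(f"{rate / 1e6:.1f} million cells/s overall")        # 191.7 million cells/s overall
print(f"{rate / 12 / 1e6:.1f} million cells/s per processor")  # 16.0 million cells/s per processor
```

The overall figure is consistent with "nearly 200 million"; the per-processor share sits somewhat below the 18 million single-processor rate, plausibly reflecting parallel and I/O overhead.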


Volume/Pages : 13(2), 145-150

Citations: 166

ONIX: an interactive PC program for the examination of protein 3D structure from PDB.

Computer applications in the biosciences : CABIOS

Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.111

A S Ivanov, A B Rumjantsev, V S Skvortsov, A I Archakov

The examination of protein three-dimensional (3D) structure and of the ligand binding site are key steps in structure-based computer-aided drug design (Kuntz, 1992). The ligand binding site usually consists of several fragments of the protein sequence, and examining it requires active participation of both the human and the computer. This work grew out of our need for a Windows-based program for such investigations. ONIX is an interactive program based on the protein structure hierarchy. Analysis of the molecular surface makes it possible to find all elements of the ligand binding site. ONIX v.1.03 is free software and will be made available by both FTP and e-mail from the EMBL file servers.
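The sketch below shows one simple way to pull candidate binding-site residues out of a PDB file. It is not ONIX's method: ONIX analyses the molecular surface, whereas this sketch substitutes a plain distance cutoff between protein atoms and ligand (HETATM) atoms. The function names and the 4.0 Å default are hypothetical; the column slices follow the fixed-width PDB ATOM/HETATM record layout.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    record: str    # "ATOM" (polymer) or "HETATM" (ligand, water, ...)
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> list[Atom]:
    """Read ATOM/HETATM records using the fixed PDB column positions."""
    atoms = []
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atoms.append(Atom(record, line[17:20].strip(), line[21], int(line[22:26]),
                          float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return atoms


def binding_site_residues(atoms: list[Atom], ligand_name: str, cutoff: float = 4.0):
    """Residues with any atom within `cutoff` angstroms of the named ligand (hypothetical criterion)."""
    ligand = [a for a in atoms if a.record == "HETATM" and a.res_name == ligand_name]
    site = set()
    for a in atoms:
        if a.record != "ATOM":
            continue
        if any(math.dist((a.x, a.y, a.z), (l.x, l.y, l.z)) <= cutoff for l in ligand):
            site.add((a.chain, a.res_seq, a.res_name))
    return sorted(site)
```

A surface-based method such as ONIX's would additionally exclude buried residues that happen to lie near the ligand; the distance cutoff is only a first approximation.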

Citations: 2

Latent sequence periodicity of some oncogenes and DNA-binding protein genes.

Computer applications in the biosciences : CABIOS

Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.37

E V Korotkov, M A Korotkova, J S Tulko

We develop a method for latent periodicity search that uses mutual information to reveal the latent periodicity of mRNA sequences. The latent periodicity of an mRNA sequence is a periodicity with a low level of similarity between any two periods within the sequence. We calculate the mutual information between an artificial numerical sequence and an mRNA sequence, varying the period length of the artificial sequence from 2 to 150. A high level of mutual information between the artificial and mRNA sequences allows us to find latent periodicity of any type in the mRNA sequence. Latent periodicity has been found in many mRNA coding regions. For example, the retinoblastoma gene of the HSRBS clone contains a region with a latent period of 45 bases, and the A-RAF oncogene of the HSARAFIR clone contains a region with a latent period of 84 bases. Integrated sequences are determined for the regions with latent periodicity, and the potential significance of latent periodicity is discussed.
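A minimal sketch of the mutual-information idea, under the assumption that the artificial numerical sequence of period p can be represented by the phase i mod p of each position. Because mutual information is biased upward for larger p (and any multiple of a true period scores at least as high), the sketch also compares the observed value with shuffled copies of the sequence; that Z-score step and both function names are illustrative assumptions, not the authors' published procedure.

```python
import math
import random
import statistics
from collections import Counter


def periodicity_mi(seq: str, period: int) -> float:
    """Mutual information (bits) between position phase (i mod period) and the symbol there."""
    n = len(seq)
    joint = Counter((i % period, s) for i, s in enumerate(seq))
    phase = Counter(i % period for i in range(n))
    symbol = Counter(seq)
    return sum((c / n) * math.log2(c * n / (phase[k] * symbol[s]))
               for (k, s), c in joint.items())


def periodicity_zscore(seq: str, period: int, n_shuffles: int = 50, seed: int = 0) -> float:
    """How far the observed MI lies above MI values of shuffled sequences (hypothetical test)."""
    rng = random.Random(seed)
    observed = periodicity_mi(seq, period)
    chars = list(seq)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(chars)  # destroys periodicity, keeps base composition
        null.append(periodicity_mi("".join(chars), period))
    sd = statistics.stdev(null)
    return (observed - statistics.fmean(null)) / sd if sd > 0 else math.inf
```

Scanning `period` from 2 to 150 and keeping periods with a high Z-score mirrors the search range described in the abstract.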


Citations: 28

Hexanucleotide frequency database.

Computer applications in the biosciences : CABIOS

Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.107

W Bains

Citations: 2

A tool for aligning very similar DNA sequences.

Computer applications in the biosciences : CABIOS

Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.75

K M Chao, J Zhang, J Ostell, W Miller

Results: We have produced a computer program, named sim3, that solves the following computational problem. Given two DNA sequences, where the shorter sequence is very similar to some contiguous region of the longer one, sim3 determines that region of the longer sequence and then computes an optimal set of single-nucleotide changes (i.e. insertions, deletions or substitutions) that converts the shorter sequence into that region. Thus, the alignment scoring scheme is designed to model sequencing errors rather than evolutionary processes. The program can align a 100 kb sequence to a 1 megabase sequence in a few seconds on a workstation, provided that there are very few differences between the shorter sequence and the corresponding region of the longer sequence. The program has been used to assemble sequence data for the Genomes Division at the National Center for Biotechnology Information.
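The problem sim3 solves can be stated as a small dynamic program: unit-cost edits, free leading and trailing gaps in the longer sequence, and the fewest edits that turn the shorter sequence into some region of the longer one. The sketch below is the plain O(mn) formulation under those assumptions, with a hypothetical function name; it does not reproduce sim3's fast strategy for nearly identical sequences, which is what makes megabase inputs practical.

```python
def fit_short_into_long(shorter: str, longer: str) -> tuple[int, int, int]:
    """Return (edits, start, end): fewest unit-cost edits turning `shorter` into longer[start:end]."""
    n = len(longer)
    prev_cost = [0] * (n + 1)          # row 0: the region may start anywhere in `longer` for free
    prev_start = list(range(n + 1))    # column where the best path into each cell began
    for i in range(1, len(shorter) + 1):
        cost = [i] + [0] * n           # column 0: all i characters of `shorter` deleted
        start = [0] * (n + 1)
        for j in range(1, n + 1):
            options = [
                (prev_cost[j - 1] + (shorter[i - 1] != longer[j - 1]), prev_start[j - 1]),  # match or substitution
                (prev_cost[j] + 1, prev_start[j]),   # delete shorter[i - 1]
                (cost[j - 1] + 1, start[j - 1]),     # insert longer[j - 1]
            ]
            cost[j], start[j] = min(options)
        prev_cost, prev_start = cost, start
    end = min(range(n + 1), key=lambda j: prev_cost[j])  # region may end anywhere for free
    return prev_cost[end], prev_start[end], end


print(fit_short_into_long("ACGTTCGT", "GGGGACGTACGTTTTT"))
```

Because only single-nucleotide changes are scored, the cost model corresponds to sequencing errors rather than to an evolutionary scoring scheme, matching the stated design goal.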


Citations: 29

Post-processing of BLAST results using databases of clustered sequences.

Computer applications in the biosciences : CABIOS

Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.81

G S Miller, R Fuchs

Motivation: When evaluating the results of a sequence similarity search, there are many situations where it can be useful to determine whether sequences appearing in the results share some distinguishing characteristic. Such dependencies between database entries are often not readily identifiable, but can yield important new insights into the biological function of a gene or protein.

Results: We have developed a program called CBLAST that sorts the results of a BLAST sequence similarity search according to sequence membership in user-defined 'clusters' of sequences. To demonstrate the utility of this application, we have constructed two cluster databases. The first describes clusters of nucleotide sequences representing the same gene, as documented in the UNIGENE database, and the second describes clusters of protein sequences which are members of the protein families documented in the PROSITE database. Cluster databases and the CBLAST post-processor provide an efficient mechanism for identifying and exploring relationships and dependencies between new sequences and database entries.
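The core post-processing step, regrouping similarity-search hits by cluster membership, can be sketched as follows. The input shapes (a list of `(subject_id, score)` pairs and a dictionary from subject ID to cluster name), the `"unclustered"` label and the choice to order clusters by their best hit are all assumptions for illustration; they are not CBLAST's actual file formats or sorting rules.

```python
from collections import defaultdict


def group_hits_by_cluster(hits, cluster_of):
    """hits: (subject_id, score) pairs in search order; cluster_of: subject_id -> cluster name."""
    groups = defaultdict(list)
    for subject, score in hits:
        # Hits absent from the cluster database are kept under a catch-all label.
        groups[cluster_of.get(subject, "unclustered")].append((subject, score))
    # Order clusters by their best-scoring member so the strongest evidence comes first.
    return sorted(groups.items(), key=lambda item: -max(score for _, score in item[1]))


hits = [("P1", 120.0), ("P2", 95.0), ("P3", 90.0), ("P4", 40.0)]
clusters = {"P1": "kinase", "P3": "kinase", "P2": "globin"}
for name, members in group_hits_by_cluster(hits, clusters):
    print(name, members)
```

Grouping this way makes it visible when several hits share one gene (UNIGENE-style clusters) or one protein family (PROSITE-style clusters).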


Volume/Pages : 13(1), 81-87

Citations: 11

Secondary structure computer prediction of the poliovirus 5' non-coding region is improved by a genetic algorithm.

Computer applications in the biosciences : CABIOS

Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.1

K M Currey, B A Shapiro

Comparison of the secondary structure of the 5' non-coding region of poliovirus type 3 RNA derived from the genetic algorithm with the model of Skinner et al. (J. Mol. Biol., 207, 379-392, 1989) shows that it reproduces many of the confirmed structural elements. The genetic algorithm (Shapiro and Navetta, J. Supercomput., 8, 195-201, 1994) generates a population of all possible stems, then mixes, combines and recombines these stems over multiple iterations on a massively parallel computer, ultimately selecting the fittest structure based on its energy. The secondary structure of the region containing the determinants of neurovirulence was better predicted by the genetic algorithm, whereas the dynamic programming algorithm (Zuker, Science, 244, 48-52, 1989) required phylogenetic comparative sequence analysis to reach the correct conclusion. In addition, artificial mutations were introduced throughout this region of the genome; although structural rearrangements may occur, many structures persisted, suggesting that these structures may have evolved to withstand isolated mutations. The genetic-algorithm-derived structure for the 5' non-coding region compares favorably with the previously described biological data and functions and contains all of the 'persistent' structures, suggesting that persistence may also help validate predicted structures.
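A toy version of the stem-recombination idea: individuals are sets of candidate stems, offspring mix the stems of two parents, mutation adds a random stem, and a repair step keeps only mutually compatible (non-overlapping, non-crossing) stems. This is a serial sketch under simplifying assumptions (the stem list and its energies are hypothetical inputs, structure energy is the plain sum of stem energies, selection keeps the best half); the Shapiro and Navetta algorithm runs on massively parallel hardware with a real energy model.

```python
import random

Stem = tuple[int, int, int, float]  # (5' start i, 3' end j, length, free energy); hypothetical encoding


def compatible(a: Stem, b: Stem) -> bool:
    """Stems may coexist if they share no base and do not cross (no pseudoknots)."""
    ai, aj, al, _ = a
    bi, bj, bl, _ = b
    bases_a = set(range(ai, ai + al)) | set(range(aj - al + 1, aj + 1))
    bases_b = set(range(bi, bi + bl)) | set(range(bj - bl + 1, bj + 1))
    crossing = ai < bi < aj < bj or bi < ai < bj < aj
    return not (bases_a & bases_b) and not crossing


def repair(chosen, stems):
    """Greedily keep the most stable stems that are mutually compatible."""
    kept = []
    for k in sorted(chosen, key=lambda k: stems[k][3]):
        if all(compatible(stems[k], stems[m]) for m in kept):
            kept.append(k)
    return frozenset(kept)


def energy(ind, stems):
    return sum(stems[k][3] for k in ind)  # additive stem energies: a simplification


def genetic_fold(stems, pop_size=20, generations=50, seed=0):
    """Return the lowest-energy compatible stem set found (pop_size must be at least 4)."""
    rng = random.Random(seed)
    n = len(stems)
    pop = [repair({k for k in range(n) if rng.random() < 0.5}, stems) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: energy(ind, stems))
        parents = pop[: pop_size // 2]           # selection: keep the most stable half
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            child = {k for k in p | q if rng.random() < 0.5}  # recombine stems of both parents
            if rng.random() < 0.3:
                child.add(rng.randrange(n))                     # mutation: add a random stem
            children.append(repair(child, stems))
        pop = parents + children
    return min(pop, key=lambda ind: energy(ind, stems))
```

Introducing artificial mutations, as in the study, would correspond to regenerating the candidate stem list for the mutated sequence and checking which stems survive in the new optimum.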


Volume/Pages : 13(1), 1-12

Citations: 21

ConsInspector 3.0: new library and enhanced functionality.

Computer applications in the biosciences : CABIOS

Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.109

K Frech, P Dietze, T Werner

ConsInspector (Frech et al., 1993) is a program that scans nucleic acid sequences for matches to a pre-compiled library of transcription factor binding sites. The program carries out an extensive examination of binding site candidates: the real sequence is compared with randomly shuffled versions, and sequence regions surrounding the conserved binding site are included in the analysis (by default, 40 bp upstream and 40 bp downstream of the highly conserved core sequence). This feature distinguishes the program from other methods for identifying transcription factor binding sites, which are restricted to the binding sites themselves: SIGNAL SCAN (Prestridge, 1991, 1996; Prestridge and Stormo, 1993), MATRIX SEARCH (Chen et al., 1995) and MatInspector (Quandt et al., 1995a). Recently, we showed that the quality scores (Q-scores) assigned by ConsInspector correlate to some extent with biological functionality (Quandt et al., 1995b). Release 3.0 of ConsInspector, with enhanced performance and a considerably extended library of consensus profiles, is now available at ftp://ariane.gsf.de/pub/ or http://www.gsf.de/biodv/. The program ConsInd (Frech et al., 1993) was used to compile the library of consensus profiles. The library now encompasses 37 consensus profiles (Release 1.0: 12; Release 2.1: 17) and is divided into four groups (Table I). The extended weight matrices were deduced from experimentally confirmed binding sequences selected from the TRANSFAC database (Wingender et al., 1996) or taken directly from the literature. Most consensus profiles of the original library have been improved by the inclusion of additional sequences. Each consensus profile has been compiled from at least nine sequences (Table I). The analysis of DNA sequences for transcription factor binding sites with ConsInspector has improved since Release 1.0:
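The "compare the real sequence with shuffled versions" step can be illustrated with a generic weight-matrix scan. This is a sketch under stated assumptions, not ConsInspector's consensus-profile or Q-score computation: it builds a log-odds matrix against a uniform background with a pseudocount, scores only the core window (ignoring the 40 bp flanks the program also examines), and reports the fraction of shuffled sequences scoring at least as well. Function names and parameters are hypothetical; sequences must be uppercase ACGT and at least as long as the matrix.

```python
import math
import random

BASES = "ACGT"


def log_odds_matrix(sites, pseudocount=1.0):
    """Position weight matrix (log2 odds against a uniform background) from aligned sites."""
    matrix = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        total = len(column) + 4 * pseudocount
        matrix.append({b: math.log2((column.count(b) + pseudocount) / total / 0.25) for b in BASES})
    return matrix


def best_window_score(seq, matrix):
    width = len(matrix)
    return max(sum(matrix[k][seq[i + k]] for k in range(width))
               for i in range(len(seq) - width + 1))


def shuffle_pvalue(seq, matrix, n_shuffles=200, seed=0):
    """Fraction of composition-preserving shuffles whose best window scores at least as well."""
    rng = random.Random(seed)
    observed = best_window_score(seq, matrix)
    chars = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(chars)
        if best_window_score("".join(chars), matrix) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


# Nine hypothetical aligned sites, echoing the minimum of nine sequences per profile.
sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA", "TGACTCA",
         "TGACTCA", "TGAGTCA", "TGACTCA", "TGACTCA"]
pwm = log_odds_matrix(sites)
print(shuffle_pvalue("GGGCCGTGACTCAGGCCGG", pwm))
```

Shuffling preserves base composition, so a low value indicates that the match depends on the order of bases, not merely on their frequencies.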

Volume/Pages : 13(1), 109-110

Citations: 8

Reduced space sequence alignment.

Computer applications in the biosciences : CABIOS

Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.45

J A Grice, R Hughey, D Speck

MOTIVATION: Sequence alignment is the problem of finding the optimal character-by-character correspondence between two sequences. It can be readily solved in $O(n^2)$ time and $O(n^2)$ space on a serial machine, or in $O(n)$ time with $O(n)$ space per processing element on a parallel machine with $O(n)$ processing elements. Hirschberg's divide-and-conquer approach for finding the single best path reduces space use by a factor of $n$ while inducing only a small constant slowdown relative to the serial version. RESULTS: This paper presents a family of methods for computing sequence alignments with reduced memory that are well suited to serial or parallel implementation. Unlike the divide-and-conquer approach, they can be used in the forward-backward (Baum-Welch) training of linear hidden Markov models, and they avoid data-dependent repartitioning, which makes them easier to parallelize. For an arbitrary integer $L$, the algorithms incur a slowdown proportional to $L$ in exchange for reducing the space requirement from $O(n^2)$ to $O(n\sqrt[L]{n})$. The single-best-path member of this algorithm family matches the quadratic time and linear space of the divide-and-conquer algorithm. Experimentally, the $O(n^{1.5})$-space member of the family is 15-40% faster than the $O(n)$-space divide-and-conquer algorithm.
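The space/time trade-off for the $L = 2$ case can be illustrated with row checkpointing: run the forward pass once, store only every $k$-th row with $k \approx \sqrt{n}$, and rebuild any other row by recomputing forward from the nearest stored one. This sketch assumes unit-cost edit distance and hypothetical function names; it shows the checkpoint-and-recompute idea only, not the paper's full algorithm family or its use in Baum-Welch training.

```python
import math


def next_row(prev, a_char, b):
    """Compute DP row i from row i - 1 (unit-cost edit distance)."""
    row = [prev[0] + 1]
    for j in range(1, len(b) + 1):
        row.append(min(prev[j - 1] + (a_char != b[j - 1]),  # match or substitution
                       prev[j] + 1,                          # gap in b
                       row[j - 1] + 1))                      # gap in a
    return row


def checkpoint_rows(a, b):
    """Forward pass keeping only every k-th row, k ~ sqrt(len(a)): O(n * sqrt(n)) space."""
    k = max(1, math.isqrt(len(a)))
    row = list(range(len(b) + 1))
    checkpoints = {0: row}
    for i in range(1, len(a) + 1):
        row = next_row(row, a[i - 1], b)
        if i % k == 0:
            checkpoints[i] = row
    return k, checkpoints


def recover_row(i, a, b, k, checkpoints):
    """Rebuild row i by recomputing at most k - 1 rows from the nearest checkpoint."""
    base = (i // k) * k
    row = checkpoints[base]
    for r in range(base + 1, i + 1):
        row = next_row(row, a[r - 1], b)
    return row
```

During a traceback, each block of $k$ rows between checkpoints can be recomputed once and held in memory, so the extra work is roughly one additional forward pass, which is the constant-factor slowdown paid for the reduced space.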


Volume/Pages : 13(1), 45-53

Citations: 57

DROSOPOSON: a knowledge base on chromosomal localization of transposable element insertions in Drosophila.

Computer applications in the biosciences : CABIOS

Pub Date : 1997-02-01 DOI: 10.1093/bioinformatics/13.1.61

C Hoogland, C Biémont

Motivation: Identifying the forces that maintain transposable elements (TEs) in genomes and populations is one of the main questions in understanding the dynamics of these elements, but the exact nature of these forces is still a matter of speculation. Testing theoretical models of TE population dynamics requires extensive data on the genomic distributions of various elements. Such data are now accumulating for Drosophila melanogaster, but they are scattered throughout the literature.

Results: The knowledge base DROSOPOSON brings together (1) available data on the chromosomal localization of TE insertions in Drosophila and on features of the polytene chromosomes (DNA content, recombination rate, breakpoints, etc.), and (2) statistical methods for analysing the distribution of TE insertions along the chromosomes. In this paper, we present the structure of the knowledge base, the data and the statistical methods, which make it possible to test theoretical models of the containment of TE copy number in Drosophila.
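As an illustration of the kind of distributional test such a knowledge base supports, the sketch below compares observed TE insertion counts per chromosomal division with the counts expected if insertion probability were proportional to DNA content, using a Pearson chi-square statistic. The choice of this particular test and the function name are assumptions; DROSOPOSON's own statistical methods are not specified here. The statistic would be compared with a chi-square distribution with the returned degrees of freedom.

```python
def chi_square_vs_dna_content(insertions, dna_content):
    """Pearson chi-square of observed insertions per division against DNA-content-proportional expectations."""
    if len(insertions) != len(dna_content):
        raise ValueError("need one DNA-content value per division")
    total_ins = sum(insertions)
    total_dna = sum(dna_content)
    stat = 0.0
    for observed, dna in zip(insertions, dna_content):
        expected = total_ins * dna / total_dna  # null model: insertion rate proportional to DNA content
        stat += (observed - expected) ** 2 / expected
    return stat, len(insertions) - 1  # (statistic, degrees of freedom)


print(chi_square_vs_dna_content([30, 10, 20], [1, 1, 1]))
```

Replacing DNA content with recombination rate in the expectation would test a different hypothesis about which chromosomal features govern TE accumulation.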


Citations: 5
