Pub Date: 1997-04-01 DOI: 10.1093/bioinformatics/13.2.145
A Wozniak
Motivation: This document presents an implementation of the well-known Smith-Waterman algorithm for comparison of protein and nucleic acid sequences, using specialized video instructions. These instructions, SIMD-like in their design, make it possible to parallelize the algorithm at the instruction level.
Results: Benchmarks on an ULTRA SPARC running at 167 MHz show a speed-up factor of two compared to the same algorithm implemented with integer instructions on the same machine. Performance reaches over 18 million matrix cells per second on a single processor, giving, to our knowledge, the fastest implementation of the Smith-Waterman algorithm on a workstation. The accelerated procedure was introduced in LASSAP (LArge Scale Sequence compArison Package), software developed at INRIA that handles parallelism at a higher level. On a SUN Enterprise 6000 server with 12 processors, a speed of nearly 200 million matrix cells per second has been obtained. A sequence of length 300 amino acids is scanned against SWISSPROT R33 (18,531,385 residues) in 29 s. This procedure is not restricted to databank scanning. It applies to all cases handled by LASSAP (intra- and inter-bank comparisons, Z-score computation, etc.).
Title: Using video-oriented instructions to speed up sequence comparison. Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 2, pp. 145-50.
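For readers who want to see what a "matrix cell" is, the following is a minimal scalar sketch of the Smith-Waterman recurrence. It is not the video-instruction implementation described in the abstract: the function name, the linear gap penalty and the scoring values are assumptions chosen for illustration. Each pass through the inner loop fills one cell, which is the unit behind the throughput figures quoted above.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b.

    Scoring values and the linear gap penalty are illustrative
    assumptions, not the parameters used in the paper.
    """
    prev = [0] * (len(b) + 1)          # previous matrix row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)      # current matrix row
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # One matrix cell: restart at 0, extend diagonally, or open a gap.
            curr[j] = max(0,
                          prev[j - 1] + sub,
                          prev[j] + gap,
                          curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Throughput implied by the abstract's database scan:
# a 300-residue query against 18,531,385 residues in 29 s.
cells_per_second = 300 * 18_531_385 / 29
print(smith_waterman_score("TTACGTTT", "GGACGGG"))
print(round(cells_per_second / 1e6, 1), "million cells per second")
```

The arithmetic at the end agrees with the "nearly 200 million matrix cells per second" reported for the 12-processor server.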
Pub Date: 1997-02-01 DOI: 10.1093/bioinformatics/13.1.111
A S Ivanov, A B Rumjantsev, V S Skvortşov, A I Archakov
Examination of the three-dimensional (3D) structure of a protein and of its ligand binding site is a key step in structure-based computer-aided drug design (Kuntz, 1992). The ligand binding site usually consists of several fragments of the protein sequence. Its examination requires the active participation of both human and computer. This work is a result of our need for a Windows-based program for such investigations. ONIX is an interactive piece of software based on protein structure hierarchy. Analysis of the molecular surface makes it possible to find all elements of the ligand binding site. ONIX v.1.03 is free software and will be made available by both FTP and E-mail from the EMBL file servers.
Title: ONIX: an interactive PC program for the examination of protein 3D structure from PDB. Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 1, pp. 111-3.
Pub Date: 1997-02-01 DOI: 10.1093/bioinformatics/13.1.37
E V Korotkov, M A Korotkova, J S Tulko
A method of latent periodicity search is developed. We use mutual information to reveal the latent periodicity of mRNA sequences. The latent periodicity of an mRNA sequence is a periodicity with a low level of similarity between any two periods inside the mRNA sequence. The mutual information between an artificial numerical sequence and an mRNA sequence is calculated. The length of the artificial sequence period is varied from 2 to 150. A high level of mutual information between the artificial and mRNA sequences allows us to find any type of latent periodicity in an mRNA sequence. Latent periodicity has been found in many mRNA coding regions. For example, the retinoblastoma gene of the HSRBS clone contains a region with a latent period equal to 45 bases. The A-RAF oncogene of the HSARAFIR clone contains a region with a latent period equal to 84 bases. Integrated sequences for the regions with latent periodicity are determined. The potential significance of latent periodicity is discussed.
Title: Latent sequence periodicity of some oncogenes and DNA-binding protein genes. Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 1, pp. 37-44.
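The idea of comparing an artificial periodic sequence with a nucleotide sequence can be sketched as follows. This is a minimal illustration, assuming the artificial sequence is simply the position index modulo the trial period; the function name and the toy input are hypothetical, and the statistical significance assessment that a real search needs (raw mutual information tends to grow with the period on short sequences) is omitted.

```python
import math
from collections import Counter


def mutual_information(seq, period):
    """Mutual information (bits) between phase (i mod period) and base.

    The phase sequence 0, 1, ..., period-1, 0, 1, ... plays the role of
    the artificial numerical sequence; this choice is an assumption.
    """
    n = len(seq)
    phases = [i % period for i in range(n)]
    joint = Counter(zip(phases, seq))      # counts of (phase, base) pairs
    phase_counts = Counter(phases)
    base_counts = Counter(seq)
    mi = 0.0
    for (phase, base), count in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (count / n) * math.log2(
            count * n / (phase_counts[phase] * base_counts[base]))
    return mi


toy = "ACG" * 20                            # perfectly periodic toy input
print(mutual_information(toy, 3))           # phase determines the base
print(mutual_information(toy, 2))           # phase says nothing about the base
```

On the toy input the period-3 value equals the entropy of the base composition, while the period-2 value is zero; in real mRNA the signal is far weaker, which is why the abstract calls the periodicity latent.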
Pub Date: 1997-02-01 DOI: 10.1093/bioinformatics/13.1.107
W Bains
Title: Hexanucleotide frequency database. Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 1, pp. 107-8.
Pub Date: 1997-02-01 DOI: 10.1093/bioinformatics/13.1.75
K M Chao, J Zhang, J Ostell, W Miller
Results: We have produced a computer program, named sim3, that solves the following computational problem. Two DNA sequences are given, where the shorter sequence is very similar to some contiguous region of the longer sequence. Sim3 determines such a similar region of the longer sequence, and then computes an optimal set of single-nucleotide changes (i.e. insertions, deletions or substitutions) that will convert the shorter sequence to that region. Thus, the alignment scoring scheme is designed to model sequencing errors, rather than evolutionary processes. The program can align a 100 kb sequence to a 1 megabase sequence in a few seconds on a workstation, provided that there are very few differences between the shorter sequence and some region in the longer sequence. The program has been used to assemble sequence data for the Genomes Division at the National Center for Biotechnology Information.
Title: A tool for aligning very similar DNA sequences. Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 1, pp. 75-80.
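The computational problem stated in the abstract (find the region of the longer sequence closest to the shorter one, counting single-nucleotide insertions, deletions and substitutions) can be written down as a plain dynamic program. The sketch below is only a statement of the problem in code: it runs in time proportional to the product of the two lengths, so it is not the method sim3 uses to reach the speeds quoted above, and the function name is hypothetical.

```python
def best_region(short, long_seq):
    """Return (edits, start, end) such that long_seq[start:end] is a region
    with the fewest unit-cost edits relative to `short`.

    Plain quadratic dynamic programming, used here only to illustrate
    the problem; not the sim3 algorithm.
    """
    n = len(long_seq)
    # prev[j] = (edits, start) for aligning short[:i] to a region ending at j.
    # Row 0: the region may start anywhere in long_seq at no cost.
    prev = [(0, j) for j in range(n + 1)]
    for i in range(1, len(short) + 1):
        curr = [(i, 0)]
        for j in range(1, n + 1):
            cost = 0 if short[i - 1] == long_seq[j - 1] else 1
            diag = (prev[j - 1][0] + cost, prev[j - 1][1])   # match or substitution
            up = (prev[j][0] + 1, prev[j][1])                # base of short has no partner
            left = (curr[j - 1][0] + 1, curr[j - 1][1])      # extra base in the region
            curr.append(min(diag, up, left))
        prev = curr
    end = min(range(n + 1), key=lambda j: prev[j][0])
    edits, start = prev[end]
    return edits, start, end


print(best_region("ACGTAC", "TTTTACGTACGGGG"))
```

Because the two sequences are assumed to be nearly identical, a practical tool can restrict the computation to a narrow neighbourhood of the matching diagonal instead of filling the whole matrix.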
Pub Date: 1997-02-01 DOI: 10.1093/bioinformatics/13.1.81
G S Miller, R Fuchs
Motivation: When evaluating the results of a sequence similarity search, there are many situations where it can be useful to determine whether sequences appearing in the results share some distinguishing characteristic. Such dependencies between database entries are often not readily identifiable, but can yield important new insights into the biological function of a gene or protein.
Results: We have developed a program called CBLAST that sorts the results of a BLAST sequence similarity search according to sequence membership in user-defined 'clusters' of sequences. To demonstrate the utility of this application, we have constructed two cluster databases. The first describes clusters of nucleotide sequences representing the same gene, as documented in the UNIGENE database, and the second describes clusters of protein sequences which are members of the protein families documented in the PROSITE database. Cluster databases and the CBLAST post-processor provide an efficient mechanism for identifying and exploring relationships and dependencies between new sequences and database entries.
Title: Post-processing of BLAST results using databases of clustered sequences. Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 1, pp. 81-7.
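Sorting search hits by cluster membership can be sketched in a few lines. This is not CBLAST's code or file format: the function name, the sequence identifiers, the scores and the cluster names are all made up for illustration, and ordering clusters by their best-scoring member is one plausible choice among several.

```python
from collections import defaultdict


def group_hits_by_cluster(hits, cluster_of):
    """Group (sequence_id, score) hits by cluster.

    hits       -- list of (sequence_id, score) tuples from a similarity search
    cluster_of -- dict mapping sequence_id to a user-defined cluster name
    Hits without a cluster are collected under "unclustered".
    """
    groups = defaultdict(list)
    for seq_id, score in hits:
        groups[cluster_of.get(seq_id, "unclustered")].append((seq_id, score))
    # Order clusters by their best-scoring member (an assumption).
    return sorted(groups.items(),
                  key=lambda item: max(score for _, score in item[1]),
                  reverse=True)


# Hypothetical data, for illustration only.
hits = [("seqA", 210.0), ("seqB", 180.0), ("seqC", 95.0), ("seqD", 60.0)]
cluster_of = {"seqA": "cluster1", "seqC": "cluster1", "seqB": "cluster2"}
for name, members in group_hits_by_cluster(hits, cluster_of):
    print(name, members)
```

Grouping makes it visible at a glance that a weak hit such as seqC belongs to the same cluster as the top hit, which is the kind of dependency the abstract describes as hard to spot in a flat result list.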
Pub Date: 1997-02-01 DOI: 10.1093/bioinformatics/13.1.1
K M Currey, B A Shapiro
Comparison of the secondary structure of the 5' non-coding region of poliovirus 3 RNA derived from the genetic algorithm with the model of Skinner et al. (J. Mol. Biol., 207, 379-392, 1989) demonstrates many of the confirmed structural elements. The genetic algorithm (Shapiro and Navetta, J. Supercomput., 8, 195-201, 1994) generates a population of all possible stems, then mixes, combines, and recombines these stems in multiple iterations on a massively parallel computer, ultimately selecting a most fit structure based on its energy. The secondary structure of the region containing the determinants of neurovirulence was better predicted using the genetic algorithm, whereas the dynamic programming algorithm (Zuker, Science, 244, 48-52, 1989) required phylogenetic comparative sequence analysis to arrive at the correct conclusion. In addition, artificial mutations were introduced throughout this region of the genome and although rearrangements in structure may occur, many structures persisted, suggesting that the given structures thus selected may have evolved to withstand isolated mutations. The genetic algorithm-derived structure for the 5' non-coding region compares favorably with the biological data and functions previously described, and contains all of the 'persistent' structures, suggesting also that the persistence factor may be an aid to validating structures.
Title: Secondary structure computer prediction of the poliovirus 5' non-coding region is improved by a genetic algorithm. Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 1, pp. 1-12.
Pub Date: 1997-02-01 DOI: 10.1093/bioinformatics/13.1.109
K Frech, P Dietze, T Werner
ConsInspector (Frech et al., 1993) is a program to scan nucleic acid sequences for matches to a pre-compiled library of transcription factor binding sites. The program carries out an extensive examination of binding site candidates; the real sequence is compared with randomly shuffled versions and sequence regions surrounding the conserved binding site are included in the analysis (default 40 bp upstream and 40 bp downstream of the highly conserved core sequence). This feature distinguishes the program from other methods available for the identification of transcription factor binding sites which are restricted to the binding sites: SIGNAL SCAN (Prestridge, 1991, 1996; Prestridge and Stormo, 1993), MATRIX SEARCH (Chen et al., 1995) and MatInspector (Quandt et al., 1995a). Recently, we showed the quality scores (Q-scores) assigned by ConsInspector to correlate to some extent with biological functionality (Quandt et al., 1995b). Release 3.0 of ConsInspector, with enhanced performance and a considerably extended library of consensus profiles, is available now at ftp://ariane.gsf.de/pub/ or http://www.gsf.de/biodv/. The program ConsInd (Frech et al., 1993) has been used to compile the library of consensus profiles. The library now encompasses 37 consensus profiles (Release 1.0: 12, Release 2.1: 17 consensus profiles) and is separated into four groups (Table I). The extended weight matrices were deduced from experimentally confirmed binding sequences selected from the TRANSFAC database (Wingender et al., 1996) or directly from the literature. Most consensus profiles of the original library have been improved by the inclusion of additional sequences. Consensus profiles have been compiled from a minimum of nine sequences (Table I). The analysis of DNA sequences for transcription factor binding sites with ConsInspector has improved since Release 1.0:
Title: ConsInspector 3.0: new library and enhanced functionality. Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 1, pp. 109-10.
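The comparison of a real sequence with randomly shuffled versions, mentioned in the abstract, can be illustrated with a toy weight matrix. Everything in this sketch is an assumption made for illustration: the matrix values, the function names and the test sequence are hypothetical, the flanking regions that ConsInspector includes are ignored, and the program's actual Q-score is not reproduced.

```python
import random


def best_window(seq, matrix):
    """Return (position, score) of the best-scoring window.

    matrix is a list of dicts, one per position, mapping base -> weight.
    """
    width = len(matrix)
    scores = [sum(matrix[k].get(seq[i + k], 0) for k in range(width))
              for i in range(len(seq) - width + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def shuffled_best_scores(seq, matrix, trials=100, seed=0):
    """Best window score in each of `trials` shuffled copies of seq."""
    rng = random.Random(seed)           # fixed seed for repeatability
    results = []
    for _ in range(trials):
        letters = list(seq)
        rng.shuffle(letters)            # preserves base composition
        results.append(best_window("".join(letters), matrix)[1])
    return results


# Toy matrix favouring the word TATA (illustrative weights only).
toy_matrix = [{"T": 3}, {"A": 3}, {"T": 3}, {"A": 3}]
sequence = "GCGCGCTATAGCGCGC"
position, score = best_window(sequence, toy_matrix)
background = shuffled_best_scores(sequence, toy_matrix)
print(position, score, sum(background) / len(background))
```

A candidate site whose score clearly exceeds what shuffled sequences of the same composition produce is less likely to be a chance match.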
Pub Date: 1997-02-01 DOI: 10.1093/bioinformatics/13.1.45
J A Grice, R Hughey, D Speck
Motivation: Sequence alignment is the problem of finding the optimal character-by-character correspondence between two sequences. It can be readily solved in O(n^2) time and O(n^2) space on a serial machine, or in O(n) time with O(n) space per O(n) processing elements on a parallel machine. Hirschberg's divide-and-conquer approach for finding the single best path reduces space use by a factor of n while inducing only a small constant slowdown to the serial version.
Results: This paper presents a family of methods for computing sequence alignments with reduced memory that are well suited to serial or parallel implementation. Unlike the divide-and-conquer approach, they can be used in the forward-backward (Baum-Welch) training of linear hidden Markov models, and they avoid data-dependent repartitioning, making them easier to parallelize. The algorithms feature, for an arbitrary integer L, a factor proportional to L slowdown in exchange for reducing the space requirement from O(n^2) to O(n · n^(1/L)), that is, n times the L-th root of n. A single best path member of this algorithm family matches the quadratic time and linear space of the divide-and-conquer algorithm. Experimentally, the O(n^1.5)-space member of the family is 15-40% faster than the O(n)-space divide-and-conquer algorithm.
Title: Reduced space sequence alignment. Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 1, pp. 45-53.
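One way to trade time for space without divide-and-conquer is row checkpointing: keep only every k-th row of the alignment matrix on the forward pass, then recompute one block of rows at a time during traceback. The sketch below is an illustration of that general idea with k close to the square root of n, which gives space on the order of n^1.5. It is an assumption about the flavour of the method, not the authors' implementation, and the scoring values and function names are hypothetical.

```python
import math

MATCH, MISMATCH, GAP = 1, -1, -1          # illustrative scores


def next_row(prev, ch, b, i):
    """Compute global-alignment row i from row i-1."""
    row = [i * GAP]
    for j in range(1, len(b) + 1):
        sub = MATCH if ch == b[j - 1] else MISMATCH
        row.append(max(prev[j - 1] + sub, prev[j] + GAP, row[j - 1] + GAP))
    return row


def checkpoint_align(a, b):
    """Return (score, aligned_a, aligned_b) storing about 2*sqrt(len(a)) rows."""
    n = len(a)
    k = max(1, math.isqrt(n))
    # Forward pass: remember only rows 0, k, 2k, ...
    row = [j * GAP for j in range(len(b) + 1)]
    checkpoints = {0: row}
    for i in range(1, n + 1):
        row = next_row(row, a[i - 1], b, i)
        if i % k == 0:
            checkpoints[i] = row
    score = row[-1]
    # Traceback: recompute the block of rows that contains rows i-1 and i.
    out_a, out_b = [], []
    i, j = n, len(b)
    block_start, block = None, None
    while i > 0 or j > 0:
        if i > 0:
            start = ((i - 1) // k) * k            # nearest checkpoint at or before row i-1
            if block_start != start:
                rows = [checkpoints[start]]
                for r in range(start + 1, min(start + k, n) + 1):
                    rows.append(next_row(rows[-1], a[r - 1], b, r))
                block_start, block = start, rows
            cur, up = block[i - start], block[i - 1 - start]
            sub = MATCH if j > 0 and a[i - 1] == b[j - 1] else MISMATCH
            if j > 0 and cur[j] == up[j - 1] + sub:   # diagonal move
                out_a.append(a[i - 1]); out_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
            if cur[j] == up[j] + GAP:                 # gap in b
                out_a.append(a[i - 1]); out_b.append("-")
                i -= 1
                continue
        out_a.append("-"); out_b.append(b[j - 1])     # gap in a
        j -= 1
    return score, "".join(reversed(out_a)), "".join(reversed(out_b))


print(checkpoint_align("GATTACA", "GATACA"))
```

Every row is computed at most twice, so the slowdown is a small constant factor, and the block boundaries depend only on the sequence length rather than on the data, which is the property the abstract highlights as making such schemes easier to parallelize.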
Pub Date: 1997-02-01 DOI: 10.1093/bioinformatics/13.1.61
C Hoogland, C Biémont
Motivation: Which forces maintain transposable elements (TEs) in genomes and populations is one of the main questions in understanding the dynamics of these elements, but the exact nature of these forces is still a matter of speculation. To test theoretical models of TE population dynamics, we need extensive data on the genomic distributions of various elements. These data are now accumulating for the species Drosophila melanogaster, but they are scattered in the literature.
Results: The knowledge base DROSOPOSON thus brings together: (1) data available on Drosophila chromosomal localizations of TE insertions and on features of the polytene chromosomes (DNA content, recombination rate, break-points, etc.); (2) statistical methods aimed at analysing the distribution of the TE insertions along the chromosomes. In this paper, we present the structure of the base, the data and the statistical methods. Theoretical models of containment of TE copy number in Drosophila can thus be tested.
Title: DROSOPOSON: a knowledge base on chromosomal localization of transposable element insertions in Drosophila. Journal: Computer applications in the biosciences : CABIOS, vol. 13, no. 1, pp. 61-8.