Pub Date : 2002-07-01 DOI: 10.1016/S0097-8485(02)00024-4
Andrzej K Konopka
{"title":"Grand metaphors of biology in the genome era","authors":"Andrzej K Konopka","doi":"10.1016/S0097-8485(02)00024-4","DOIUrl":"10.1016/S0097-8485(02)00024-4","url":null,"abstract":"","PeriodicalId":79331,"journal":{"name":"Computers & chemistry","volume":"26 5","pages":"Pages 397-401"},"PeriodicalIF":0.0,"publicationDate":"2002-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/S0097-8485(02)00024-4","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90275132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-07-01 DOI: 10.1016/S0097-8485(02)00027-X
Jean-Loup Risler
{"title":"Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins","authors":"Jean-Loup Risler","doi":"10.1016/S0097-8485(02)00027-X","DOIUrl":"10.1016/S0097-8485(02)00027-X","url":null,"abstract":"","PeriodicalId":79331,"journal":{"name":"Computers & chemistry","volume":"26 5","pages":"Pages 549-551"},"PeriodicalIF":0.0,"publicationDate":"2002-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/S0097-8485(02)00027-X","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84741377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-07-01 DOI: 10.1016/S0097-8485(02)00025-6
Andrzej K Konopka
{"title":"This Is Biology: The Science of the Living World","authors":"Andrzej K Konopka","doi":"10.1016/S0097-8485(02)00025-6","DOIUrl":"10.1016/S0097-8485(02)00025-6","url":null,"abstract":"","PeriodicalId":79331,"journal":{"name":"Computers & chemistry","volume":"26 5","pages":"Pages 543-545"},"PeriodicalIF":0.0,"publicationDate":"2002-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/S0097-8485(02)00025-6","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83438036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-07-01 DOI: 10.1016/S0097-8485(02)00010-4
Wentian Li , Pedro Bernaola-Galván , Fatameh Haghighi , Ivo Grosse
Recursive segmentation is a procedure that partitions a DNA sequence into domains with a homogeneous composition of the four nucleotides A, C, G and T. This procedure can also be applied to any sequence converted from a DNA sequence, such as to a binary strong(G+C)/weak(A+T) sequence, to a binary sequence indicating the presence or absence of the dinucleotide CpG, or to a sequence indicating both the base and the codon position information. We apply various conversion schemes in order to address the following five DNA sequence analysis problems: isochore mapping, CpG island detection, locating the origin and terminus of replication in bacterial genomes, finding complex repeats in telomere sequences, and delineating coding and noncoding regions. We find that the recursive segmentation procedure can successfully detect isochore borders, CpG islands, and the origin and terminus of replication, but it needs improvement for detecting complex repeats as well as borders between coding and noncoding regions.
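The abstract does not state which homogeneity criterion or stopping rule the procedure uses, so the following is only a minimal sketch of the general idea. It assumes the Jensen-Shannon divergence between the compositions of the left and right parts as the splitting criterion, and a fixed divergence threshold as the stopping rule. The function names, the threshold value and the minimum segment length are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def to_strong_weak(seq):
    """Convert a DNA sequence to the binary strong (G+C) / weak (A+T) alphabet."""
    return "".join("S" if base in "GC" else "W" for base in seq)


def entropy(counts, n):
    """Shannon entropy in bits of a composition given its symbol counts."""
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c > 0)


def best_split(seq, min_len):
    """Return (position, divergence) of the two-way split with the largest
    Jensen-Shannon divergence, or (None, 0.0) if no split improves on zero."""
    n = len(seq)
    total = Counter(seq)
    h_total = entropy(total, n)
    left = Counter()
    best_pos, best_js = None, 0.0
    for i in range(1, n):
        left[seq[i - 1]] += 1
        if i < min_len or n - i < min_len:
            continue  # keep both parts at least min_len long
        right = total - left
        js = h_total - (i / n) * entropy(left, i) - ((n - i) / n) * entropy(right, n - i)
        if js > best_js:
            best_pos, best_js = i, js
    return best_pos, best_js


def segment(seq, threshold=0.3, min_len=4, offset=0):
    """Recursively split seq; return the cut positions in the original sequence.
    The fixed threshold is an assumed stopping rule, not the published one."""
    if len(seq) < 2 * min_len:
        return []
    pos, js = best_split(seq, min_len)
    if pos is None or js < threshold:
        return []
    return (segment(seq[:pos], threshold, min_len, offset)
            + [offset + pos]
            + segment(seq[pos:], threshold, min_len, offset + pos))
```

Because the function only looks at symbol counts, the same code can be run on a converted sequence, for example `segment(to_strong_weak(dna))`, which mirrors the conversion schemes mentioned in the abstract.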
{"title":"Applications of recursive segmentation to the analysis of DNA sequences","authors":"Wentian Li , Pedro Bernaola-Galván , Fatameh Haghighi , Ivo Grosse","doi":"10.1016/S0097-8485(02)00010-4","DOIUrl":"10.1016/S0097-8485(02)00010-4","url":null,"abstract":"<div><p>Recursive segmentation is a procedure that partitions a DNA sequence into domains with a homogeneous composition of the four nucleotides A, C, G and T. This procedure can also be applied to any sequence converted from a DNA sequence, such as to a binary strong(G+C)/weak(A+T) sequence, to a binary sequence indicating the presence or absence of the dinucleotide CpG, or to a sequence indicating both the base and the codon position information. We apply various conversion schemes in order to address the following five DNA sequence analysis problems: isochore mapping, CpG island detection, locating the origin and terminus of replication in bacterial genomes, finding complex repeats in telomere sequences, and delineating coding and noncoding regions. We find that the recursive segmentation procedure can successfully detect isochore borders, CpG islands, and the origin and terminus of replication, but it needs improvement for detecting complex repeats as well as borders between coding and noncoding regions.</p></div>","PeriodicalId":79331,"journal":{"name":"Computers & chemistry","volume":"26 5","pages":"Pages 491-510"},"PeriodicalIF":0.0,"publicationDate":"2002-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/S0097-8485(02)00010-4","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79853764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-07-01 DOI: 10.1016/S0097-8485(02)00011-6
A. Louis , H. Chiapello , C. Fabry , E. Ollivier , A. Hénaut
In the framework of genome annotation, scientific literature is obviously the major source of biological knowledge. The aim of the work described in this paper is to exploit this source of data for the model plant Arabidopsis thaliana. The first step consisted of building a relevant dataset of bibliographic references for plant genomic research. Gene co-citations were then systematically annotated in this reference dataset, starting from the simple idea that genes cited in the same publication probably share some related functional properties. To deal with the problem of synonymous gene names, a gene name reference list was built from A. thaliana SwissProt entries. This list was used to build clusters of co-cited genes by a single linkage procedure such that any gene in a given cluster possesses at least one co-cited partner in the same cluster. Analysis of the clusters demonstrates the biological consistency of this approach, with only very few fortuitous links. As an example, a cluster including genes related to flowering time is described in more detail in the paper. Finally, a graphical representation of each cluster was produced, which provides a convenient way to retrieve the genes (the nodes of the graphs) and the references in which they were co-cited (the edges of the graphs). All the results can be accessed at the URL http://chlora.Igi.infobiogen.fr:1234/bib_arath/.
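The single linkage step can be illustrated with a small union-find sketch. It is not the authors' implementation. The input layout (a mapping from a publication identifier to the gene names it cites), the synonym table and the function name are assumptions made for the example, and the gene names in the usage note are placeholders rather than real data.

```python
from collections import defaultdict


def cocitation_clusters(papers, synonyms=None):
    """Group genes so that every gene shares at least one publication with
    another gene of its cluster (single linkage).

    papers   -- dict: publication id -> iterable of gene names (assumed layout)
    synonyms -- dict: alias -> reference gene name (assumed layout)
    Returns a sorted list of sorted clusters; genes never co-cited are omitted.
    """
    synonyms = synonyms or {}
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for genes in papers.values():
        # Map every alias to its reference name before linking.
        canonical = sorted({synonyms.get(g, g) for g in genes})
        for gene in canonical:
            find(gene)
        for other in canonical[1:]:
            union(canonical[0], other)

    groups = defaultdict(list)
    for gene in list(parent):
        groups[find(gene)].append(gene)
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)
```

With the toy input `{"p1": ["GENE_A", "GENE_B"], "p2": ["GENE_B_ALIAS", "GENE_C"]}` and the synonym table `{"GENE_B_ALIAS": "GENE_B"}`, the three genes end up in one cluster because the alias links the two publications.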
{"title":"Deciphering Arabidopsis thaliana gene neighborhoods through bibliographic co-citations","authors":"A. Louis , H. Chiapello , C. Fabry , E. Ollivier , A. Hénaut","doi":"10.1016/S0097-8485(02)00011-6","DOIUrl":"10.1016/S0097-8485(02)00011-6","url":null,"abstract":"<div><p>In the framework of genome annotation, scientific literature is obviously the major source of biological knowledge. The aim of the work described in this paper is to exploit this source of data for the model plant <em>Arabidopsis thaliana</em>. The first step has consisted in constituting a relevant bibliographic references dataset for plant genomic research. Genes co-citations have then been systematically annotated in this reference dataset, starting from the simple idea that if genes are cited in the same publication, they must probably share some related functional properties. In order to deal with the synonymous gene name problem; a gene name reference list has been constituted starting from <em>A. thaliana</em> SwissProt entries. This list was used to build clusters of co-cited genes by a single linkage procedure such that any gene in a given cluster possesses at least one co-cited partner in the same cluster. Analysis of the clusters demonstrate the biological consistency of this approach, with only very few fortuitous links. As an example, a cluster including genes related to flowering time is more deeply described in the paper. Finally, a graphical representation of each cluster was performed, which provides a convenient way to retrieve the genes (the nodes of the graphs) and the references in which they were co-cited (the edges of the graphs). 
All the results can be accessed at the URL <span>http://chlora.Igi.infobiogen.fr:1234/bib_arath/</span><svg><path></path></svg>.</p></div>","PeriodicalId":79331,"journal":{"name":"Computers & chemistry","volume":"26 5","pages":"Pages 511-519"},"PeriodicalIF":0.0,"publicationDate":"2002-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/S0097-8485(02)00011-6","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84143076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-07-01 DOI: 10.1016/S0097-8485(02)00008-6
Jaap Heringa
This paper describes three weighting schemes for improving the accuracy of progressive multiple sequence alignment methods: (1) global profile pre-processing, to capture for each sequence information about other sequences in a profile before the actual multiple alignment takes place; (2) local pre-processing, which incorporates a new protocol to only use non-overlapping local sequence regions to construct the pre-processed profiles; and (3) local–global alignment, a weighting scheme based on the double dynamic programming (DDP) technique to softly bias global alignment to local sequence motifs. The first two schemes allow the compilation of residue-specific multiple alignment reliability indices, which can be used in an iterative fashion. The schemes have been implemented with associated iterative modes in the PRALINE multiple sequence alignment method, and have been evaluated using the BAliBASE benchmark alignment database. These tests indicate that PRALINE is a toolbox able to build alignments of very high quality. We found that local profile pre-processing raises the alignment quality by 5.5% compared to PRALINE alignments generated under default conditions. Iteration enhances the quality by a further percentage point. The implications of multiple alignment scoring functions and iteration in relation to alignment quality and benchmarking are discussed.
{"title":"Local weighting schemes for protein multiple sequence alignment","authors":"Jaap Heringa","doi":"10.1016/S0097-8485(02)00008-6","DOIUrl":"10.1016/S0097-8485(02)00008-6","url":null,"abstract":"<div><p>This paper describes three weighting schemes for improving the accuracy of progressive multiple sequence alignment methods: (1) global profile pre-processing, to capture for each sequence information about other sequences in a profile before the actual multiple alignment takes place; (2) local pre-processing; which incorporates a new protocol to only use non-overlapping local sequence regions to construct the pre-processed profiles; and (3) local–global alignment, a weighting scheme based on the double dynamic programming (DDP) technique to softly bias global alignment to local sequence motifs. The first two schemes allow the compilation of residue-specific multiple alignment reliability indices, which can be used in an iterative fashion. The schemes have been implemented with associated iterative modes in the PRALINE multiple sequence alignment method, and have been evaluated using the BAliBASE benchmark alignment database. These tests indicate that PRALINE is a toolbox able to build alignments with very high quality. We found that local profile pre-processing raises the alignment quality by 5.5% compared to PRALINE alignments generated under default conditions. Iteration enhances the quality by a further percentage point. 
The implications of multiple alignment scoring functions and iteration in relation to alignment quality and benchmarking are discussed.</p></div>","PeriodicalId":79331,"journal":{"name":"Computers & chemistry","volume":"26 5","pages":"Pages 459-477"},"PeriodicalIF":0.0,"publicationDate":"2002-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/S0097-8485(02)00008-6","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73167550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-07-01 DOI: 10.1016/S0097-8485(02)00007-4
Yves Quentin , Julie Chabalier , Gwennaele Fichant
The proteins involved in a single biological process may form a stable supra-molecular assembly or interact transiently. Although the first annotation steps of a complete genome may allow the identification of the different partners, their assembly into a functional system, referred to as an integrated system, is an area where methodological effort is still needed. Indeed, the knowledge required to assemble partners of such systems should be explicitly included in annotation software. The availability of a complete genome, and therefore of all the proteins encoded by that genome, motivated the development of automated approaches through the coordinated combination of different bio-informatic methods allowing the identification of the different partners, their assembly and the classification of the reconstructed systems in functional categories. In this data flow, the identification of the sequence partners represents the principal bottleneck. Here, we describe and compare the results obtained with different classes of methods (blastp2, psi-blast, mast and Meta-meme) applied to the identification in complete genomes of a given family of integrated systems: the ABC transporters. psi-blast appears to significantly outperform motif-based methods, and the results are discussed according to the nature of the proteins and the structure of the sub-families.
{"title":"Strategies for the identification, the assembly and the classification of integrated biological systems in completely sequenced genomes","authors":"Yves Quentin , Julie Chabalier , Gwennaele Fichant","doi":"10.1016/S0097-8485(02)00007-4","DOIUrl":"10.1016/S0097-8485(02)00007-4","url":null,"abstract":"<div><p>The proteins involved in a single biological process may form a stable supra-molecular assembly or be transiently in interaction. Although, the first annotation steps of a complete genome may allow the identification of the different partners, their assembly in a functional system, referred to as an integrated system, is a domain where methodological effort has to be done. Indeed, the knowledge required to assemble partners of such systems should be explicitly included in annotation software. The availability of a complete genome, and therefore of all the proteins encoded by that genome, motivated the development of automated approaches through the coordinated combination of different bio-informatic methods allowing the identification of the different partners, their assembly and the classification of the reconstructed systems in functional categories. In this data flux, the identification of the sequence partners represents the principal bottleneck. Here, we describe and compare the results obtained with different classes of methods (<span>blastp2</span>, <span>psi</span>-<span>blast</span>, <span>mast</span> and M<span>eta</span>-<span>meme</span>) applied to the identification in complete genomes of a given family of integrated systems: the ABC transporters. 
<span>psi</span>-<span>blast</span> appears to significantly outperform motif-based methods, and the results are discussed according to the nature of the proteins and the structure of the sub-families.</p></div>","PeriodicalId":79331,"journal":{"name":"Computers & chemistry","volume":"26 5","pages":"Pages 447-457"},"PeriodicalIF":0.0,"publicationDate":"2002-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/S0097-8485(02)00007-4","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84279598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-07-01 DOI: 10.1016/S0097-8485(02)00003-7
J.-C. Aude , A. Louis
The Z-value (Comput. Chem. 23 (1999) 333) is an extension of the Z-score that is classically used to compare sets of biological sequences. The Z-value has been successfully used to handle complete genome studies as well as to analyze large sets of proteins. The Z-value computation is based on a Monte Carlo approach to estimate the statistical significance of a Smith & Waterman alignment score. Comet et al. (Comput. Chem. 23 (1999) 333) have shown that, in contrast to the alignment score, the Z-value largely reduces the bias due to the lengths and compositions of the sequences. They also described an estimator of the deviation of Z-values, which we extend in this paper in order to optimize Z-value computation. The incremental algorithm described here provides two characteristics which are usually incompatible: (i) it improves the accuracy of Z-value calculation; (ii) it reduces the time complexity (this algorithm has been named incremental because it iteratively adds random sequences to the Monte Carlo process when needed). Results are presented, originating from the all-by-all comparison of the proteins from Saccharomyces cerevisiae and Escherichia coli.
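The Monte Carlo idea can be sketched as follows: the observed local alignment score is compared with the scores obtained after shuffling one of the sequences, and shuffles are added in batches only while the estimate is still moving. This is an illustration under stated assumptions, not the published algorithm. The scoring parameters, the batch size and the stopping rule (stop when two successive estimates differ by less than a tolerance) are hypothetical and stand in for the deviation estimator described in the paper.

```python
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty
    (assumed toy scoring scheme)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            score = max(0,
                        prev[j - 1] + (match if ca == cb else mismatch),
                        prev[j] + gap,
                        cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def incremental_z(a, b, batch=20, max_shuffles=200, tol=0.1, seed=0):
    """Estimate Z = (observed - mean) / sd over shuffled versions of b,
    adding shuffles batch by batch. Returns (z, number_of_shuffles)."""
    rng = random.Random(seed)
    observed = sw_score(a, b)
    scores = []
    z_prev = None
    z = 0.0
    while len(scores) < max_shuffles:
        for _ in range(batch):
            shuffled = list(b)
            rng.shuffle(shuffled)  # keeps length and composition of b
            scores.append(sw_score(a, shuffled))
        sd = statistics.pstdev(scores)
        z = (observed - statistics.mean(scores)) / sd if sd > 0 else 0.0
        # Assumed stopping rule: the estimate has stabilised between batches.
        if z_prev is not None and abs(z - z_prev) < tol:
            break
        z_prev = z
    return z, len(scores)
```

Shuffling preserves the length and composition of the sequence, which is why a score normalised this way is less sensitive to those two factors than the raw alignment score.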
{"title":"An incremental algorithm for Z-value computations","authors":"J.-C. Aude , A. Louis","doi":"10.1016/S0097-8485(02)00003-7","DOIUrl":"10.1016/S0097-8485(02)00003-7","url":null,"abstract":"<div><p>The <em>Z</em>-value (Comput. Chem. 23 (1999) 333) is an extension of the <em>Z</em>-score that is classically used to compare sets of biological sequences. The <em>Z</em>-value has been successfully used to handle complete genome studies as well as analyze large sets of proteins. The <em>Z</em>-value computation is based on a Monte Carlo approach to estimate the statistical significance of a Smith & Waterman alignment score. Comet et al. (Comput. Chem. 23 (1999) 333) have shown that, in contrast to the alignment score, the <em>Z</em>-value largely reduces the bias due to the lengths and compositions of the sequences. They also described an estimator of the deviation of <em>Z</em>-values, that we extend in this paper in order to optimize <em>Z</em>-values computation. The <em>incremental</em> algorithm described here provides two characteristics which are usually incompatible: (i) it improves the accuracy of <em>Z</em>-values calculation; (ii) it reduces the time complexity (this algorithm has been named <em>incremental</em> because it iteratively adds random sequences to the Monte-Carlo process when needed). 
Results are presented, originating from the all-by-all comparison of the proteins from <em>Saccharomyces cerevisiae</em> and <em>Escherichia coli</em>.</p></div>","PeriodicalId":79331,"journal":{"name":"Computers & chemistry","volume":"26 5","pages":"Pages 402-410"},"PeriodicalIF":0.0,"publicationDate":"2002-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/S0097-8485(02)00003-7","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76202142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-07-01 DOI: 10.1016/S0097-8485(02)00012-8
Fariza Tahi , Manolo Gouy , Mireille Régnier
This paper presents an algorithm, DCFold, that automatically predicts the common secondary structure of a set of aligned homologous RNA sequences. It is based on the comparative approach. Helices are searched for in one of the sequences, called the ‘target sequence’, and compared to the helices in the other sequences, called the ‘test sequences’. Our algorithm searches the target sequence for palindromes that have a high probability of defining helices that are conserved in the test sequences. This selection of significant palindromes is based on criteria that take into account their length and their mutation rate. A recursive search for helices, starting from these likely ones, is implemented using the ‘divide and conquer’ approach. Indeed, as pseudo-knots are not searched for by DCFold, a selected palindrome (p, p′) makes it possible to divide the initial sequence into two sequences, the internal one and the one resulting from the concatenation of the two external ones. New palindromes can be searched for independently in these subsequences. This algorithm was run on ribosomal RNA sequences and recovered their common secondary structures very efficiently.
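The divide step can be illustrated on a single sequence. The sketch below is not DCFold: it ignores the test sequences and the length and mutation-rate criteria, and simply selects the longest helix as an assumed stand-in for the significance test. What it does show is the recursion described above, in which a selected helix splits the sequence into an internal part and the concatenation of the two external parts. All function names and parameters are hypothetical.

```python
# Watson-Crick and G-U wobble pairs (assumed pairing rules for the sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_helix(seq, idx, min_len=3, min_loop=3):
    """Find the longest run of nested base pairs among the positions in idx.
    Returns (i, j, length) as offsets into idx, or None if no run reaches
    min_len. At least min_loop positions stay unpaired inside the helix."""
    best = None
    n = len(idx)
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            k = 0
            while (i + k + min_loop < j - k
                   and (seq[idx[i + k]], seq[idx[j - k]]) in PAIRS):
                k += 1
            if k >= min_len and (best is None or k > best[2]):
                best = (i, j, k)
    return best


def fold(seq, idx=None, min_len=3):
    """Divide and conquer: pick a helix, then search independently in the
    internal region and in the concatenated external regions.
    Returns a sorted list of (5' position, 3' position) pairs."""
    if idx is None:
        idx = list(range(len(seq)))
    hit = longest_helix(seq, idx, min_len)
    if hit is None:
        return []
    i, j, k = hit
    pairs = [(idx[i + t], idx[j - t]) for t in range(k)]
    inner = idx[i + k:j - k + 1]   # positions enclosed by the helix
    outer = idx[:i] + idx[j + 1:]  # the two external parts, concatenated
    return sorted(pairs + fold(seq, inner, min_len) + fold(seq, outer, min_len))
```

Keeping a list of original positions (`idx`) rather than slicing the string lets pairs found in the concatenated external part be reported in the coordinates of the initial sequence. Since the two subproblems never share a position, no pseudo-knot can be produced, which matches the restriction stated in the abstract.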
{"title":"Automatic RNA secondary structure prediction with a comparative approach","authors":"Fariza Tahi , Manolo Gouy , Mireille Régnier","doi":"10.1016/S0097-8485(02)00012-8","DOIUrl":"10.1016/S0097-8485(02)00012-8","url":null,"abstract":"<div><p>This paper presents an algorithm, DCFold, that automatically predicts the common secondary structure of a set of aligned homologous RNA sequences. It is based on the comparative approach. Helices are searched in one of the sequences, called the ‘target sequence’, and compared to the helices in the other sequences, called the ‘test sequences’. Our algorithm searches in the target sequence for palindromes that have a high probability to define helices that are conserved in the test sequences. This selection of significant palindromes is based on criteria that take into account their length and their mutation rate. A recursive search of helices, starting from these likely ones, is implemented using the ‘divide and conquer’ approach. Indeed, as pseudo-knots are not searched by DCFold, a selected palindrome (<em>p</em>,<!--> <em>p</em>′) makes possible to divide the initial sequence into two sequences, the internal one and the one resulting from the concatenation of the two external ones. New palindromes can be searched independently in these subsequences. 
This algorithm was run on ribosomal RNA sequences and recovered very efficiently their common secondary structures.</p></div>","PeriodicalId":79331,"journal":{"name":"Computers & chemistry","volume":"26 5","pages":"Pages 521-530"},"PeriodicalIF":0.0,"publicationDate":"2002-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/S0097-8485(02)00012-8","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91073109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}