GQL: a reasonable complex SQL for genomic databases
H. Jamil
Proceedings IEEE International Symposium on Bio-Informatics and Biomedical Engineering | Pub Date: 2000-11-08 | DOI: 10.1109/BIBE.2000.889589
Validating hypotheses and reasoning about objects is becoming commonplace in biotechnology research. The capability to reason strengthens comparative genomics research by providing a much-needed tool to pose intelligent queries in a more convenient and declarative fashion. To be able to reason using the Genomic Query Language (GQL), we propose the idea of parameterized views as an extension of SQL's "create view" construct with an optional "with parameter" clause. Parameterizing enables traditional SQL views to accept input values and to delay the computation of the view until invoked with a "call" statement. This extension empowers users with the capability of modifying the behavior of predefined procedures (views) by sending arguments and evaluating the procedure on demand. We demonstrate that the extension is soundly based, with a parallel in Datalog. We also show that the idea of relational unification proposed in this paper empowers SQL to reason and infer in exactly the same way as an object-oriented Datalog. Thereby, it eliminates the need for cumbersome integration of database engines with deductive reasoners, as was customary in many advanced genomic database applications in the past.
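The core idea — a view whose body is stored but whose evaluation is delayed until it is "call"-ed with arguments — can be emulated outside GQL. The following is a minimal illustrative sketch in Python over SQLite, not the paper's GQL engine; the `ParameterizedView` class and the `homolog` table are hypothetical names invented for the example.

```python
import sqlite3

# Hypothetical emulation of a parameterized view: the SQL body (with ?
# placeholders) is stored at "create" time, and evaluation is delayed
# until call() supplies the arguments -- mirroring the proposed
# "create view ... with parameter" plus "call" behavior.
class ParameterizedView:
    def __init__(self, conn, sql):
        self.conn = conn
        self.sql = sql            # view body; not evaluated yet

    def call(self, *args):        # evaluate on demand with arguments
        return self.conn.execute(self.sql, args).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE homolog (gene TEXT, organism TEXT)")
conn.executemany("INSERT INTO homolog VALUES (?, ?)",
                 [("p53", "human"), ("p53", "mouse"), ("abl1", "human")])

# A view parameterized on the organism of interest.
homologs_of = ParameterizedView(
    conn, "SELECT gene FROM homolog WHERE organism = ?")
print(homologs_of.call("human"))   # [('p53',), ('abl1',)]
```

Calling the same view object with a different argument (e.g. `homologs_of.call("mouse")`) re-evaluates the stored body, which is the behavioral change the "with parameter" clause adds over a plain SQL view.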
A heuristic algorithm for blocked multiple sequence alignment
Tao Jiang, Peng Zhao
Pub Date: 2000-11-08 | DOI: 10.1109/BIBE.2000.889605
Blocked multiple-sequence alignment (BMA) refers to the construction of multiple alignments in DNA by first aligning conserved regions into what we call "blocks" and then aligning the regions between successive blocks to form a final alignment. Instead of starting from low-order pairwise alignments, we propose a new way to form blocks by searching for closely related regions in all input sequences, allowing internal spaces in blocks as well as some degree of mismatch. We address the problem of semi-conserved patterns (patterns that do not appear in all input sequences) by introducing into the process two similarity thresholds that are adjusted dynamically according to the input. A method to control the number of blocks is also presented to deal with the situation when input sequences have so many similar regions that it becomes impractical to form blocks by trying every combination. BMA is an implementation of this approach, and our experimental results indicate that this approach is efficient, particularly on large numbers of long sequences with well-conserved regions.
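A toy sketch of the block-formation step — not the authors' BMA implementation — is to report k-mers of one sequence that match a window of every other sequence within a mismatch budget, which captures "closely related regions in all input sequences ... with some degree of mismatch" (the parameters `k` and `max_mm` are illustrative stand-ins for the paper's dynamically adjusted thresholds):

```python
# Toy block finder: a candidate "block" is a k-mer of the first sequence
# that occurs, with at most max_mm mismatches, somewhere in every other
# input sequence.
def mismatches(a, b):
    return sum(x != y for x, y in zip(a, b))

def find_blocks(seqs, k=4, max_mm=1):
    blocks = []
    first = seqs[0]
    for i in range(len(first) - k + 1):
        kmer = first[i:i + k]
        if all(any(mismatches(kmer, s[j:j + k]) <= max_mm
                   for j in range(len(s) - k + 1))
               for s in seqs[1:]):
            blocks.append(kmer)
    return blocks

seqs = ["ACGTACGGT", "TTACGTAA", "GACGTTTT"]
print(find_blocks(seqs, k=4, max_mm=1))
```

In a full pipeline, the regions between successive accepted blocks would then be aligned independently to produce the final alignment.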
Mathematical modelling in genetic networks: relationships between the genetic expression and both chromosomic breakage and positive circuits
J. Aracena, S. Lamine, M. Mermet, O. Cohen, Jacques Demongeot
Pub Date: 2000-11-08 | DOI: 10.1109/BIBE.2000.889601
The human genome has evolved from a primitive genome to its present state, distributed across the 23 pairs of chromosomes. This evolution has been governed by the mutation process and also by the physiological and pathological reorganization of the genomic material within or between the chromosomes, which conditions genomic variability. This reorganization starts at singular points on the short or long chromosomal arms, called crossover, translocation, insertion or breakpoints. In this paper, we show that these points, also called "weak points" or "hot spots" of the genome, are correlated independently of their origin. In addition, we give some properties of the interaction matrices in terms of attractors (generalizing some earlier results to the discrete case).
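To make "properties of the interaction matrices in terms of attractors" concrete, here is an illustrative sketch under assumed semantics (a synchronous Boolean network with a threshold update, which is one common discrete model, not necessarily the paper's exact one): enumerate all states, iterate the map, and collect the cycles it settles into.

```python
from itertools import product

# Discrete genetic network: interaction matrix W, Boolean state x, and a
# synchronous threshold update. Attractors are the cycles of this map.
def step(W, x, theta=0):
    return tuple(int(sum(w * xi for w, xi in zip(row, x)) > theta)
                 for row in W)

def attractors(W):
    found = set()
    for x in product((0, 1), repeat=len(W)):
        seen = []
        while x not in seen:                  # iterate until a repeat
            seen.append(x)
            x = step(W, x)
        cycle = tuple(seen[seen.index(x):])   # the attractor reached
        # canonical rotation so each cycle is counted once
        rots = [cycle[i:] + cycle[:i] for i in range(len(cycle))]
        found.add(min(rots))
    return found

# A 2-gene positive circuit (mutual activation): under synchronous update
# it admits two fixed points, (0,0) and (1,1), plus a 2-cycle.
W = [[0, 1],
     [1, 0]]
print(attractors(W))
```

The multistationarity of the positive circuit (two fixed points) is the kind of qualitative property such interaction-matrix analysis aims at.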
Selecting optimum DNA oligos for microarrays
Fugen Li, G. Stormo
Pub Date: 2000-11-08 | DOI: 10.1109/BIBE.2000.889608
High-density DNA oligonucleotide microarrays are widely used in biomedical research. In this paper, we describe algorithms to optimize the selection of specific probes for each gene in an entire genome. Having optimized probes for each gene is valuable for two reasons: (1) by minimizing background hybridization, they provide more accurate determinations of true expression levels, and (2) having optimum probes eliminates the need for multiple probes per gene, as is usually done now, thereby decreasing the cost of each microarray and increasing their usage. The criteria for truly optimum probes are easily stated, but they are not computable at present. We have developed a heuristic approach that is efficiently computable and should provide a good approximation to the true optimum set. We have run the program on the complete genomes for several model organisms and deposited the results in a database that is available online.
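A toy heuristic in the spirit described — assumed scoring, not the authors' algorithm — is to pick, for each gene, the k-mer that occurs least often in the rest of the genome, approximating "most specific probe, least background hybridization" (the gene names and `k` below are invented for the example):

```python
# Toy probe picker: for the named gene, choose the k-mer with the fewest
# exact occurrences in the concatenated remainder of the genome.
def kmers(s, k):
    return [s[i:i + k] for i in range(len(s) - k + 1)]

def best_probe(genes, name, k=5):
    background = "".join(s for g, s in genes.items() if g != name)
    return min(kmers(genes[name], k),
               key=lambda p: background.count(p))

genes = {"geneA": "ACGTACGTTTGCA", "geneB": "ACGTACGTGGGGG"}
probe = best_probe(genes, "geneA", k=5)
print(probe, genes["geneB"].count(probe))
```

A real scoring function would also account for inexact hybridization (near-matches, melting temperature, secondary structure), which is what makes the truly optimal criteria hard to compute.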
Making genome expression data meaningful: prediction and discovery of classes of cancer through a connectionist learning approach
F. Azuaje
Pub Date: 2000-11-08 | DOI: 10.1109/BIBE.2000.889609
Despite more than 30 years of experimental research, there have been no generic models to classify tumours and identify new types of cancer. Meanwhile, advances in the molecular classification of tumours may play a central role in cancer treatment. In this paper, a new approach to genome expression pattern interpretation is described and applied to the recognition of B-cell malignancies as a test set. Using DNA microarray data generated by A.A. Alizadeh et al. (2000), a neural network model known as a simplified fuzzy ARTMAP is able to identify normal and diffuse large B-cell lymphoma (DLBCL) patients. Furthermore, it discovers the distinction between patients with molecularly distinct forms of DLBCL without previous knowledge of those subtypes.
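The category-forming behavior that lets such a model "discover" subtypes without prior labels can be sketched with the standard fuzzy-ART equations (complement coding, choice function, vigilance test, fast learning). This is a minimal generic sketch, not Azuaje's exact simplified fuzzy ARTMAP, and the inputs below are toy vectors, not expression data:

```python
# Minimal fuzzy-ART-style sketch: each input either resonates with an
# existing category prototype (vigilance test passed) or founds a new one.
def fuzzy_and(a, b):
    return [min(x, y) for x, y in zip(a, b)]

class FuzzyART:
    def __init__(self, rho=0.7, alpha=0.001):
        self.rho, self.alpha = rho, alpha   # vigilance, choice parameter
        self.w = []                         # category prototypes

    def _code(self, x):
        return list(x) + [1 - v for v in x]  # complement coding

    def train(self, x):
        i = self._code(x)
        # rank existing categories by the choice function T_j
        order = sorted(range(len(self.w)),
                       key=lambda j: -sum(fuzzy_and(i, self.w[j]))
                                     / (self.alpha + sum(self.w[j])))
        for j in order:
            match = sum(fuzzy_and(i, self.w[j])) / sum(i)
            if match >= self.rho:                    # vigilance test
                self.w[j] = fuzzy_and(i, self.w[j])  # fast learning
                return j
        self.w.append(i)                             # new category
        return len(self.w) - 1

art = FuzzyART(rho=0.8)
print(art.train([0.9, 0.1]), art.train([0.85, 0.15]), art.train([0.1, 0.9]))
```

The third input fails the vigilance test against the first category's prototype, so a second category is created — the unsupervised split that, on expression profiles, can correspond to a previously unknown subtype.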
Automated left ventricle boundary delineation
L. Sui, R. Haralick
Pub Date: 2000-11-08 | DOI: 10.1109/BIBE.2000.889626
Automated left ventricle (LV) boundary delineation from left ventriculograms has been studied for decades. Unfortunately, no method has ever been reported with accuracy results for volume and ejection fraction. A new knowledge-based multi-stage method to automatically delineate the LV boundary at end diastole (ED) and end systole (ES) is discussed in this paper. It has a mean absolute boundary error of about 2 mm and an associated ejection fraction error of about 6%. The method makes extensive use of knowledge about LV shape and movement. The processing includes a multi-image pixel region classification, a shape regression and a rejection classification. The method was trained and tested on a database of 375 studies whose ED and ES boundaries have been manually traced as the ground truth. The cross-validated results presented in this paper show that the accuracy is close to, and slightly above, inter-observer variability.
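The quantity being validated against the 6% error figure follows directly from the two delineated boundaries: the end-diastolic volume (EDV) and end-systolic volume (ESV) give the ejection fraction as EF = (EDV − ESV) / EDV. A small helper for the formula (the example volumes are arbitrary):

```python
# Ejection fraction from end-diastolic and end-systolic LV volumes,
# EF = (EDV - ESV) / EDV; volumes come from the delineated boundaries.
def ejection_fraction(edv_ml, esv_ml):
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return (edv_ml - esv_ml) / edv_ml

print(round(100 * ejection_fraction(120.0, 50.0), 1))  # EF as a percentage: 58.3
```

Because EF is a ratio of volumes derived from the two traced boundaries, a small per-boundary error (about 2 mm here) propagates into the reported ~6% EF error.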
A new automatic circular decomposition algorithm applied to blood cells image
Leticia V. Guimaraes, A. Suzim, J. Maeda
Pub Date: 2000-11-08 | DOI: 10.1109/BIBE.2000.889618
Presents a new automatic circular decomposition algorithm that separates connected circular particles in order to locate their center coordinates and estimate their radii. This new algorithm is based on the following supposition: "if you are looking for circles, you must assume that all objects in an image are circles, until you can prove they are not". The authors compare the methods and results of this heuristic algorithm with those of the polygonal-approximation-based algorithm proposed by Kubo (1988). Both can decompose connected blood cells, with some differences: the proposed algorithm is implemented in two steps, while the polygonal-approximation-based method is divided into four steps and has input parameters that are highly sensitive to noise.
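The stated supposition can be sketched crudely (this is illustrative only, not the paper's algorithm): fit a circle to a set of boundary points by taking the centroid as the center and the mean point-to-center distance as the radius, and reject the circle hypothesis only when the radial deviation is large — i.e. assume "circle" until proven otherwise. The tolerance `tol` is an invented parameter.

```python
import math

# Assume-circle-until-proven-otherwise: centroid as center, mean distance
# as radius; return None only when the relative radial deviation exceeds tol.
def fit_circle(points, tol=0.1):
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    dists = [math.hypot(x - cx, y - cy) for x, y in points]
    r = sum(dists) / len(dists)
    deviation = max(abs(d - r) for d in dists) / r
    return (cx, cy, r) if deviation <= tol else None  # None: not a circle

circle = [(math.cos(t) * 5 + 2, math.sin(t) * 5 - 1)
          for t in [k * math.pi / 8 for k in range(16)]]
print(fit_circle(circle))   # center near (2, -1), radius near 5
square = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 0), (4, 2), (2, 4), (0, 2)]
print(fit_circle(square))   # None
```

For connected particles, such a fit would be applied per candidate blob after a split, keeping the split only when both halves pass the circle test.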
Linguistic analysis of the nucleoprotein gene of influenza A virus
A. Skourikhine, T. Burr
Pub Date: 2000-05-01 | DOI: 10.1109/BIBE.2000.889607
Applies a linguistic analysis method (N-grams) to classify nucleotide and amino acid sequences of the nucleoprotein (NP) gene of the influenza A virus isolated from three hosts and several geographic regions. We considered letter frequencies (1-grams), letter-pair frequencies (2-grams) and triplet frequencies (3-grams). Nearest-neighbor classifiers and decision-tree classifiers based on 1-, 2- and 3-grams were constructed for NP nucleotide and amino acid sequences, and their classification efficiencies were compared with the groupings obtained using phylogenetic analysis. Our results show that disregarding positional information for NP can provide almost the same high level of classification accuracy as alternative, more complex classification techniques that use positional information.
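The position-free representation plus a nearest-neighbor rule can be shown in a few lines. This is a minimal sketch of the approach on toy sequences (not the influenza NP data), using normalized n-gram frequency profiles and an L1 distance:

```python
from collections import Counter

# Position-free n-gram profile: relative frequencies of length-n substrings.
def ngram_profile(seq, n):
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

# L1 distance between two sparse frequency profiles.
def distance(p, q):
    return sum(abs(p.get(g, 0) - q.get(g, 0)) for g in set(p) | set(q))

# Classify a query sequence by its nearest labelled profile.
def nearest_neighbor(query, labelled, n=2):
    qp = ngram_profile(query, n)
    return min(labelled,
               key=lambda item: distance(qp, ngram_profile(item[0], n)))[1]

train = [("ATATATATGC", "hostA"), ("GGCGGCGGCA", "hostB")]
print(nearest_neighbor("ATATATGG", train, n=2))  # hostA
```

Because the profile discards where each n-gram occurs, this is exactly the "disregarding positional information" baseline the paper compares against alignment-based (positional) techniques.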