In this paper we present the main steps for folding amino acid interaction networks. Such a network is a graph whose vertices are a protein's amino acids and whose edges are the interactions between them. We begin by summarizing related work on this type of graph to describe its topological properties. Then we propose a genetic algorithm that reconstructs the secondary structure motifs. We continue the folding process with an ant colony approach, guiding the ant system toward the tertiary structure by relying on the probability that two amino acids interact as a function of their physico-chemical properties.
How to Fold Amino Acid Interaction Networks by Computational Intelligence Methods. O. Gaci. 2010 IEEE International Conference on BioInformatics and BioEngineering. Pub Date: 2010-05-31. DOI: 10.1109/BIBE.2010.33 (https://doi.org/10.1109/BIBE.2010.33)
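The graph described in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how an interaction is defined, so the sketch assumes that two residues interact when their representative 3D coordinates lie within a distance cutoff. The function name `interaction_network`, the coordinate format, and the cutoff value are all hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations

def interaction_network(coords, cutoff):
    """Return the edge set of an amino acid interaction network.

    coords: list of (x, y, z) tuples, one per residue (assumed input format).
    cutoff: distance threshold below which two residues are taken to
            interact (an assumption; the abstract gives no definition).
    """
    edges = set()
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if math.dist(a, b) <= cutoff:
            edges.add((i, j))  # vertices are residue indices, with i < j
    return edges

# Toy example: three residues on a line, illustrative cutoff of 5.0 units.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
toy_edges = interaction_network(toy_coords, cutoff=5.0)
```

In this toy example only residues 0 and 1 are close enough to be linked. The genetic algorithm and ant colony stages mentioned in the abstract are not reproduced here.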
Sequencing by hybridization is a promising, cost-effective technology for high-throughput DNA sequencing via microarray chips. However, because of spectrum errors arising from experimental conditions, fast and accurate reconstruction of the original sequences remains a challenging problem. In the last decade, a variety of analyses and designs have been tried to overcome this problem, with different strategies offering different tradeoffs between speed and accuracy. Motivated by the idea that errors can be identified by analyzing the interrelation of spectrum elements, this paper presents a new constructive heuristic algorithm in which reconstruction is guided by a set of well-defined criteria and rules. Experiments on benchmark instance sets demonstrate that the proposed method can reconstruct long DNA sequences more accurately than current approaches in the literature.
eSBH: An Accurate Constructive Heuristic Algorithm for DNA Sequencing by Hybridization. Yang Chen, Jinglu Hu. 2010 IEEE International Conference on BioInformatics and BioEngineering. Pub Date: 2010-05-31. DOI: 10.1109/BIBE.2010.29 (https://doi.org/10.1109/BIBE.2010.29)
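To make the notion of a spectrum and its errors concrete, here is a minimal Python sketch. It assumes the usual formulation of sequencing by hybridization, in which the ideal spectrum is the set of all substrings of a fixed probe length. Errors are then either probes missing from the observed spectrum (negative errors) or spurious probes (positive errors). The function names are hypothetical, and the eSBH reconstruction heuristic itself is not shown.

```python
def ideal_spectrum(sequence, probe_length):
    """Return the set of all substrings of length probe_length."""
    last_start = len(sequence) - probe_length
    return {sequence[i:i + probe_length] for i in range(last_start + 1)}

def spectrum_errors(sequence, observed, probe_length):
    """Compare an observed spectrum with the ideal one.

    Returns (negative, positive): probes missing from the observation,
    and probes observed although they do not occur in the sequence.
    """
    ideal = ideal_spectrum(sequence, probe_length)
    negative = ideal - observed
    positive = observed - ideal
    return negative, positive

# Toy example: one probe is missing and one spurious probe is reported.
observed = {"ACG", "GTA", "TAC", "TTT"}
negative, positive = spectrum_errors("ACGTAC", observed, probe_length=3)
```

In a real experiment the original sequence is unknown, so the errors cannot be computed this way. The comparison only illustrates what a reconstruction algorithm has to cope with.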
Lei Shi, Marko Srdanovic, T. Beuming, L. Skrabanek, J. Javitch, H. Weinstein
To enable rigorous and reliable modeling of functional mechanisms of membrane proteins, computational modeling and simulation approaches must take advantage, to the fullest extent possible, of the wealth of data obtained experimentally. This type of information makes it possible not only to construct rigorous molecular models and quantitative representations of function, but also to test, verify and refine them continuously. Among the proteins in the cell membrane, the Neurotransmitter:Sodium Symporter (NSS) family exhibits highly complex mechanisms that involve multiple molecular states in the substrate and ion transport cycle. Because of their high biological and medical importance, experimental data for these systems are abundant, and efficient management of information from the literature and other relevant data sources promises to be very rewarding. We have developed an information management system (IMS) for data concerning the NSS proteins to enable retrieval, structured organization, and querying of information available in the literature. The IMS supports the interplay between computational approaches and experimental studies of the structure, function and physiological mechanisms of the NSS. It is especially useful for integrating data of different types and from various sources, and for integrating the mechanistic understanding emerging from separate functional studies of individual members of the family.
TRAC: A Platform for Structure-Function Studies of NSS-Proteins Integrates Information from Bioinformatics and Biomedical Literature. Lei Shi, Marko Srdanovic, T. Beuming, L. Skrabanek, J. Javitch, H. Weinstein. 2010 IEEE International Conference on BioInformatics and BioEngineering. Pub Date: 2010-05-31. DOI: 10.1109/BIBE.2010.51 (https://doi.org/10.1109/BIBE.2010.51)
Pub Date: 2010-05-31. DOI: 10.1142/S0218213012400234
Fotis Psomopoulos, P. Mitkas
The prediction of gene function from genome sequences is one of the main issues in bioinformatics. Most computational approaches infer gene function from the similarity between sequences. However, the availability of several fully sequenced genomes has enabled alternative approaches, such as phylogenetic profiles. Phylogenetic profiles are vectors that indicate the presence or absence of a gene in other genomes. The main concept behind phylogenetic profiles is that proteins participating in a common structural complex or metabolic pathway are likely to evolve in a correlated fashion. In this paper, a multi-level clustering algorithm for phylogenetic profiles is presented, which aims to detect inter- and intra-genome gene clusters.
Multi Level Clustering of Phylogenetic Profiles. Fotis Psomopoulos, P. Mitkas. 2010 IEEE International Conference on BioInformatics and BioEngineering. Pub Date: 2010-05-31. DOI: 10.1142/S0218213012400234 (https://doi.org/10.1142/S0218213012400234)
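A phylogenetic profile as described in this abstract can be illustrated with a minimal Python sketch. Each gene is represented by a binary vector with one entry per reference genome, and two profiles are compared with the Jaccard index. The gene names, the toy data and the choice of Jaccard similarity are assumptions made for illustration. The paper's multi-level clustering algorithm is not reproduced here.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard index of two equal-length binary presence/absence profiles."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0

# Hypothetical profiles over five reference genomes (1 = gene present).
profiles = {
    "geneA": (1, 0, 1, 1, 0),
    "geneB": (1, 0, 1, 1, 0),  # identical to geneA: co-occurs everywhere
    "geneC": (0, 1, 0, 0, 1),  # complementary to geneA
    "geneD": (1, 0, 1, 0, 0),  # partial overlap with geneA
}

similarity_ab = jaccard_similarity(profiles["geneA"], profiles["geneB"])
similarity_ac = jaccard_similarity(profiles["geneA"], profiles["geneC"])
similarity_ad = jaccard_similarity(profiles["geneA"], profiles["geneD"])
```

Genes with highly similar profiles, such as geneA and geneB here, are the candidates that a clustering step would group together.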
Francisco Claude, A. Fariña, Miguel A. Martínez-Prieto, G. Navarro
The study of compressed storage schemes for highly repetitive sequence collections has recently been boosted by the availability of cheaper sequencing technologies and the flood of data they promise to generate. Such a storage scheme may range from the simple goal of retrieving whole individual sequences to the more advanced one of providing fast searches in the collection. In this paper we study alternative ways to implement a particularly popular index, namely one capable of finding all the positions in the collection of substrings of fixed length ($q$-grams). We introduce two novel techniques and show that they constitute practical alternatives for handling this scenario. They excel particularly in two cases: when $q$ is small (up to 6), and when the collection is extremely repetitive (less than 0.01% mutations).
Compressed q-Gram Indexing for Highly Repetitive Biological Sequences. Francisco Claude, A. Fariña, Miguel A. Martínez-Prieto, G. Navarro. 2010 IEEE International Conference on BioInformatics and BioEngineering. Pub Date: 2010-05-31. DOI: 10.1109/BIBE.2010.22 (https://doi.org/10.1109/BIBE.2010.22)
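The query supported by the index in this abstract, finding all positions of a given $q$-gram in a collection, can be illustrated with a plain uncompressed baseline in Python. This is only a sketch of the problem being solved, not of the two compressed techniques the paper introduces. The function names and the `(sequence_id, offset)` position format are assumptions.

```python
from collections import defaultdict

def build_qgram_index(sequences, q):
    """Map every q-gram to the list of (sequence_id, offset) where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for pos in range(len(seq) - q + 1):
            index[seq[pos:pos + q]].append((seq_id, pos))
    return index

def locate(index, qgram):
    """Return all occurrences of qgram, or an empty list if it never occurs."""
    return index.get(qgram, [])

# Toy collection: the second sequence differs from the first by one mutation.
collection = ["ACGTACGT", "ACGAACGT"]
qgram_index = build_qgram_index(collection, q=3)
hits = locate(qgram_index, "ACG")
```

With an alphabet of four bases there are at most 4^q distinct keys, which hints at why small values of $q$ are a favourable case. In a highly repetitive collection the position lists of similar sequences are near copies of each other, and that redundancy is what a compressed index can exploit.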
Similarity between protein sequences is usually predictive of similarity in structures. However, in some rare cases protein domains with significant sequence similarity adopt different structures. Here, we carry out a survey of protein domain pairs with high sequence similarity (measured by HHsearch probability) and low structural similarity (measured by Dali Z-score), aiming to identify the reasons for this discordance. Besides methodological problems with either sequences or structures of domains, we find and describe novel examples of homologs with structural changes.
Structural Differences between Proteins with Similar Sequences. Q. Cong, Bong-Hyun Kim, L. Kinch, N. Grishin. 2010 IEEE International Conference on BioInformatics and BioEngineering. Pub Date: 2010-05-31. DOI: 10.1109/BIBE.2010.48 (https://doi.org/10.1109/BIBE.2010.48)
RNA viruses such as HIV and influenza impose a very significant disease burden throughout the world. Identifying the key protein residue determinants that affect a given viral phenotype is an important step in learning the genotype-phenotype mapping and in making clinical decisions. This identification is currently done through a laborious experimental process which is arguably inefficient, incomplete, and unreliable. We describe a supervised combinatorial filtering algorithm that systematically and efficiently infers the correct set of key residue positions from all available labeled data. We demonstrate its consistency, validate it on a variety of datasets, show its superior power over conventional identification methods, and describe its use under incremental relaxation of constraints. For cases where more data is needed to fully converge to an answer, we introduce an active learning algorithm to help choose the most informative experiment from a set of unlabeled candidate strains or mutagenesis experiments, so as to minimize the expected total laboratory time or financial cost. As an example, we demonstrate the savings afforded by this algorithm in identifying the molecular determinants of fusogenicity from a previously published dataset of Feline Immunodeficiency Virus Envelope proteins.
Identification of Viral Protein Genotypic Determinants Using Combinatorial Filtering and Active Learning. Chuang Wu, Andrew S. Walsh, R. Rosenfeld. 2010 IEEE International Conference on BioInformatics and BioEngineering. Pub Date: 2010-05-31. DOI: 10.1109/BIBE.2010.25 (https://doi.org/10.1109/BIBE.2010.25)
In low-dose-rate (LDR) brachytherapy, precise placement of needles and accurate delivery of radioactive seeds are very important and challenging. The development of robotic systems for performing brachytherapy is therefore gaining significant momentum. In this paper, we present a multichannel robotic system developed for image-guided brachytherapy (IGBT), especially radioactive seed implantation in the prostate gland. The system is capable of inserting a large number of needles concurrently and depositing seeds automatically. Numerous techniques perfected through a variety of experiments have been implemented in the system design and development. As a result, the system offers several potential advantages over the single-needle insertion technique, such as reduced target displacement, edema, and operating time. The feasibility and efficacy of this robot have been demonstrated with experimental results. Seeds can be delivered to within about 0.2 mm accuracy.
Multichannel Robot for Image-Guided Brachytherapy. T. Podder, Ivan Buzurovic, Yan Yu. 2010 IEEE International Conference on BioInformatics and BioEngineering. Pub Date: 2010-05-31. DOI: 10.1109/BIBE.2010.41 (https://doi.org/10.1109/BIBE.2010.41)
I. Bebu, F. Seillier-Moiseiwitsch, Jing Wu, T. Mathew
In microarray experiments, expression profiles are obtained for thousands of genes under several treatments. Traditionally, most of the statistical techniques employed have been univariate: they ignore inter-gene dependence and do not use any prior biological knowledge. Gene set analysis addresses both of these concerns by analyzing a group of correlated genes together, for example genes that share a common biological function, chromosomal location, or regulation. In this paper we propose a multivariate analysis of covariance (MANCOVA) model for gene set analysis with covariates. Principal component analysis (PCA) is used to address the dimensionality problem. Simulations show that the two testing procedures presented perform well.
Gene Set Analysis with Covariates. I. Bebu, F. Seillier-Moiseiwitsch, Jing Wu, T. Mathew. 2010 IEEE International Conference on BioInformatics and BioEngineering. Pub Date: 2010-05-31. DOI: 10.1109/BIBE.2010.63 (https://doi.org/10.1109/BIBE.2010.63)
Here we analyzed the correlation between genic GC content and the temperature range conditions of prokaryotes. To identify genes whose surrounding GC levels exhibit different patterns for organisms under different temperature conditions, while transcending phylogenetic boundaries, we first considered the complete list of organisms, then partial lists with one phylum excluded, and finally organisms of the same phylum but of different temperature conditions. To further validate the identified correlations, we examined to what extent the temperature condition of an organism can be predicted from the GC levels surrounding the selected genes. The overall prediction accuracy was 96.80% when based on the genes derived from the complete list of organisms, and 95.45% when based on the 17 phylum-independent genes. When phylum-specific genes were used, the prediction accuracy was 90.00% and 95.63% for organisms of the Euryarchaeota and Firmicutes phyla, respectively. These results demonstrate that the temperature range conditions of prokaryotic organisms can be predicted from the GC levels surrounding certain genes, and confirm the correlation between this pair of ecological and genomic traits.
Analysis on the Correlation Relationships between the Temperature Range Condition and the Genic GC Content Levels of Prokaryotes. Hao Zheng, Hongwei Wu. 2010 IEEE International Conference on BioInformatics and BioEngineering. Pub Date: 2010-05-31. DOI: 10.1109/BIBE.2010.56 (https://doi.org/10.1109/BIBE.2010.56)
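The quantity underlying this study, the GC level in the region surrounding a gene, can be sketched in a few lines of Python. The abstract does not specify how the surrounding region is delimited, so the flank size and the function names `gc_content` and `flanking_gc` are assumptions made for illustration. The prediction step reported in the abstract is not shown.

```python
def gc_content(sequence):
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty one)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def flanking_gc(genome, gene_start, gene_end, flank):
    """GC content of a gene plus `flank` bases on each side.

    gene_start and gene_end are 0-based, end-exclusive coordinates.
    The window is clipped at the ends of the genome.
    """
    window_start = max(0, gene_start - flank)
    window_end = min(len(genome), gene_end + flank)
    return gc_content(genome[window_start:window_end])

# Toy genome with a GC-rich "gene" at positions 4..8 and AT-rich flanks.
toy_genome = "AAAAGGCCAAAA"
gene_only = gc_content(toy_genome[4:8])
with_flanks = flanking_gc(toy_genome, 4, 8, flank=2)
```

Values of this kind, computed for selected genes across many organisms, would form the feature vectors from which a temperature class could be predicted.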