By employing the Optimal Bayesian Robust (OBR) policy, the Bayesian Markov Decision Process (BMDP) can be used to solve the Gene Regulatory Network (GRN) control problem. However, due to the "curse of dimensionality", data storage limitations hinder the practical applicability of the BMDP. To overcome this impediment, we propose a novel Duplex Sparse Storage (DSS) scheme and develop a BMDP solver with the DSS scheme on a heterogeneous GPU-based platform. The simulation results demonstrate that our approach achieves a 5x reduction in memory utilization with a 2.4% "decision difference" and an average speedup of 4.1x compared to the full-matrix storage scheme. Additionally, we present the tradeoff between runtime and result accuracy for our DSS techniques versus the full-matrix approach. We also compare our results with the well-known Compressed Sparse Row (CSR) approach for reducing memory utilization, and discuss the benefits of DSS over CSR.
{"title":"Fast and Highly Scalable Bayesian MDP on a GPU Platform","authors":"He Zhou, S. Khatri, Jiang Hu, Frank Liu, C. Sze","doi":"10.1145/3107411.3107440","DOIUrl":"https://doi.org/10.1145/3107411.3107440","url":null,"abstract":"By employing the Optimal Bayesian Robust (OBR) policy, Bayesian Markov Decision Process (BMDP) can be used to solve the Gene Regulatory Network (GRN) control problem. However, due to the \"curse of dimensionality\", the data storage limitation hinders the practical applicability of the BMDP. To overcome this impediment, we propose a novel Duplex Sparse Storage (DSS) scheme in this paper, and develop a BMDP solver with the DSS scheme on a heterogeneous GPU-based platform. The simulation results demonstrate that our approach achieves a 5x reduction in memory utilization with a 2.4% \"decision difference\" and an average speedup of 4.1x compared to the full matrix based storage scheme. Additionally, we present the tradeoff between the runtime and result accuracy for our DSS techniques versus the full matrix approach. We also compare our results with the well known Compressed Sparse Row (CSR) approach for reducing memory utilization, and discuss the benefits of DSS over CSR.","PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology,and Health Informatics","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116878503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
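The abstract above names Compressed Sparse Row (CSR) as the baseline it compares against. As a minimal sketch of what CSR storage looks like for a sparse transition matrix, the following pure-Python example may help. It illustrates only the standard CSR layout, not the DSS scheme, whose layout is not described in this listing; the 4-state matrix `P` and the function names are made up for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def dense_to_csr(matrix: Matrix) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense matrix to CSR arrays (values, col_idx, row_ptr)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in matrix:
        for j, x in enumerate(row):
            if x != 0.0:
                values.append(x)   # keep only the nonzero entries
                col_idx.append(j)  # remember which column each came from
        row_ptr.append(len(values))  # marks where the next row starts
    return values, col_idx, row_ptr


def csr_matvec(values: List[float], col_idx: List[int],
               row_ptr: List[int], vec: List[float]) -> List[float]:
    """Multiply a CSR matrix by a dense vector."""
    out: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * vec[col_idx[k]]
        out.append(acc)
    return out


# Toy 4-state transition matrix (hypothetical numbers, rows sum to 1).
P: Matrix = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.25, 0.0, 0.0, 0.75],
    [0.0, 0.0, 0.0, 1.0],
]

values, col_idx, row_ptr = dense_to_csr(P)
result = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0, 4.0])
```

Here the dense form stores 16 entries, while CSR stores 6 values, 6 column indices, and 5 row pointers; the saving grows with matrix size when most transitions have zero probability.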
Searching for a cure for cancer is one of the most vital pursuits in modern medicine. In that respect, microRNA research plays a key role. Keeping track of the shifts and changes in established knowledge in the microRNA domain is very important. In this paper, we introduce an Ontology-Based Information Extraction method to detect occurrences of inconsistencies in microRNA research paper abstracts. We propose a method that first uses the Ontology for MIcroRNA Targets (OMIT) to extract triples from the abstracts. Then we introduce a new algorithm to calculate the oppositeness of these candidate relationships. Finally, we present the discovered inconsistencies in an easy-to-read manner for use by medical professionals. To the best of our knowledge, this study is the first ontology-based information extraction model introduced to find shifts in the established knowledge in the medical domain using research paper abstracts. We downloaded 36877 abstracts from the PubMed database. From those, we found 102 inconsistencies relevant to the microRNA domain.
{"title":"Discovering Inconsistencies in PubMed Abstracts through Ontology-Based Information Extraction","authors":"Nisansa de Silva, D. Dou, Jingshan Huang","doi":"10.1145/3107411.3107452","DOIUrl":"https://doi.org/10.1145/3107411.3107452","url":null,"abstract":"Searching for a cure for cancer is one of the most vital pursuits in modern medicine. In that aspect microRNA research plays a key role. Keeping track of the shifts and changes in established knowledge in the microRNA domain is very important. In this paper, we introduce an Ontology-Based Information Extraction method to detect occurrences of inconsistencies in microRNA research paper abstracts. We propose a method to first use the Ontology for MIcroRNA Targets (OMIT) to extract triples from the abstracts. Then we introduce a new algorithm to calculate the oppositeness of these candidate relationships. Finally we present the discovered inconsistencies in an easy to read manner to be used by medical professionals. To our best knowledge, this study is the first ontology-based information extraction model introduced to find shifts in the established knowledge in the medical domain using research paper abstracts. We downloaded 36877 abstracts from the PubMed database. From those, we found 102 inconsistencies relevant to the microRNA domain.","PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology,and Health Informatics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126134180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
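The abstract above describes extracting triples and then scoring the "oppositeness" of candidate relationships. The paper's actual algorithm is not given in this listing, so the following is only a rough sketch of the general idea, assuming a hand-written table of opposite relation pairs. The table `OPPOSITES`, the function names, and the placeholder entities (`miR-A`, `GENE1`, and so on) are all hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical table of opposite relations; a stand-in for the paper's
# oppositeness measure, which is not specified in this listing.
OPPOSITES = {
    ("upregulates", "downregulates"),
    ("activates", "inhibits"),
    ("promotes", "suppresses"),
}


def are_opposite(rel_a: str, rel_b: str) -> bool:
    """True if the two relations are listed as opposites, in either order."""
    return (rel_a, rel_b) in OPPOSITES or (rel_b, rel_a) in OPPOSITES


def find_inconsistencies(triples: List[Triple]) -> List[Tuple[Triple, Triple]]:
    """Return pairs of triples with the same subject and object
    but opposite relations."""
    found = []
    for a, b in combinations(triples, 2):
        if a[0] == b[0] and a[2] == b[2] and are_opposite(a[1], b[1]):
            found.append((a, b))
    return found


# Placeholder triples; these are not findings from any real abstract.
triples: List[Triple] = [
    ("miR-A", "downregulates", "GENE1"),
    ("miR-A", "upregulates", "GENE1"),
    ("miR-B", "inhibits", "GENE2"),
]

conflicts = find_inconsistencies(triples)
```

A binary lookup is the simplest possible choice; a graded oppositeness score, as the abstract's wording suggests, would replace `are_opposite` with a function returning a number.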
Individuals of a species have similar characteristics, but they are rarely identical because of genomic variations. One important class of genomic variation is structural variation (SV), including copy number variation (CNV), which results from amplifications or deletions of genomic regions. It has been shown that SV plays an important role in phenotypic diversity and evolution. A genome also contains other aberrations such as Single Nucleotide Polymorphisms (SNPs) and small insertions and deletions (Indels). Although genetic variations contribute to our uniqueness, they can comprise critical developmental genes, leading to gene dosage imbalances, the creation of new genes, and the reshaping of gene structures that may ultimately result in disease. Understanding the mechanisms of structural variation formation helps us better understand human phenotypic diversity, evolution, and disease susceptibility. Computational tools have been developed for genomic variation detection using next-generation sequencing (NGS) data. However, with no prior knowledge about variants in real samples, the tools used for detection and analysis have been hindered by the lack of a gold standard benchmark. Some multi-variant simulators, such as SInC and SCNVSim, have been developed for whole genome sequencing (WGS) data. However, they are not easy to use, and technical skills are required to run them. Moreover, those simulators only apply genomic variations to a reference file, and other software tools, such as the ART simulator, must be used to generate the sequenced short reads. We have developed a user-friendly automated pipeline, VarSimLab, which offers an integrated web-based suite to simulate structural variations and to generate WGS and WES short reads. It utilizes some of the existing tools and packages them into a standard Docker image, an open-source technology used to package applications and their dependencies into a standardized software container.
VarSimLab automates the process of simulating tumor genotypes, such as SNPs, Indels, CNVs, transitions/transversions, ploidy, and tumor sub-clones, and of generating short reads. Thanks to Docker, the pipeline is platform-independent and easy for non-technical scientists to use from a web browser. VarSimLab is designed to grow into a full suite of integrated tools to analyze genomic aberrations.
{"title":"Varsimlab: A Docker-based Pipeline to Automatically Synthesize Short Reads with Genomic Aberrations","authors":"Abdelrahman Hosny, Fatima Zare, S. Nabavi","doi":"10.1145/3107411.3108188","DOIUrl":"https://doi.org/10.1145/3107411.3108188","url":null,"abstract":"Individuals of a species have similar characteristics but they are rarely identical because of the genomic variations. One of the important genomic variations is structural variation (SV), including copy number variation (CNV), which is a result of amplifications or deletions of genomic regions. It has been shown that SV plays an important role in phenotypic diversity and evolution. A Genome encompasses other aberrations such as Single Nucleotide Polymorphism (SNP) and small insertions and deletions (Indels). Although genetic variations contribute to our uniqueness, they can comprise critical developmental genes leading to gene dosage imbalances, new genes creation, and gene structures reshaping that ultimately may result in disease. Understanding the mechanisms of structural variation formation helps us better understand human phenotypic diversity, evolution and diseases susceptibility. Computational tools have been developed for genomic variation detection using next-generation sequencing (NGS) data. However, with no prior knowledge about variants in real samples, the tools that are used for detection and analysis have been hindered by the lack of a gold standard benchmark. Some multi-variant simulators have been developed for whole genome sequencing (WGS) data such as SInC and SCNVSim. However, they are not easy to use and technical skills are required to run them. Moreover, those simulators only apply genomic variations to a reference file; and other software tools, such as ART simulator, need to be used to generate the sequenced short reads. 
We have developed a user-friendly automated pipeline, VarSimLab, which offers an integrated web-based suite to simulate structural variations and also to generate WGS and WES short reads. It utilizes some of the existing tools and packages them into a standard Docker image; an open source technology used to package applications and their dependencies into a standardized software container. VarSimLab automates the process of simulating tumor genotypes such as SNPs, Indels, CNVs, transition/transversion, ploidy and tumor sub-clone and generating short reads. Thanks to the Docker technology, the pipeline is platform-independent and super easy for non-technical scientists to use from a web browser. VarSimLab is designed to grow as a full suite of integrated tools to analyze genomic aberrations.","PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology,and Health Informatics","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116098470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alex V. Kotlar, Cristina E. Trevino, M. Zwick, D. Cutler, T. Wingo
Describing, prioritizing, and selecting alleles from large sequencing experiments remains technically challenging. SeqAnt (https://seqant.emory.edu) is the first online, cloud-based application that makes these tasks accessible for non-programmers, even for terabyte-sized experiments containing thousands of whole-genome samples. It rapidly describes the alleles found within submitted VCF files, and then indexes the results in a natural-language search engine, which enables users to locate alleles of interest in milliseconds using normal English phrases. Our results show that SeqAnt decreases processing time by orders of magnitude and that its search engine can be used to precisely identify alleles by phenotype, genomic structure, and population genetics characteristics.
{"title":"SeqAnt: Cloud-Based Whole-Genome Annotation and Search","authors":"Alex V. Kotlar, Cristina E. Trevino, M. Zwick, D. Cutler, T. Wingo","doi":"10.1145/3107411.3108231","DOIUrl":"https://doi.org/10.1145/3107411.3108231","url":null,"abstract":"Describing, prioritizing, and selecting alleles from large sequencing experiments remains technically challenging. SeqAnt (https://seqant.emory.edu) is the first online, cloud-based application that makes these tasks accessible for non-programmers, even for terabyte-sized experiments containing thousands of whole-genome samples. It rapidly describes the alleles found within submitted VCF files, and then indexes the results in a natural-language search engine, which enables users to locate alleles of interest in milliseconds using normal English phrases. Our results show that SeqAnt decreases processing time by orders of magnitude and that its search engine can be used to precisely identify alleles by phenotype, genomic structure, and population genetics characteristics.","PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology,and Health Informatics","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116110599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Session 14: Integrative Methods for Genomic Data","authors":"M. Masseroli","doi":"10.1145/3254557","DOIUrl":"https://doi.org/10.1145/3254557","url":null,"abstract":"","PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology,and Health Informatics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122647600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bruna Jacobson, Jon Christian L. David, Mitchell C. Malone, Kasra Manavi, S. Atlas, Lydia Tapia
The motor protein kinesin is a remarkable natural nanobot that moves cellular cargo by taking 8 nm steps along a microtubule molecular highway. Understanding kinesin's mechanism of operation continues to present considerable modeling challenges, primarily due to the millisecond timescale of its motion, which prohibits fully atomistic simulations. Here we describe the first phase of a physics-based approach that combines energetic information from all-atom modeling with a robotic framework to enable kinetic access to longer simulation timescales. Starting from experimental PDB structures, we have designed a computational model of the combined kinesin-microtubule system represented by the isosurface of an all-atom model. We use motion planning techniques originally developed for robotics to generate candidate conformations of the kinesin head with respect to the microtubule, considering all six degrees of freedom of the molecular walker's catalytic domain. This efficient sampling technique, combined with all-atom energy calculations of the kinesin-microtubule system, allows us to explore the configuration space in the vicinity of the kinesin binding site on the microtubule. We report initial results characterizing the energy landscape of the kinesin-microtubule system, setting the stage for an efficient, graph-based exploration of kinesin preferential binding and dynamics on the microtubule, including interactions with obstacles.
{"title":"Geometric Sampling Framework for Exploring Molecular Walker Energetics and Dynamics","authors":"Bruna Jacobson, Jon Christian L. David, Mitchell C. Malone, Kasra Manavi, S. Atlas, Lydia Tapia","doi":"10.1145/3107411.3107503","DOIUrl":"https://doi.org/10.1145/3107411.3107503","url":null,"abstract":"The motor protein kinesin is a remarkable natural nanobot that moves cellular cargo by taking 8 nm steps along a microtubule molecular highway. Understanding kinesin's mechanism of operation continues to present considerable modeling challenges, primarily due to the millisecond timescale of its motion, which prohibits fully atomistic simulations. Here we describe the first phase of a physics-based approach that combines energetic information from all-atom modeling with a robotic framework to enable kinetic access to longer simulation timescales. Starting from experimental PDB structures, we have designed a computational model of the combined kinesin-microtubule system represented by the isosurface of an all-atom model. We use motion planning techniques originally developed for robotics to generate candidate conformations of the kinesin head with respect to the microtubule, considering all six degrees of freedom of the molecular walker's catalytic domain. This efficient sampling technique, combined with all-atom energy calculations of the kinesin-microtubule system, allows us to explore the configuration space in the vicinity of the kinesin binding site on the microtubule. 
We report initial results characterizing the energy landscape of the kinesin-microtubule system, setting the stage for an efficient, graph-based exploration of kinesin preferential binding and dynamics on the microtubule, including interactions with obstacles.","PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology,and Health Informatics","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114184719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Longitudinal studies are widely used in medicine, biology, population health and other areas related to bioinformatics. A broad spectrum of methods for joint analysis of longitudinal and time-to-event (survival) data has been proposed in the last few decades. The Stochastic Process Model (SPM) represents one possible framework for modelling the joint evolution of repeatedly measured variables and the time-to-event outcomes typically observed in longitudinal studies. SPM is applicable to analyses of longitudinal data in many research areas, such as demography and medicine, and allows researchers to utilize the full potential of longitudinal data by evaluating dynamic mechanisms of changing physiological variables with time (age), allowing the study of differences, for example, in genotype-specific hazards. SPM allows incorporation of available knowledge about regularities of aging-related changes in the human body for addressing fundamental problems of changes in resilience and physiological norms. It permits evaluating mechanisms that indirectly affect longitudinal trajectories of physiological variables using data on mortality or onset of diseases. In this tutorial we explain the basic concepts of SPM, its current state and possible applications, and the corresponding software tools, and show practical examples of joint analysis of longitudinal and time-to-event data with this methodology.
{"title":"Stochastic Process Model and Its Applications to Analysis of Longitudinal Data","authors":"I. Zhbannikov, K. Arbeev","doi":"10.1145/3107411.3107496","DOIUrl":"https://doi.org/10.1145/3107411.3107496","url":null,"abstract":"Longitudinal studies are widely used in medicine, biology, population health and other areas related to bioinformatics. A broad spectrum of methods for joint analysis of longitudinal and time-to-event (survival) data has been proposed the in last few decades. The Stochastic process model (SPM) represents one possible framework for modelling joint evolution of repeatedly measured variables and time-to-event outcome typically observed in longitudinal studies. SPM is applicable for analyses of longitudinal data in many research areas such as demography and medicine and allows researchers to utilize the full potential of longitudinal data by evaluating dynamic mechanisms of changing physiological variables with time (age), allowing the study of differences, for example, in genotype-specific hazards. SPM allows incorporation of available knowledge about regularities of aging-related changes in the human body for addressing fundamental problems of changes in resilience and physiological norms. It permits evaluating mechanisms that indirectly affect longitudinal trajectories of physiological variables using data on mortality or onset of diseases. 
In this tutorial we explain the basic concepts of SPM, its current state and possible applications, corresponding software tools and show practical examples of analysis of joint analysis of longitudinal and time-to-event data with this methodology.","PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology,and Health Informatics","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121925914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Network similarity ranking attempts to rank a given set of networks based on their "similarity" to a reference network. State-of-the-art approaches tend to be general in the sense that they can be applied to networks in a variety of domains. Consequently, they are not designed to exploit domain-specific knowledge to find similar networks, although such knowledge may yield interesting insights that are unique to specific problems, paving the way to more effective solutions. We propose Tintin, which uses a novel target feature-based network similarity distance for ranking similar signaling networks. In contrast to state-of-the-art network similarity techniques, Tintin considers both topological and dynamic features in order to compute network similarity. Our empirical study on signaling networks from BioModels with real-world curated outcomes reveals that the Tintin ranking is different from that of state-of-the-art approaches.
{"title":"TINTIN: Exploiting Target Features for Signaling Network Similarity Computation and Ranking","authors":"Huey-Eng Chua, S. Bhowmick, L. Tucker-Kellogg","doi":"10.1145/3107411.3107470","DOIUrl":"https://doi.org/10.1145/3107411.3107470","url":null,"abstract":"Network similarity ranking attempts to rank a given set of networks based on its \"similarity\" to a reference network. State-of-the-art approaches tend to be general in the sense that they can be applied to networks in a variety of domains. Consequently, they are not designed to exploit domain-specific knowledge to find similar networks although such knowledge may yield interesting insights that are unique to specific problems, paving the way to solutions that are more effective. We propose Tintin which uses a novel target feature-based network similarity distance for ranking similar signaling networks. In contrast to state-of-the-art network similarity techniques, Tintin considers both topological and dynamic features in order to compute network similarity. Our empirical study on signaling networks from BioModels with real-world curated outcomes reveals that Tintin ranking is different from state-of-the-art approaches.","PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology,and Health Informatics","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121933356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bertrand Miannay, S. Minvielle, O. Roux, F. Magrangeas, Carito Guziolowski
The integration of gene expression profiles (GEPs) and large-scale biological networks derived from Pathways Databases is a widely explored subject. Existing methods are based on network distance measures among significantly measured species. Only a small number of them include the directionality and underlying logic existing in biological networks. In this study we approach the GEP-network integration problem by considering the network logic, without requiring a prior selection of species according to their gene expression level. We start by modeling the biological network and representing its underlying logic using Logic Programming. This model points to reachable discrete network states that maximize a notion of harmony between the possible active or inactive states of the molecular species and the directionality of the pathway reactions according to their activator or inhibitor control role. Only then do we confront these network states with the GEP. From this analysis, independent graph components are derived, each of them related to a fixed and optimal assignment of active or inactive states. These components allow us to decompose a large-scale network into subgraphs, and their molecular species state assignments have different degrees of similarity when compared to the same GEP. We applied our method to study the set of possible states derived from a subgraph of the NCI-PID Pathway Interaction Database. This graph linked Multiple Myeloma (MM) genes to known receptors for this blood cancer.
{"title":"Constraints On Signaling Networks Logic Reveal Functional Subgraphs On Multiple Myeloma OMIC Data","authors":"Bertrand Miannay, S. Minvielle, O. Roux, F. Magrangeas, Carito Guziolowski","doi":"10.1145/3107411.3110411","DOIUrl":"https://doi.org/10.1145/3107411.3110411","url":null,"abstract":"The integration of gene expression profiles (GEPs) and large-scale biological networks derived from Pathways Databases is a subject which is being widely explored. Existing methods are based on network distance measures among significantly measured species. Only a small number of them include the directionality and underlying logic existing in biological networks. In this study we approach the GEP-networks integration problem by considering the network logic but our approach does not require a prior species selection according to their gene expression level. We start by modeling the biological network representing its underlying logic using Logic Programming. This model points to reachable network discrete states that maximize a notion of harmony between the molecular species active or inactive possible states and the directionality of the pathways reactions according to their activator or inhibitor control role. Only then, we confront these network states with the GEP. From this analysis independent graph components are derived, each of them related to a fixed and optimal assignment of active or inactive states. These components allow us to decompose a large-scale network into subgraphs and their molecular species state assignments have different degrees of similarity when compared to the same GEP. We applied our method to study the set of possible states derived from a subgraph from the NCI-PID Pathway Interaction Database. 
This graph linked Multiple Myeloma (MM) genes to known receptors for this blood cancer.","PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology,and Health Informatics","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123362901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Proteins are dynamic biomolecules. A structure-by-structure characterization of a protein's transition between two different functional structures is central to elucidating the role of dynamics in modulating protein function and designing therapeutic drugs. Characterizing transitions challenges both dry and wet laboratories. Some computational methods compute discrete representations of the energy landscape that organizes structures of a protein by their potential energies. The representations support queries for paths (series of structures) connecting start and goal structures of interest. Here we address the problem of modeling protein structural transitions under the umbrella of stochastic optimization and propose a novel evolutionary algorithm (EA). The EA evolves paths without reconstructing the energy landscape, addressing two competing optimization objectives, energetic cost and structural resolution. Rather than seek one path, the EA yields an ensemble of paths to represent a transition. Preliminary applications suggest the EA is effective while operating under a reasonable computational budget.
{"title":"Evolving Conformation Paths to Model Protein Structural Transitions","authors":"Emmanuel Sapin, K. D. Jong, Amarda Shehu","doi":"10.1145/3107411.3107498","DOIUrl":"https://doi.org/10.1145/3107411.3107498","url":null,"abstract":"Proteins are dynamic biomolecules. A structure-by-structure characterization of a protein's transition between two different functional structures is central to elucidating the role of dynamics in modulating protein function and designing therapeutic drugs. Characterizing transitions challenges both dry and wet laboratories. Some computational methods compute discrete representations of the energy landscape that organizes structures of a protein by their potential energies. The representations support queries for paths (series of structures) connecting start and goal structures of interest. Here we address the problem of modeling protein structural transitions under the umbrella of stochastic optimization and propose a novel evolutionary algorithm (EA). The EA evolves paths without reconstructing the energy landscape, addressing two competing optimization objectives, energetic cost and structural resolution. Rather than seek one path, the EA yields an ensemble of paths to represent a transition. 
Preliminary applications suggest the EA is effective while operating under a reasonable computational budget.","PeriodicalId":246388,"journal":{"name":"Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology,and Health Informatics","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114068595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
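The abstract above says paths are judged on two competing objectives, energetic cost and structural resolution, and that an ensemble of paths is returned rather than one. A minimal sketch of how two such objectives could be compared via Pareto dominance follows. It is not the authors' evolutionary algorithm: the toy 2D `energy` function, the definition of cost as the highest energy along a path, and the definition of resolution as the largest gap between consecutive structures are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # toy stand-in for a protein structure


def energy(p: Point) -> float:
    """Hypothetical landscape: a single hill of height 1 at the origin."""
    x, y = p
    return max(0.0, 1.0 - x * x - y * y)


def objectives(path: List[Point]) -> Tuple[float, float]:
    """Return (cost, resolution); lower is better for both (assumed)."""
    cost = max(energy(p) for p in path)  # highest energy visited
    resolution = max(math.dist(a, b) for a, b in zip(path, path[1:]))
    return cost, resolution


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(paths: Dict[str, List[Point]]) -> List[str]:
    """Names of paths that no other path dominates."""
    scored = {name: objectives(p) for name, p in paths.items()}
    return [n for n, s in scored.items()
            if not any(dominates(o, s) for o in scored.values())]


start, goal = (-1.0, 0.0), (1.0, 0.0)
candidate_paths = {
    "direct": [start, (0.0, 0.0), goal],  # small steps, crosses the hill
    "detour": [start, (0.0, 1.0), goal],  # avoids the hill, larger steps
    "jump": [start, goal],                # one large step
}

front = pareto_front(candidate_paths)
```

In this toy setup "direct" and "detour" trade cost against resolution and both survive, while "jump" is dominated by "detour"; keeping the whole non-dominated set is one way to obtain an ensemble of paths.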