Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822480
Shaoliang Peng
Extremely powerful computers are needed to help scientists handle high-performance computational biology and drug design problems. BGI, the world's largest genomics institute, currently generates 6 TB of data each day. The European Bioinformatics Institute (EBI) in Hinxton currently stores 20 petabytes (1 petabyte is 10^15 bytes) of data and back-ups about genes, proteins and small molecules. TianHe supercomputers can speed up computational biology and drug design processing. In 2013, 2014, and 2015, Tianhe-2 (TH-2) topped the TOP500 list of the fastest supercomputers in the world. Many well-known bioinformatics and drug design software packages (BWA, DOCK, SOAP3-dp, SOAPdenovo, SOAPsnp, etc.) are developed and run on TH-2.
Title : High performance computational biology and drug design on TianHe Supercomputers
Venue : IEEE International Conference on Bioinformatics and Biomedicine
Pages : 7
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822479
Sun Kim
These days, genome-wide measurements of genetic and epigenetic events, a.k.a. omics data, are routinely produced; epigenetics refers to the control mechanisms of genetic events, as the prefix epi- means 'on' or 'upon'. As a result, a huge amount of omics data measured from different genetic and epigenetic events is available. For example, the amount of data at The Cancer Genome Atlas (TCGA) alone exceeded 2.5 petabytes as of October 2016. Unfortunately, the dimensionality of omics data is huge, typically tens to hundreds of thousands or even millions of features, while the number of samples is limited, typically a few to thousands. Mining genetic and epigenetic data measured under different phenotype conditions is therefore a very challenging problem: small data sets in extremely high dimensions. Furthermore, all genetic and epigenetic events are inter-related, so it is necessary to perform integrated analysis of omics data sets of different types, which is even more challenging. To address these technical challenges, the bioinformatics community has used virtually all known network-based analysis techniques, including recently developed deep neural networks. My group has been pursuing network-based integrated analysis of omics data at three different levels. First, we have been investigating computational methods for associating different genetic and epigenetic events, which can be viewed as methods for defining edges in the network. Second, we have been developing methods for mining subnetworks along the phenotype and time dimensions. Third, we have recently begun to investigate the use of deep learning techniques for the integrated analysis of omics data.
An important goal of our research is to combine network analysis and deep learning techniques to construct models or draw maps of cancer cells at multiple levels: genomic mutations; gene activation and suppression; epigenetic events including DNA methylation, histone modifications, and miRNA interference; biological pathways; and finally the whole-cell level, including tumor heterogeneity and clonal evolution.
Title : Networks and models for the integrated analysis of multi omics data
Pages : 6
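The abstract above describes association methods as a way of defining edges in an omics network. A minimal sketch of that idea follows, assuming Pearson correlation as the association measure and a fixed cut-off; both are illustrative stand-ins, not the methods developed by the author's group. The feature names, the toy values, and the `threshold` parameter are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no association signal
    return cov / sqrt(var_x * var_y)


def association_edges(features, threshold=0.9):
    """Return (name_a, name_b, r) for every feature pair with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges


# Hypothetical toy data: three omics features measured on five samples.
toy = {
    "expr_geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "meth_geneA": [5.0, 4.1, 2.9, 2.0, 1.0],  # falls as expression rises
    "mirna_X": [2.0, 1.0, 2.5, 1.5, 2.2],     # only weakly related to the others
}
edges = association_edges(toy)
print(edges)
```

With far more features than samples, as the abstract stresses, many spurious pairs can clear any fixed cut-off, so a real analysis would need multiple-testing control or a more robust association measure.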
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822478
Jun Huan
In recent years, research in Artificial Neural Networks (ANNs) has resurged, now under the Deep-Learning (DL) umbrella, and grown extremely popular due to major breakthroughs in methodological and computing capabilities. Deep-Learning methods belong to the family of representation-learning algorithms, which attempt to extract and organize discriminative information from the data. The recently reported success of DL techniques in crowd-sourced QSAR and predictive toxicology competitions has showcased these methods as powerful tools for drug-discovery and toxicology research. Nevertheless, reported applications of Deep Learning techniques for modeling complex bioactivity data for small molecules remain limited. In this talk I will present our recent work on optimizing the hyperparameters of feed-forward Deep Neural Nets (DNNs) and evaluating the performance of these methods against shallow methods. In our study, 48 DNN, 24 Random Forest (RF), 20 SVM and 6 Naive Bayes (NB) configurations, arbitrary but reasonably selected, were compared on 7 diverse bioactivity datasets assembled from the ChEMBL repository, with circular fingerprints as molecular descriptors. The non-parametric Wilcoxon paired signed-rank test was employed to compare the performance of DNN to RF, SVM and NB. Overall, it was found that DNNs with 2 hidden layers, 2,000 neurons per hidden layer, the ReLU activation function and Dropout regularization achieved strong classification performance across all tested datasets. Our results demonstrate that DNNs are powerful techniques for modeling complex bioactivity data.
Title : Deep-Learning: Investigating feed-forward deep Neural Networks for modeling high throughput chemical bioactivity data
Pages : 5
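The abstract above compares methods across 7 datasets with the Wilcoxon paired signed-rank test. The following is a minimal standard-library sketch of an exact two-sided version of that test, which is feasible for so few pairs because all 2^n sign patterns can be enumerated. The function name and the scores are hypothetical illustrations, not results from the study.

```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (w_plus, w_minus, p_value). Zero differences are dropped and
    tied absolute differences receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    observed = min(w_plus, w_minus)
    total = w_plus + w_minus
    # Under the null hypothesis every sign pattern is equally likely.
    hits = 0
    for signs in product((0, 1), repeat=n):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= observed:
            hits += 1
    return w_plus, w_minus, hits / 2 ** n


# Hypothetical per-dataset scores for two methods on 7 datasets.
dnn_scores = [0.91, 0.88, 0.95, 0.79, 0.84, 0.90, 0.87]
rf_scores = [0.89, 0.85, 0.91, 0.74, 0.78, 0.83, 0.79]
w_plus, w_minus, p_value = wilcoxon_signed_rank(dnn_scores, rf_scores)
print(w_plus, w_minus, p_value)
```

The enumeration grows as 2^n, so this form only suits a handful of pairs; larger comparisons would normally use a normal approximation or a statistics library instead.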
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822474
Bin Hu
Computational psychophysiology is a new direction that broadens the field of psychophysiology by allowing for the identification and integration of multimodal signals to test specific models of mental states and psychological processes. Additionally, such approaches allow for the extraction of multiple signals from large-scale multidimensional data, with a greater ability to differentiate signals embedded in background noise. Further, these approaches allow for a better understanding of the complex psychophysiological processes underlying brain disorders such as autism spectrum disorder, depression, and anxiety. Given the widely acknowledged limitations of psychiatric nosology and the limited treatment options available, new computational models may provide the basis for a multidimensional diagnostic system and potentially new treatment approaches.
Title : Computational psychophysiology based research methodology for mental health
Pages : 1
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822481
Habtom W. Ressom
Omic technologies offer the opportunity to characterize liver cancer at various molecular levels. In particular, characterizing the association of biomolecules such as metabolites and glycoproteins with liver cancer is a promising strategy to discover clinically relevant biomarkers. Metabolites are molecular fingerprints of what cells do at a particular point in time; they can reveal early signs of cancer, when the chances of cure are highest. The analysis of protein glycosylation is also relevant to liver pathology because of the major influence of this organ on the homeostasis of blood glycoproteins. This talk will focus on the application of multi-omic approaches to identify biomarkers for early detection of liver cancer in patients with liver cirrhosis. Specifically, I will present transcriptomic, proteomic, glycomic/glycoproteomic, and metabolomic (TPGM) studies that we conducted by analyzing samples from hepatocellular carcinoma (HCC) cases and cirrhotic controls using multiple omic platforms such as next-generation sequencing, liquid chromatography-mass spectrometry (LC-MS), and gas chromatography-mass spectrometry (GC-MS). In addition to candidate biomarkers discovered by evaluating the changes in the levels of transcripts, proteins, glycans, and metabolites between HCC cases and cirrhotic controls, I will present network-based methods we developed for integrative analysis of multi-omic data to identify aberrant pathway and network activities and biomarkers for early detection of liver cancer.
Title : Multi-omic approaches for liver cancer biomarker discovery
Pages : 8
Pub Date : 2016-12-01 DOI: 10.1109/BIBM.2016.7822485
H. Zenil
Despite extensive attempts to characterize systems and networks using metrics drawn from traditional statistics, Shannon entropy, and graph theory, understanding systems and networks well enough to reveal their causal mechanisms without making too many unjustified assumptions remains one of the greatest challenges in complexity science and in science in general, especially beyond traditional statistics and so-called machine learning. Knowing the causal mechanisms that govern a system allows not only the prediction of the system's behavior but also the manipulation and controlled reprogramming of the system. Here we introduce a formal interventional calculus based upon universal principles drawn from the theory of computability and algorithmic probability, thereby enabling better approaches to the question of causal discovery. By performing sequences of fully controlled perturbations, changes in the algorithmic content of a system can be classified according to whether they shift the system towards or away from algorithmic randomness, thereby inducing a ranking of the system's elements. This spectral dimension unmasks an algorithmic separation between components conditioned upon the perturbations and endows us with a suite of powerful parameter-free algorithms to reprogram the system's underlying program. The predictive and explanatory power of these novel conceptual tools is introduced, and numerical experiments are illustrated on various types of networks. We show how the algorithmic content of a network is connected to its possible dynamics and how the instant variation of the sensitivity, depth, and number of attractors in a network is accessible through an analysis of its algorithmic information landscape. The results demonstrate how to unveil causal mechanisms to infer essential properties, including the dynamics of evolving networks.
We introduce measures and methods for system reprogrammability even with no, or limited, access to the system's kinetic equations or probability distributions. We expect this interventional calculus to be broadly applicable for predictive causal interventions, and we anticipate it to be instrumental in the challenge of causal discovery in science from complex data.
Title : An algorithmic-information calculus for reprogramming biological networks
Pages : 12
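The perturbation analysis described in the abstract above (remove one element at a time, measure the shift in algorithmic content, rank the elements) can be sketched as follows. This is only an illustration under a strong assumption: the zlib-compressed size of the adjacency matrix is used as a crude proxy for algorithmic complexity, which is not the estimator based on algorithmic probability that the abstract refers to. The function names and the toy network are hypothetical.

```python
import zlib


def complexity(edges, n_nodes):
    """Compressed size in bytes of the adjacency matrix: a crude complexity proxy."""
    rows = [["0"] * n_nodes for _ in range(n_nodes)]
    for u, v in edges:
        rows[u][v] = rows[v][u] = "1"
    flat = "".join("".join(row) for row in rows)
    return len(zlib.compress(flat.encode("ascii"), 9))


def rank_edges_by_information_shift(edges, n_nodes):
    """Delete each edge in turn and record the change in the complexity proxy.

    Positive shift: the network looks more random without the edge.
    Negative shift: the network looks simpler without the edge.
    """
    base = complexity(edges, n_nodes)
    shifts = []
    for edge in edges:
        perturbed = [e for e in edges if e != edge]
        shifts.append((edge, complexity(perturbed, n_nodes) - base))
    return sorted(shifts, key=lambda item: item[1])


# Hypothetical toy network: a 12-node ring plus one irregular chord.
n = 12
ring = [(i, (i + 1) % n) for i in range(n)]
network = ring + [(0, 5)]
ranking = rank_edges_by_information_shift(network, n)
for edge, shift in ranking:
    print(edge, shift)
```

General-purpose compressors are insensitive on objects this small, so the shifts printed here should be read as a demonstration of the workflow rather than as meaningful complexity estimates.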
Pub Date : 2016-01-01 DOI: 10.1109/BIBM.2016.7822712
D. Zhu, Xueping Li, Zhaoxia Xu, Yiqin Wang, Jin Xu
Title : The overview of research progress of the relationship between HBP and inspection information
Pages : 1341-1345
Pub Date : 2015-11-09 DOI: 10.1109/BIBM.2015.7359645
Mihai Pop
Millions of bacteria make our bodies their home. They help keep us healthy, and disruptions in the normal microbiota are believed to contribute to a number of diseases. Cost-effective sequencing technologies have made it possible to sequence the genomes of human-associated microbial communities, leading to the birth of a new scientific discipline - metagenomics. Analyzing the resulting data, however, poses significant computational challenges, in part due to the sheer size of the data-sets, and in part due to the fact that most of the existing computational framework has been established for single organisms. In my talk I will outline several analytical challenges posed by metagenomic applications, and will describe recent results from my lab in the development of tools for analyzing metagenomic data. In particular I will discuss insights from our analysis of diarrheal disease in developing countries, as well as the effective use of co-abundance approaches for linking together data from two large metagenomic studies.
Title : Computational challenges in microbiome research
Pages : 2
Pub Date : 2015-11-09 DOI: 10.1109/BIBM.2015.7359644
P. Bourne
Biomedical research is becoming increasingly data-driven, analytical and hence digital. In recognition of this evolution, NIH has established the Office for Data Science, with trans-NIH responsibility for maximizing the value of this digital enterprise. This effort brings together communities, policy changes and new infrastructure to be applied to existing and new areas of research such as precision medicine. We will review these changes from the perspective of research advances that are underway and highlight how this community can further engage in these activities.
Title : Big data in biomedicine - An NIH perspective
Pages : 1
Pub Date : 2015-11-09 DOI: 10.1109/BIBM.2015.7359792
Skoda Petr, Hoksza David
The screening of chemical libraries is an important step in the identification of new leads in the drug discovery process. It is the size of the existing chemical libraries that renders laboratory screening expensive. A solution is to incorporate virtual screening into the process in order to reduce the number of molecules to be screened in the wet lab. In this paper, we explore several modifications of one of the best-performing methods for molecular representation in virtual screening campaigns, the topological torsion fingerprints. The modifications include changing the path length, altering the atom descriptors and introducing the so-called field version of the descriptors. With the field-based modification, our improved version of topological torsion fingerprints shows improvements of up to four percent in terms of area under the curve (AUC). The new topological torsion fingerprint thus represents one of the best-performing molecular representations available today.
Title : Exploration of topological torsion fingerprints
Pages : 822-828
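To make the path-length and atom-descriptor modifications mentioned in the abstract above concrete, here is a minimal sketch of a torsion-style fingerprint: it enumerates linear paths of a chosen number of bonded atoms in a molecular graph and counts the sequences of atom descriptors along them. The descriptor used here (element symbol plus number of heavy-atom neighbours) is a deliberate simplification, the field version is not reproduced because the abstract does not specify it, and all function names and the example molecule encoding are hypothetical.

```python
from collections import Counter


def atom_descriptor(atom, elements, bonds):
    """Simplified atom type: (element symbol, number of heavy-atom neighbours)."""
    return (elements[atom], len(bonds[atom]))


def linear_paths(bonds, length):
    """Yield each simple path of `length` atoms (length >= 2) exactly once."""
    def extend(path):
        if len(path) == length:
            if path[0] < path[-1]:  # keep one of the two traversal directions
                yield tuple(path)
            return
        for nxt in bonds[path[-1]]:
            if nxt not in path:
                yield from extend(path + [nxt])

    for start in bonds:
        yield from extend([start])


def torsion_fingerprint(elements, bonds, length=4):
    """Count direction-independent descriptor sequences over all paths."""
    counts = Counter()
    for path in linear_paths(bonds, length):
        key = tuple(atom_descriptor(a, elements, bonds) for a in path)
        counts[min(key, key[::-1])] += 1  # same key for either direction
    return counts


# Heavy-atom graph of 2-methylpropan-1-ol: C0-C1(-C2)-C3-O4.
elements = {0: "C", 1: "C", 2: "C", 3: "C", 4: "O"}
bonds = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}

fp4 = torsion_fingerprint(elements, bonds)            # classic four-atom paths
fp3 = torsion_fingerprint(elements, bonds, length=3)  # shorter-path variant
print(fp4)
print(fp3)
```

Changing `length` corresponds to the path-length modification, and swapping `atom_descriptor` for a richer typing function corresponds to altering the atom descriptors; comparing such variants by AUC would require a labelled screening benchmark, which is outside this sketch.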