Pub Date: 2017-11-01; Epub Date: 2017-12-18; DOI: 10.1109/BIBM.2017.8217935
Aleksandar Poleksic, Carson Turner, Rishabh Dalal, Paul Gray, Lei Xie
Adverse drug reactions (ADRs) represent one of the main health and economic problems in the world. With increasing data on ADRs, there is an increased need for software tools capable of organizing and storing the information on drug-ADR associations in a form that is easy to use and understand. Here we present a step-by-step computational procedure capable of extracting drug-ADR frequency data from the large collection of patient safety reports stored in the Food and Drug Administration (FDA) database. Our procedure is the first of its type capable of generating population-specific drug-ADR frequencies. The drug-ADR data generated by our method can be made specific to a single patient population group (such as gender or age), a single therapy characteristic (such as drug dosage or duration of therapy), or any combination of these.
Title: Mining FDA resources to compute population-specific frequencies of adverse drug reactions. Published in: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2017, pp. 1809-1814.
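The population-specific frequency idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' procedure: the report fields (`drug`, `sex`, `reactions`), the sample records, and the function name `adr_frequencies` are all hypothetical, and a frequency is assumed here to mean the fraction of a drug's reports, within the selected group, that mention a given reaction.

```python
from collections import Counter, defaultdict

# Hypothetical, heavily simplified safety reports; real reports carry many more fields.
reports = [
    {"drug": "drugA", "sex": "F", "reactions": ["nausea", "rash"]},
    {"drug": "drugA", "sex": "F", "reactions": ["nausea"]},
    {"drug": "drugA", "sex": "M", "reactions": ["headache"]},
    {"drug": "drugB", "sex": "F", "reactions": ["rash"]},
]


def adr_frequencies(reports, **group):
    """Return {drug: {adr: fraction of the drug's in-group reports mentioning adr}}.

    `group` holds field=value filters (e.g. sex="F"); no filters means all reports.
    """
    totals = Counter()              # reports per drug within the group
    counts = defaultdict(Counter)   # reports per (drug, adr) within the group
    for report in reports:
        if any(report.get(key) != value for key, value in group.items()):
            continue
        totals[report["drug"]] += 1
        for adr in set(report["reactions"]):  # count each reaction once per report
            counts[report["drug"]][adr] += 1
    return {
        drug: {adr: n / totals[drug] for adr, n in adr_counts.items()}
        for drug, adr_counts in counts.items()
    }


female = adr_frequencies(reports, sex="F")
everyone = adr_frequencies(reports)
```

Combining filters (for example a sex and an age band) only requires passing more keyword arguments, which mirrors the "any combination" wording of the abstract.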
Pub Date: 2017-11-01; Epub Date: 2017-12-18; DOI: 10.1109/BIBM.2017.8217848
Rui Zhang, Sisi Ma, Liesa Shanahan, Jessica Munroe, Sarah Horn, Stuart Speedie
Cardiac Resynchronization Therapy (CRT) is an established pacing therapy for heart failure patients. The New York Heart Association (NYHA) classification is often used as a measure of a patient's response to CRT. Identifying NYHA class for heart failure patients in an electronic health record (EHR) consistently, over time, can provide better understanding of the progression of heart failure and assessment of CRT response and effectiveness. However, NYHA class is rarely stored in EHR structured data; such information is often documented in unstructured clinical notes. In this study, we thus investigated the use of natural language processing (NLP) methods to identify NYHA classification from clinical notes. We collected 6,174 clinical notes that were matched with hospital-specific custom NYHA class diagnosis codes. Machine-learning-based methods performed similarly to a rule-based method. The best machine-learning method, a support vector machine with n-gram features, achieved a 93% F-measure. Further validation of the findings is required.
Title: Automatic Methods to Extract New York Heart Association Classification from Clinical Notes. Published in: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2017, pp. 1296-1299.
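To make the rule-based side of the comparison above concrete, here is a minimal sketch of extracting NYHA class mentions with a regular expression. The pattern and the function name `extract_nyha` are illustrative assumptions, not the rules used in the study, and real notes would need many more variants (ranges, abbreviations, negation).

```python
import re

# Map Roman numerals to integers; Arabic digits are handled with int().
_ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4}

# Hypothetical pattern: "NYHA class III", "NYHA functional class 2", and similar.
# Longer numerals come first so that "III" is not matched as "I".
_NYHA = re.compile(
    r"\bNYHA\s+(?:functional\s+)?class\s*(IV|III|II|I|[1-4])\b",
    re.IGNORECASE,
)


def extract_nyha(note):
    """Return every NYHA class (1-4) mentioned in a note, in order of appearance."""
    classes = []
    for match in _NYHA.finditer(note):
        token = match.group(1).upper()
        classes.append(_ROMAN.get(token) or int(token))
    return classes


print(extract_nyha("Patient with NYHA class III heart failure."))
```

A machine-learning alternative such as the n-gram SVM named in the abstract would replace the hand-written pattern with features learned from labeled notes.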
Pub Date: 2016-12-01; Epub Date: 2017-01-19; DOI: 10.1109/BIBM.2016.7822672
Juan Antonio Lossio-Ventura, William Hogan, François Modave, Amanda Hicks, Josh Hanna, Yi Guo, Zhe He, Jiang Bian
Obesity is associated with increased risks of various types of cancer, as well as a wide range of other chronic diseases. On the other hand, access to health information activates patient participation and improves health outcomes. However, existing online information on obesity and its relationship to cancer is heterogeneous, ranging from pre-clinical models and case studies to mere hypothesis-based scientific arguments. A formal knowledge representation (i.e., a semantic knowledge base) would help better organize and deliver the quality health information related to obesity and cancer that consumers need. Nevertheless, current ontologies describing obesity, cancer and related entities are not designed to guide automatic knowledge base construction from heterogeneous information sources. Thus, in this paper, we present methods for named-entity recognition (NER) to extract biomedical entities from scholarly articles and for detecting whether two biomedical entities are related, with the long-term goal of building an obesity-cancer knowledge base. We leverage both linguistic and statistical approaches in the NER task, with results that surpass the state of the art. Further, based on statistical features extracted from the sentences, our method for relation detection obtains an accuracy of 99.3% and an F-measure of 0.993.
Title: Towards an Obesity-Cancer Knowledge Base: Biomedical Entity Identification and Relation Detection. Published in: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2016, pp. 1081-1088.
Pub Date: 2016-12-01; Epub Date: 2017-01-19; DOI: 10.1109/BIBM.2016.7822775
N Clement, M Rasheed, C Bajaj
Most of the existing research in assembly pathway prediction/analysis of viral capsids makes the simplifying assumption that the configuration of the intermediate states can be extracted directly from the final configuration of the entire capsid. This assumption does not take into account the conformational changes of the constituent proteins or the minor changes to the binding interfaces that continue throughout the assembly process until stabilization. This paper presents a statistical-ensemble-based approach that samples the configurational space for each monomer, together with the relative local orientation between monomers, to capture the uncertainties in binding and conformations. Furthermore, instead of using larger capsomers (trimers, pentamers) as building blocks, we allow all possible subassemblies to bind in all possible combinations. We analyze the resulting assembly graph in two different ways. First, we use the Wilcoxon signed-rank measure to compare the distributions of binding free energy computed on the sampled conformations to predict likely pathways. Second, we represent chemical equilibrium aspects of the transitions as a Bayesian factor graph in which both associations and dissociations are modeled based on concentrations and the binding free energies. We applied these protocols to the feline panleukopenia virus and the Nudaurelia capensis virus. Results from these experiments showed significant departure from those one would obtain if only the static configurations of the proteins were considered. Hence, we establish the importance of an uncertainty-aware protocol for pathway analysis, and provide a statistical framework as an important first step towards assembly pathway prediction with high statistical confidence.
Title: Uncertainty Quantified Computational Analysis of the Energetics of Virus Capsid Assembly. Published in: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2016, pp. 1706-1713. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5604467/pdf/nihms894982.pdf
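The abstract above compares binding free energy distributions with a Wilcoxon signed-rank measure. As a hedged illustration of what that statistic computes, the sketch below ranks the absolute paired differences (averaging ranks over ties) and sums the ranks of positive and negative differences. The function name and the pairing of samples are assumptions made for the example; how the authors pair sampled conformations is not stated in the abstract.

```python
def signed_rank_statistic(x, y):
    """Return (W_plus, W_minus) for paired samples x and y.

    Zero differences are dropped; tied absolute differences share their average rank.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Extend j over the run of equal absolute differences starting at i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical binding free energies (arbitrary units) for two candidate transitions.
pathway_a = [-10, -8, -7, -5]
pathway_b = [-9, -10, -4, -5]
print(signed_rank_statistic(pathway_a, pathway_b))
```

A large imbalance between the two sums suggests that one set of energies is systematically lower than the other; turning the statistic into a p-value requires the usual reference distribution, which is omitted here.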
Pub Date: 2016-12-01; Epub Date: 2017-01-19; DOI: 10.1109/BIBM.2016.7822706
Yongxin Chen, Jung Hun Oh, Romeil Sandhu, Sangkyu Lee, Joseph O Deasy, Allen Tannenbaum
More than half of all cancer patients receive radiotherapy in their treatment process. However, our understanding of abnormal transcriptional responses to radiation remains poor. In this study, we employ an extended definition of Ollivier-Ricci curvature based on the L1-Wasserstein distance to investigate genes and biological processes associated with ionizing radiation (IR) and ultraviolet radiation (UV) exposure using a microarray dataset. Gene expression levels were modeled on a gene interaction topology downloaded from the Human Protein Reference Database (HPRD). This was performed for the IR, UV, and mock datasets separately. The difference in curvature between the IR and mock graphs (and likewise between the UV and mock graphs) for each gene was used as a metric to estimate the extent to which the gene responds to radiation. Comparing the top 200 genes identified from the IR and UV graphs, we found that about 20-30% of the genes overlapped. Through gene ontology enrichment analysis, we found that metabolism-related biological processes were highly associated with both IR and UV radiation exposure.
Title: Transcriptional Responses to Ultraviolet and Ionizing Radiation: An Approach Based on Graph Curvature. Published in: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2016, pp. 1302-1306.
Pub Date: 2016-12-01; Epub Date: 2017-01-19; DOI: 10.1109/BIBM.2016.7822668
Yadan Fan, Lu He, Rui Zhang
Clinical notes contain rich information about dietary supplements, which is critical for detecting signals of dietary supplement side effects and interactions between drugs and supplements. One important element of supplement documentation is usage status, such as started or discontinued. Such information is usually stored in unstructured clinical notes. We developed a rule-based classifier to identify supplement usage status in clinical notes. The patient's status of supplement use was classified into four classes: Continuing (C), Discontinued (D), Started (S), and Unclassified (U). Clinical notes containing 10 of the most commonly consumed supplements (i.e., alfalfa, echinacea, fish oil, garlic, ginger, ginkgo, ginseng, melatonin, St. John's Wort, and Vitamin E) were retrieved from the University of Minnesota Clinical Data Repository. The gold standard was defined by manually annotating 1000 randomly selected sentences or statements mentioning at least one of these 10 supplements. The rules in the classifier were initially developed on two-thirds of the data for a set of 7 supplements (i.e., alfalfa, garlic, ginger, ginkgo, ginseng, St. John's Wort, and Vitamin E); the performance was evaluated on the remaining one-third of this set. To evaluate the generalizability of the rules, we further validated them on a second test set covering the other 3 supplements (i.e., echinacea, fish oil, and melatonin). The classifier achieved F-measures of 0.95, 0.97, 0.96, and 0.96 for status C, D, S, and U, respectively, on the 7 supplements. The classifier also showed good generalizability when it was applied to the other 3 supplements, with F-measures of 0.96 for C, 0.96 for D, 0.95 for S, and 0.89 for U. This study demonstrated that the classifier can accurately classify supplement usage status and can be further integrated as a module into an existing natural language processing pipeline to support dietary supplement knowledge discovery.
Title: Classification of Use Status for Dietary Supplements in Clinical Notes. Published in: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2016, pp. 1054-1061.
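The four-class scheme described in the abstract above (C, D, S, U) lends itself to a small ordered-rule sketch. The cue patterns below are hypothetical examples chosen for illustration and are not the rules developed in the study; the function name `classify_status` is likewise an assumption.

```python
import re

# Hypothetical cue patterns, checked in order; the first match decides the label.
# Discontinuation is checked first so that a sentence mentioning both stopping
# and earlier use is labeled D.
_RULES = [
    ("D", re.compile(r"\b(stop(ped)?|discontinu(e|ed|ing)|no longer tak(es|ing)|quit)\b", re.I)),
    ("S", re.compile(r"\b(start(ed|ing)?|begin|began|initiat(e|ed))\b", re.I)),
    ("C", re.compile(r"\b(continu(e|es|ed|ing)|still tak(es|ing)|currently tak(es|ing)|takes)\b", re.I)),
]


def classify_status(sentence):
    """Return 'C', 'D', 'S', or 'U' (unclassified) for one sentence."""
    for label, pattern in _RULES:
        if pattern.search(sentence):
            return label
    return "U"


for text in [
    "Patient stopped taking ginkgo last month.",
    "Started fish oil 1000 mg daily.",
    "She continues to take melatonin at bedtime.",
    "Discussed risks of St. John's Wort.",
]:
    print(classify_status(text), "-", text)
```

Rule order is the main design choice in a classifier like this: any sentence matching several cues receives the label of the earliest rule, so the order should reflect which status is most important to capture.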
Pub Date: 2016-12-01; Epub Date: 2017-01-19; DOI: 10.1109/bibm.2016.7822515
Hamid Reza Hassanzadeh, May D Wang
Transcription factors (TFs) are macromolecules that bind to specific cis-regulatory sub-regions of DNA promoters and initiate transcription. Finding the exact location of these binding sites (also known as motifs) is important in a variety of domains such as drug design and development. To address this need, several in vivo and in vitro techniques have been developed that try to characterize and predict the binding specificity of a protein to different DNA loci. The major problem with these techniques is that they are not accurate enough in predicting the binding affinity and characterizing the corresponding motifs. As a result, downstream analysis is required to uncover the locations where proteins of interest bind. Here, we propose DeeperBind, a long short-term recurrent convolutional network for prediction of protein binding specificities with respect to DNA probes. DeeperBind can model the positional dynamics of probe sequences and hence effectively accounts for the contributions made by individual sub-regions in DNA sequences. Moreover, it can be trained and tested on datasets containing varying-length sequences. We apply our pipeline to datasets derived from protein binding microarrays (PBMs), an in vitro high-throughput technology for quantification of protein-DNA binding preferences, and present promising results. To the best of our knowledge, this is the most accurate pipeline that can predict binding specificities of DNA sequences from the data produced by high-throughput technologies by using deep learning for feature generation and positional dynamics modeling.
Title: DeeperBind: Enhancing Prediction of Sequence Specificities of DNA Binding Proteins. Published in: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2016, pp. 178-183. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7302108/pdf/nihms-1595286.pdf
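Networks that read DNA probes, such as the one described in the abstract above, typically consume sequences as numeric arrays. The sketch below shows one common preparation step, one-hot encoding with zero-padding to a shared length. It is an illustrative assumption only: the abstract does not say how DeeperBind encodes its input or handles varying lengths, and the names `one_hot` and `pad_batch` are made up for this example.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T order).

    Characters outside ACGT (for example N) become all-zero rows.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def pad_batch(seqs):
    """One-hot encode each sequence and right-pad with zero rows to the longest one."""
    encoded = [one_hot(s) for s in seqs]
    width = max(len(e) for e in encoded)
    return [e + [[0] * 4 for _ in range(width - len(e))] for e in encoded]


batch = pad_batch(["AC", "ACGT", "gtn"])
for rows in batch:
    print(rows)
```

A recurrent layer can in principle process each sequence at its true length, in which case the padding step would be replaced by length-aware batching.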
Pub Date: 2016-12-01; Epub Date: 2017-01-19; DOI: 10.1109/BIBM.2016.7822490
Rongjian Li, Dong Si, Tao Zeng, Shuiwang Ji, Jing He
The detection of secondary structure of proteins using three-dimensional (3D) cryo-electron microscopy (cryo-EM) images is still a challenging task when the spatial resolution of the cryo-EM images is at a medium level (5-10 Å). Prior research focused on local features that may not capture the global information of image objects. In this study, we propose to use deep learning methods to extract highly representative global features and then automatically detect secondary structures of proteins. In particular, we build a convolutional neural network (CNN) classifier that predicts, for every individual voxel in a 3D cryo-EM image, the probability of each label among the secondary structure elements of proteins, namely α-helix, β-sheet, and background. To effectively incorporate the 3D spatial information in protein structures, we propose to perform 3D convolutions in the convolutional layers of CNNs. We show that the proposed CNN classifier can outperform an existing SVM method in identifying the secondary structure elements of proteins from medium-resolution 3D cryo-EM images.
Title: Deep Convolutional Neural Networks for Detecting Secondary Structures in Protein Density Maps from Cryo-Electron Microscopy. Published in: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2016, pp. 41-46. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5952046/pdf/nihms874389.pdf
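The 3D convolution mentioned in the abstract above can be shown with a deliberately naive sketch on nested lists. It computes a single-channel "valid" cross-correlation (the operation deep learning libraries usually call convolution), with no padding, stride, or bias. The function name and the toy volume are assumptions for illustration; a real classifier would use an optimized library and learned kernels.

```python
def conv3d_valid(volume, kernel):
    """Slide a 3D kernel over a 3D volume and return the 'valid' output.

    Both arguments are nested lists indexed as [depth][height][width].
    """
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    d, h, w = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = [
        [[0.0] * (W - w + 1) for _ in range(H - h + 1)]
        for _ in range(D - d + 1)
    ]
    for z in range(D - d + 1):
        for y in range(H - h + 1):
            for x in range(W - w + 1):
                # Weighted sum of the kernel-sized neighborhood anchored at (z, y, x).
                out[z][y][x] = sum(
                    volume[z + i][y + j][x + k] * kernel[i][j][k]
                    for i in range(d)
                    for j in range(h)
                    for k in range(w)
                )
    return out


# Toy 3x3x3 density map of ones and a 2x2x2 averaging-style kernel of ones.
volume = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
kernel = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]
result = conv3d_valid(volume, kernel)
print(result)
```

Because each output voxel depends on a full 3D neighborhood, the operation can use spatial context along all three axes, which is the motivation the abstract gives for 3D rather than 2D convolutions.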
Pub Date: 2016-12-01; Epub Date: 2017-01-19; DOI: 10.1109/bibm.2016.7822516
Hamid Reza Hassanzadeh, John H Phan, May D Wang
Cancer survival prediction is an active area of research that can help prevent unnecessary therapies and improve patients' quality of life. Gene expression profiling is widely used in cancer studies to discover informative biomarkers that help predict different clinical endpoints. We use multiple modalities of data derived from RNA deep-sequencing (RNA-seq) to predict the survival of cancer patients. Despite the wealth of information available in expression profiles of cancer tumors, fulfilling this objective remains a big challenge, mostly because data samples are scarce relative to the high dimension of the expression profiles. As such, analysis of transcriptomic data modalities calls for state-of-the-art big-data analytics techniques that can maximally use all the available data to discover the relevant information hidden within a significant amount of noise. In this paper, we propose a pipeline that predicts cancer patients' survival by exploiting the structure of the input (manifold learning) and by leveraging the unlabeled samples using Laplacian support vector machines, a graph-based semi-supervised learning (GSSL) paradigm. We show that under certain circumstances, no single modality per se results in the best accuracy, and that by fusing different models together via a stacked generalization strategy, we may boost the accuracy synergistically. We apply our approach to two cancer datasets and present promising results. We maintain that a similar pipeline can be used for predictive tasks where labeled samples are expensive to acquire.
{"title":"A Multi-Modal Graph-Based Semi-Supervised Pipeline for Predicting Cancer Survival.","authors":"Hamid Reza Hassanzadeh, John H Phan, May D Wang","doi":"10.1109/bibm.2016.7822516","DOIUrl":"https://doi.org/10.1109/bibm.2016.7822516","url":null,"PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2016 ","pages":"184-189"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/bibm.2016.7822516","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38151657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-12-01 Epub Date: 2017-01-19 DOI: 10.1109/BIBM.2016.7822607
Zhe He, Zhiwei Chen, Jiang Bian
Clinical studies, especially randomized controlled trials, generate gold-standard medical evidence. However, the lack of population representativeness of clinical studies has hampered their generalizability to the real-world population. Overly restrictive qualitative criteria are often applied to exclude patients. In this work, we developed a lexical-pattern-based tool to structure qualitative eligibility criteria with temporal constraints, and used it to analyze over 10,800 cancer clinical studies. Our results show that restrictive temporal constraints are often applied to qualitative criteria in cancer studies, limiting the generalizability of their results.
{"title":"Analysis of Temporal Constraints in Qualitative Eligibility Criteria of Cancer Clinical Studies.","authors":"Zhe He, Zhiwei Chen, Jiang Bian","doi":"10.1109/BIBM.2016.7822607","DOIUrl":"https://doi.org/10.1109/BIBM.2016.7822607","url":null,"PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2016 ","pages":"717-722"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/BIBM.2016.7822607","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35676265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}