Pub Date: 2017-11-01 | Epub Date: 2017-12-18 | DOI: 10.1109/BIBM.2017.8217752
Tian Bai, Ashis Kumar Chanda, Brian L Egleston, Slobodan Vucetic
There has been increasing interest in learning low-dimensional vector representations of medical concepts from electronic health records (EHRs). While EHRs contain structured data such as diagnostic codes and laboratory tests, they also contain unstructured clinical notes, which provide more nuanced details on a patient's health status. In this work, we propose a method that jointly learns medical concept and word representations. In particular, we focus on capturing the relationship between medical codes and words by using a novel learning scheme for the word2vec model. Our method exploits relationships between different parts of EHRs in the same visit and embeds both codes and words in the same continuous vector space. In the end, we are able to derive clusters which reflect distinct disease and treatment patterns. In our experiments, we qualitatively show how our method of grouping words for given diagnostic codes compares with a topic modeling approach. We also test how well our representations can be used to predict disease patterns of the next visit. The results show that our approach outperforms several common methods.
Title: Joint Learning of Representations of Medical Concepts and Words from EHR Data. Venue: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2017, pp. 764-769.
Pub Date: 2017-11-01 | Epub Date: 2017-12-18 | DOI: 10.1109/BIBM.2017.8217653
Runmin Yang, Daming Zhu, Qiang Kou, Poomima Bhat-Nakshatri, Harikrishna Nakshatri, Si Wu, Xiaowen Liu
Database search is the main approach for identifying proteoforms using top-down tandem mass spectra. However, it is extremely slow to align a query spectrum against all protein sequences in a large database when the target proteoform that produced the spectrum contains post-translational modifications and/or mutations. As a result, efficient and sensitive protein sequence filtering algorithms are essential for speeding up database search. In this paper, we propose a novel filtering algorithm, which generates spectrum graphs from subspectra of the query spectrum and searches them against the protein database to find good candidates. Compared with the sequence tag and gapped tag approaches, the proposed method circumvents the step of tag extraction, thus simplifying data processing. Experimental results on real data showed that the proposed method achieved both high speed and high sensitivity in protein sequence filtration.
Title: A Spectrum Graph-Based Protein Sequence Filtering Algorithm for Proteoform Identification by Top-Down Mass Spectrometry. Venue: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2017, pp. 222-229.
Pub Date: 2017-11-01 | Epub Date: 2017-12-18 | DOI: 10.1109/BIBM.2017.8217995
Qiao Liu, Chen Chen, Annie Gao, Hang Hang Tong, Lei Xie
It is a grand challenge to reveal the causal effects of DNA variants in complex phenotypes. Although statistical techniques can establish correlations between genotypes and phenotypes in Genome-Wide Association Studies (GWAS), they often fail when the variant is rare. The emerging Network-based Association Studies aim to address this shortcoming in statistical analysis, but are mainly applied to coding variations. Increasing evidence suggests that non-coding variants play critical roles in the etiology of complex diseases. However, few computational tools are available to study the effect of rare non-coding variants on phenotypes. Here we have developed VariFunNet, a multiscale variant-to-function-to-network modeling framework, to address these challenges. VariFunNet first predicts the functional variations of molecular interactions that result from the non-coding variants. Then we incorporate the genes associated with the functional variation into a tissue-specific gene network, and identify subnetworks that transmit the functional variation to molecular phenotypes. Finally, we quantify the functional implication of the subnetwork, and prioritize the association of the non-coding variants with the phenotype. We have applied VariFunNet to investigate the causal effect of rare non-coding variants on Alzheimer's disease (AD). Among the top 21 ranked causal non-coding variants, 16 are directly supported by existing evidence. The remaining 5 novel variants dysregulate multiple downstream biological processes, all of which are associated with the pathology of AD. Furthermore, we propose potential new drug targets that may modulate diverse pathways responsible for AD. These findings may shed new light on discovering new biomarkers and therapies for the prevention, diagnosis, and treatment of AD. Our results suggest that multiscale modeling is a potentially powerful approach to studying causal genotype-phenotype associations.
Title: VariFunNet, an integrated multiscale modeling framework to study the effects of rare non-coding variants in Genome-Wide Association Studies: applied to Alzheimer's Disease. Venue: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2017, pp. 2177-2182.
Pub Date: 2017-11-01 | Epub Date: 2017-12-18 | DOI: 10.1109/BIBM.2017.8217824
Majdi Maabreh, Basheer Qolomany, James Springstead, Izzat Alsmadi, Ajay Gupta
Despite the linear relation between the number of observed spectra and the searching time, the current protein search engines, even the parallel versions, could take several hours to search a large amount of MSMS spectra, which can be generated in a short time. After a laborious searching process, some (and at times, the majority) of the observed spectra are labeled as non-identifiable. We evaluate the role of machine learning in building an efficient MSMS filter to remove non-identifiable spectra. We compare a deep learning algorithm against 9 shallow learning algorithms with different configurations. Using 10 different datasets generated from two different search engines, different instruments, different sizes and different species, we experimentally show that deep learning models are powerful in filtering MSMS spectra. We also show that our simple feature list is informative, since the shallow learning algorithms also showed encouraging results in filtering the MSMS spectra. Our deep learning model can exclude around 50% of the non-identifiable spectra while losing, on average, only 9% of the identifiable ones. Among the shallow learning algorithms, Random Forest, Support Vector Machine and Neural Networks showed encouraging results, eliminating, on average, 70% of the non-identifiable spectra while losing around 25% of the identifiable ones. The deep learning algorithm may be especially useful in instances where the protein(s) of interest are in lower cellular or tissue concentration, while the other algorithms may be more useful for concentrated or more highly expressed proteins.
Title: Deep vs. Shallow Learning-based Filters of MSMS Spectra in Support of Protein Search Engines. Venue: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2017, pp. 1175-1182. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8370709/pdf/nihms-1728673.pdf
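Because the abstract states that search time grows linearly with the number of spectra, the reported filter rates translate directly into a smaller search workload. The following is a minimal arithmetic sketch, not the authors' code: the function name, the 100,000-spectrum run, and the 40% identifiable share are assumptions chosen only for illustration, while the removal and loss rates are the averages quoted in the abstract.

```python
def filtered_workload(n_identifiable, n_non_identifiable,
                      removed_non_identifiable, lost_identifiable):
    """Return (total spectra kept, identifiable spectra kept) after filtering.

    removed_non_identifiable: fraction of non-identifiable spectra the filter removes.
    lost_identifiable: fraction of identifiable spectra the filter wrongly removes.
    """
    kept_identifiable = n_identifiable * (1 - lost_identifiable)
    kept_non_identifiable = n_non_identifiable * (1 - removed_non_identifiable)
    return kept_identifiable + kept_non_identifiable, kept_identifiable


# Hypothetical run (assumed numbers): 40,000 identifiable and 60,000 non-identifiable spectra.
N_IDENT, N_NON = 40_000, 60_000

# Average rates quoted in the abstract.
deep_total, deep_ident = filtered_workload(N_IDENT, N_NON, 0.50, 0.09)
shallow_total, shallow_ident = filtered_workload(N_IDENT, N_NON, 0.70, 0.25)

for name, total, ident in [("deep", deep_total, deep_ident),
                           ("shallow", shallow_total, shallow_ident)]:
    share = total / (N_IDENT + N_NON)
    print(f"{name}: search {round(total)} spectra ({share:.0%} of the original), "
          f"keeping {round(ident)} identifiable ones")
```

Under these assumed counts the deep filter leaves roughly 66,400 spectra to search and the shallow filters roughly 48,000, which makes the trade-off in the abstract concrete: the shallow filters cut more work but discard more identifiable spectra.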
Pub Date: 2017-11-01 | Epub Date: 2017-12-18 | DOI: 10.1109/BIBM.2017.8217839
Yadan Fan, Lu He, Rui Zhang
The widespread use of dietary supplements has drawn extensive attention owing to safety and efficacy concerns. Clinical notes document a great amount of detailed information on dietary supplement usage, thus providing a rich source for clinical research on supplement safety surveillance. Identifying the use status of dietary supplements is one of the initial steps toward the ultimate goal of supplement safety surveillance. In this study, we built rule-based and machine learning-based classifiers to automatically classify the use status of supplements into four categories: Continuing (C), Discontinued (D), Started (S), and Unclassified (U). In comparison to the machine learning classifier trained on the same datasets, the rule-based classifier showed better performance, with F-measures for the C, D, S, and U statuses of 0.93, 0.98, 0.95, and 0.83, respectively. We further analyzed the errors generated by the rule-based classifier. The classifier can potentially be applied to extract supplement information from clinical notes to support research and clinical practice related to patient safety in supplement usage.
Title: Evaluating Automatic Methods to Extract Patients' Supplement Use from Clinical Reports. Venue: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2017, pp. 1258-1261.
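To illustrate what a rule-based classifier over the four use-status categories might look like, here is a minimal sketch. The abstract does not give the rule set, so the keyword patterns, their ordering, and the function name `classify_use_status` are all assumptions made for illustration and should not be read as the authors' rules.

```python
import re

# Hypothetical keyword rules (assumed; the abstract does not list the real ones).
# Order matters: the first matching rule wins, so "stopped taking" is D, not C.
RULES = [
    ("D", re.compile(r"\b(stop(ped)?|discontinu\w*|no longer tak\w*|quit)\b", re.I)),
    ("S", re.compile(r"\b(start(ed|ing)?|began|begin|initiat\w*)\b", re.I)),
    ("C", re.compile(r"\b(continu\w*|still tak\w*|currently tak\w*|takes|taking)\b", re.I)),
]


def classify_use_status(sentence: str) -> str:
    """Return C, D, S, or U for a sentence that mentions a supplement."""
    for label, pattern in RULES:
        if pattern.search(sentence):
            return label
    return "U"  # Unclassified: no rule fired


examples = [
    "Patient stopped taking fish oil two months ago.",
    "She started melatonin last week.",
    "Continues to take vitamin D daily.",
    "Discussed ginkgo with patient.",
]
labels = [classify_use_status(s) for s in examples]
print(labels)
```

Checking the discontinuation rule first is a deliberate choice in this sketch, because phrases such as "stopped taking" also contain a continuation cue.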
Pub Date: 2017-11-01 | Epub Date: 2017-12-18 | DOI: 10.1109/BIBM.2017.8217845
Juan Antonio Lossio-Ventura, William Hogan, François Modave, Yi Guo, Zhe He, Amanda Hicks, Jiang Bian
Obesity has been linked to several types of cancer. Access to adequate health information activates people's participation in managing their own health, which ultimately improves their health outcomes. Nevertheless, the existing online information about the relationship between obesity and cancer is heterogeneous and poorly organized. A formal knowledge representation can help better organize and deliver quality health information. Currently, there are several efforts in the biomedical domain to convert unstructured data to structured data and store them in Semantic Web knowledge bases (KB). In this demo paper, we present OC-2-KB (Obesity and Cancer to Knowledge Base), a system that is tailored to guide automatic KB construction for managing obesity and cancer knowledge from free-text scientific literature (i.e., PubMed abstracts) in a systematic way. OC-2-KB has two important modules, which perform the acquisition of entities and the extraction and then classification of relationships among these entities. We tested the OC-2-KB system on a data set of 23 manually annotated obesity and cancer PubMed abstracts and created a preliminary KB with 765 triples. We conducted a preliminary evaluation on this sample of triples and reported our evaluation results.
Title: OC-2-KB: A software pipeline to build an evidence-based obesity and cancer knowledge base. Venue: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2017, pp. 1284-1287. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5889048/pdf/nihms930742.pdf
Pub Date: 2017-11-01 | Epub Date: 2017-12-18 | DOI: 10.1109/BIBM.2017.8217712
Siqi Liu, Adam Wright, Dean F Sittig, Milos Hauskrecht
A clinical decision support system and its components may malfunction due to different reasons. The objective of this work is to develop computational methods that can help us to monitor the system and assure its proper operation by promptly detecting and analyzing changes in its behavior. We develop a new change-point detection method using the Multi-Process Dynamic Linear Model. The experiments on real and simulated data show that our method outperforms existing change-point detection methods, leading to higher accuracy and shorter delay in the detection.
Title: Change-Point Detection for Monitoring Clinical Decision Support Systems with a Multi-Process Dynamic Linear Model. Venue: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2017, pp. 569-572.
Pub Date: 2017-11-01 | Epub Date: 2017-12-18 | DOI: 10.1109/BIBM.2017.8217935
Aleksandar Poleksic, Carson Turner, Rishabh Dalal, Paul Gray, Lei Xie
Adverse drug reactions (ADRs) represent one of the main health and economic problems in the world. With increasing data on ADRs, there is an increased need for software tools capable of organizing and storing the information on drug-ADR associations in a form that is easy to use and understand. Here we present a step-by-step computational procedure capable of extracting drug-ADR frequency data from the large collection of patient safety reports stored in the Food and Drug Administration (FDA) database. Our procedure is the first of its type capable of generating population-specific drug-ADR frequencies. The drug-ADR data generated by our method can be made specific to a single patient population group (such as gender or age), a single therapy characteristic (such as drug dosage or duration of therapy), or any combination of these.
Title: Mining FDA resources to compute population-specific frequencies of adverse drug reactions. Venue: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2017, pp. 1809-1814.
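The idea of restricting drug-ADR frequencies to a patient subgroup can be sketched in a few lines. This is an illustrative assumption rather than the authors' procedure: the report records, their field names, and the function `adr_frequencies` are hypothetical, and real FDA safety report files have a different and much richer layout. The sketch computes, for one drug, the fraction of matching reports that list each reaction.

```python
from collections import Counter

# Hypothetical, simplified safety reports (assumed layout, invented values).
reports = [
    {"sex": "F", "age": 34, "drugs": {"drugA"}, "reactions": {"nausea"}},
    {"sex": "F", "age": 61, "drugs": {"drugA", "drugB"}, "reactions": {"nausea", "rash"}},
    {"sex": "M", "age": 45, "drugs": {"drugA"}, "reactions": {"headache"}},
    {"sex": "M", "age": 70, "drugs": {"drugB"}, "reactions": {"rash"}},
    {"sex": "F", "age": 29, "drugs": {"drugA"}, "reactions": {"headache"}},
]


def adr_frequencies(reports, drug, keep=lambda report: True):
    """Fraction of reports naming `drug` (and passing `keep`) that list each reaction."""
    selected = [r for r in reports if drug in r["drugs"] and keep(r)]
    if not selected:
        return {}
    counts = Counter(reaction for r in selected for reaction in r["reactions"])
    return {reaction: n / len(selected) for reaction, n in counts.items()}


# All reports for drugA, then two subgroups: women, and women aged 60 or older.
overall = adr_frequencies(reports, "drugA")
women = adr_frequencies(reports, "drugA", keep=lambda r: r["sex"] == "F")
older_women = adr_frequencies(
    reports, "drugA", keep=lambda r: r["sex"] == "F" and r["age"] >= 60
)
print(overall)
print(women)
print(older_women)
```

Passing the subgroup as a predicate keeps the counting code unchanged when criteria such as sex and age are combined. Note that these values are shares of reports, not incidence rates in the treated population.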
Pub Date: 2017-11-01 | Epub Date: 2017-12-18 | DOI: 10.1109/BIBM.2017.8217848
Rui Zhang, Sisi Ma, Liesa Shanahan, Jessica Munroe, Sarah Horn, Stuart Speedie
Cardiac Resynchronization Therapy (CRT) is an established pacing therapy for heart failure patients. The New York Heart Association (NYHA) classification is often used as a measure of a patient's response to CRT. Identifying the NYHA class for heart failure patients in an electronic health record (EHR) consistently, over time, can provide a better understanding of the progression of heart failure and a better assessment of CRT response and effectiveness. However, NYHA class is rarely stored in EHR structured data; such information is often documented in unstructured clinical notes. In this study, we thus investigated the use of natural language processing (NLP) methods to identify NYHA classification from clinical notes. We collected 6,174 clinical notes that were matched with hospital-specific custom NYHA class diagnosis codes. Machine-learning based methods performed similarly to a rule-based method. The best machine-learning method, a support vector machine with n-gram features, achieved a 93% F-measure. Further validation of the findings is required.
Title: Automatic Methods to Extract New York Heart Association Classification from Clinical Notes. Venue: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2017, pp. 1296-1299.
Pub Date: 2016-12-01 | Epub Date: 2017-01-19 | DOI: 10.1109/BIBM.2016.7822672
Juan Antonio Lossio-Ventura, William Hogan, François Modave, Amanda Hicks, Josh Hanna, Yi Guo, Zhe He, Jiang Bian
Obesity is associated with increased risks of various types of cancer, as well as a wide range of other chronic diseases. On the other hand, access to health information activates patient participation and improves their health outcomes. However, existing online information on obesity and its relationship to cancer is heterogeneous, ranging from pre-clinical models and case studies to mere hypothesis-based scientific arguments. A formal knowledge representation (i.e., a semantic knowledge base) would help better organize and deliver the quality health information related to obesity and cancer that consumers need. Nevertheless, current ontologies describing obesity, cancer and related entities are not designed to guide automatic knowledge base construction from heterogeneous information sources. Thus, in this paper, we present methods for named-entity recognition (NER) to extract biomedical entities from scholarly articles and for detecting whether two biomedical entities are related, with the long-term goal of building an obesity-cancer knowledge base. We leverage both linguistic and statistical approaches in the NER task, and our results surpass the state of the art. Further, based on statistical features extracted from the sentences, our method for relation detection obtains an accuracy of 99.3% and an F-measure of 0.993.
Title: Towards an Obesity-Cancer Knowledge Base: Biomedical Entity Identification and Relation Detection. Venue: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2016, pp. 1081-1088.
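The accuracy and F-measure reported for relation detection are standard binary-classification metrics. For reference, the sketch below shows how both are computed from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the paper, whose evaluation data are not given in the abstract.

```python
def accuracy_and_f_measure(tp, fp, fn, tn):
    """Compute accuracy and the balanced F-measure (F1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    if tp == 0:
        return accuracy, 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)  # share of predicted relations that are correct
    recall = tp / (tp + fn)     # share of true relations that were found
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, f_measure


# Hypothetical counts for "related" vs "not related" entity pairs (assumed values).
accuracy, f_measure = accuracy_and_f_measure(tp=90, fp=10, fn=5, tn=95)
print(f"accuracy={accuracy:.3f}, F-measure={f_measure:.3f}")
```

Accuracy counts correct decisions on both classes, whereas the F-measure ignores true negatives, so the two can diverge sharply when unrelated pairs dominate the data.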