
Proceedings. IEEE International Conference on Bioinformatics and Biomedicine: Latest Articles

Joint Learning of Representations of Medical Concepts and Words from EHR Data.
Pub Date : 2017-11-01 Epub Date: 2017-12-18 DOI: 10.1109/BIBM.2017.8217752
Tian Bai, Ashis Kumar Chanda, Brian L Egleston, Slobodan Vucetic

There has been increasing interest in learning low-dimensional vector representations of medical concepts from electronic health records (EHRs). While EHRs contain structured data such as diagnostic codes and laboratory tests, they also contain unstructured clinical notes, which provide more nuanced details on a patient's health status. In this work, we propose a method that jointly learns representations of medical concepts and words. In particular, we focus on capturing the relationship between medical codes and words by using a novel learning scheme for the word2vec model. Our method exploits relationships between different parts of the EHR within the same visit and embeds both codes and words in the same continuous vector space. Finally, we are able to derive clusters that reflect distinct disease and treatment patterns. In our experiments, we qualitatively compare our method of grouping words for given diagnostic codes with a topic modeling approach. We also test how well our representations predict the disease patterns of the next visit. The results show that our approach outperforms several common methods.
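
The abstract does not spell out the learning scheme, so the following is only a minimal sketch of one plausible way to let codes and words share a context. It assumes each visit is a list of diagnosis codes plus the tokenized note text, and emits skip-gram (target, context) pairs in which every code is paired with every note word from the same visit. The function name, the `window` parameter and the token `ICD9_428.0` are hypothetical.

```python
from itertools import product


def visit_training_pairs(codes, note_words, window=2):
    """Build (target, context) pairs for one visit (hypothetical scheme).

    Codes and note words from the same visit are treated as mutual context,
    so both end up in one embedding space; note words additionally get
    ordinary skip-gram pairs within `window` positions.
    """
    pairs = []
    # Cross-modal pairs: the whole visit is the context for codes and words.
    for code, word in product(codes, note_words):
        pairs.append((code, word))
        pairs.append((word, code))
    # Ordinary skip-gram pairs inside the note text.
    for i, target in enumerate(note_words):
        lo, hi = max(0, i - window), min(len(note_words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, note_words[j]))
    return pairs


if __name__ == "__main__":
    pairs = visit_training_pairs(["ICD9_428.0"], ["heart", "failure", "edema"], window=1)
    print(len(pairs))  # 6 cross-modal pairs + 4 within-note pairs
```

Pairs like these could be fed to any skip-gram trainer with negative sampling; pairing codes with the whole visit, rather than a fixed text window, is the design choice that places codes and words in one vector space.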

Volume: 2017, Pages: 764-769
Citations: 17
A Spectrum Graph-Based Protein Sequence Filtering Algorithm for Proteoform Identification by Top-Down Mass Spectrometry.
Pub Date : 2017-11-01 Epub Date: 2017-12-18 DOI: 10.1109/BIBM.2017.8217653
Runmin Yang, Daming Zhu, Qiang Kou, Poomima Bhat-Nakshatri, Harikrishna Nakshatri, Si Wu, Xiaowen Liu

Database search is the main approach for identifying proteoforms from top-down tandem mass spectra. However, aligning a query spectrum against all protein sequences in a large database is extremely slow when the target proteoform that produced the spectrum contains post-translational modifications and/or mutations. As a result, efficient and sensitive protein sequence filtering algorithms are essential for speeding up database search. In this paper, we propose a novel filtering algorithm that generates spectrum graphs from subspectra of the query spectrum and searches them against the protein database to find good candidates. Compared with the sequence tag and gapped tag approaches, the proposed method circumvents the tag extraction step, thus simplifying data processing. Experimental results on real data showed that the proposed method achieved both high speed and high sensitivity in protein sequence filtering.
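
To illustrate what searching a spectrum graph against protein sequences (rather than extracting tags first) can look like, here is a simplified sketch, not the authors' algorithm. It assumes the peaks have already been converted to prefix residue masses (no ion-type offsets, charge states or noise handling) and uses standard monoisotopic residue masses; the tolerance, the `min_len` cutoff and all function names are hypothetical.

```python
# Monoisotopic residue masses in daltons (I and L are isobaric).
AA_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def build_spectrum_graph(masses, tol=0.02):
    """Nodes are sorted masses; edge i->j is labelled with every residue
    whose mass matches masses[j] - masses[i] within `tol` Da."""
    masses = sorted(masses)
    edges = {i: [] for i in range(len(masses))}
    for i in range(len(masses)):
        for j in range(i + 1, len(masses)):
            diff = masses[j] - masses[i]
            for aa, m in AA_MASS.items():
                if abs(diff - m) <= tol:
                    edges[i].append((j, aa))
    return edges


def longest_graph_match(edges, protein):
    """Longest run of consecutive residues in `protein` that can be spelled
    by following labelled edges in the graph."""
    best = 0
    for start in range(len(protein)):
        frontier = set(edges)  # a path may begin at any node
        length = 0
        for aa in protein[start:]:
            frontier = {j for i in frontier for j, lab in edges[i] if lab == aa}
            if not frontier:
                break
            length += 1
        best = max(best, length)
    return best


def filter_candidates(masses, proteins, min_len=3, tol=0.02):
    """Keep proteins whose sequence matches a graph path of >= min_len residues."""
    edges = build_spectrum_graph(masses, tol)
    return [name for name, seq in proteins.items()
            if longest_graph_match(edges, seq) >= min_len]


if __name__ == "__main__":
    masses = [0.0, 57.02146, 57.02146 + 71.03711, 57.02146 + 71.03711 + 87.03203]
    proteins = {"p1": "MKGASTR", "p2": "MKQSTR", "p3": "MKWWW"}
    print(filter_candidates(masses, proteins))
```

In this toy example the graph spells G-A-S, so only `p1` passes with `min_len=3`; `p2` still matches Q-S because G+A has almost exactly the mass of Q, which shows why graph edges can carry several labels.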

Volume: 2017, Pages: 222-229
Citations: 1
VariFunNet, an integrated multiscale modeling framework to study the effects of rare non-coding variants in Genome-Wide Association Studies: applied to Alzheimer's Disease.
Pub Date : 2017-11-01 Epub Date: 2017-12-18 DOI: 10.1109/BIBM.2017.8217995
Qiao Liu, Chen Chen, Annie Gao, Hang Hang Tong, Lei Xie

Revealing the causal effects of DNA variants on complex phenotypes is a grand challenge. Although statistical techniques can establish correlations between genotypes and phenotypes in Genome-Wide Association Studies (GWAS), they often fail when the variant is rare. Emerging network-based association studies aim to address this shortcoming of statistical analysis, but they have mainly been applied to coding variants. Increasing evidence suggests that non-coding variants play critical roles in the etiology of complex diseases. However, few computational tools are available to study the effect of rare non-coding variants on phenotypes. Here we have developed VariFunNet, a multiscale variant-to-function-to-network modeling framework, to address these challenges. VariFunNet first predicts the functional variations of molecular interactions that result from the non-coding variants. Then we incorporate the genes associated with the functional variation into a tissue-specific gene network and identify subnetworks that transmit the functional variation to molecular phenotypes. Finally, we quantify the functional implications of the subnetwork and prioritize the association of the non-coding variants with the phenotype. We have applied VariFunNet to investigate the causal effect of rare non-coding variants on Alzheimer's disease (AD). Among the top 21 ranked causal non-coding variants, 16 are directly supported by existing evidence. The remaining 5 novel variants dysregulate multiple downstream biological processes, all of which are associated with the pathology of AD. Furthermore, we propose potential new drug targets that may modulate diverse pathways responsible for AD. These findings may shed new light on the discovery of new biomarkers and therapies for the prevention, diagnosis, and treatment of AD. Our results suggest that multiscale modeling is a potentially powerful approach for studying causal genotype-phenotype associations.
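
The abstract describes transmitting a functional variation through a tissue-specific gene network but does not name the propagation method. Random walk with restart (RWR) is one common choice for this kind of network step, and the sketch below assumes it: the network is a toy undirected adjacency dict, and the genes affected by a variant are the restart seeds. Gene names and the `restart` value are hypothetical, and this is not presented as VariFunNet's actual procedure.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that, at each step,
    jumps back to a seed node with probability `restart` and otherwise moves
    to a uniformly chosen neighbour."""
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if adj[u]:
                # Spread the non-restart share of u's probability to neighbours.
                share = (1.0 - restart) * p[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


if __name__ == "__main__":
    # Toy path network GENE_A - GENE_B - GENE_C - GENE_D, seeded at GENE_A.
    net = {"GENE_A": ["GENE_B"], "GENE_B": ["GENE_A", "GENE_C"],
           "GENE_C": ["GENE_B", "GENE_D"], "GENE_D": ["GENE_C"]}
    scores = random_walk_with_restart(net, {"GENE_A"})
    print(sorted(scores, key=scores.get, reverse=True))
```

Genes that keep a high score are the ones most tightly connected to the seeds, so they are natural candidates for the subnetwork through which a variant's effect could propagate; nodes with no neighbours would leak probability mass in this simple version.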

Volume: 2017, Pages: 2177-2182
Citations: 9
Deep vs. Shallow Learning-based Filters of MSMS Spectra in Support of Protein Search Engines.
Pub Date : 2017-11-01 Epub Date: 2017-12-18 DOI: 10.1109/BIBM.2017.8217824
Majdi Maabreh, Basheer Qolomany, James Springstead, Izzat Alsmadi, Ajay Gupta

Although searching time grows only linearly with the number of observed spectra, current protein search engines, even parallel versions, can take several hours to search the large numbers of MSMS spectra that can be generated in a short time. After this laborious search, some (and at times the majority) of the observed spectra are labeled as non-identifiable. We evaluate the role of machine learning in building an efficient MSMS filter that removes non-identifiable spectra. We compare and evaluate a deep learning algorithm against 9 shallow learning algorithms with different configurations. Using 10 different datasets of different sizes, generated by two different search engines from different instruments and different species, we experimentally show that deep learning models are powerful in filtering MSMS spectra. We also show that our simple feature list is significant, since other shallow learning algorithms also showed encouraging results in filtering MSMS spectra. Our deep learning model can exclude around 50% of the non-identifiable spectra while losing, on average, only 9% of the identifiable ones. Among the shallow learning algorithms, Random Forest, Support Vector Machine and Neural Networks showed encouraging results, eliminating, on average, 70% of the non-identifiable spectra while losing around 25% of the identifiable ones. The deep learning algorithm may be especially useful when the protein(s) of interest are present at lower cellular or tissue concentrations, while the other algorithms may be more useful for concentrated or more highly expressed proteins.
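
The two numbers quoted for each model (share of non-identifiable spectra excluded, share of identifiable spectra lost) can be computed for any score-producing classifier, deep or shallow. The sketch below assumes a classifier that outputs an identifiability score per spectrum and a keep-if-above threshold; the scores, labels, threshold and function names are hypothetical.

```python
def apply_threshold(scores, threshold):
    """Keep a spectrum when its identifiability score is >= threshold."""
    return [s >= threshold for s in scores]


def filter_rates(is_identifiable, kept):
    """Return (share of non-identifiable spectra removed,
               share of identifiable spectra lost) for one filter."""
    non_id_kept = [k for is_id, k in zip(is_identifiable, kept) if not is_id]
    id_kept = [k for is_id, k in zip(is_identifiable, kept) if is_id]
    removed = sum(not k for k in non_id_kept) / len(non_id_kept) if non_id_kept else 0.0
    lost = sum(not k for k in id_kept) / len(id_kept) if id_kept else 0.0
    return removed, lost


if __name__ == "__main__":
    labels = [True, True, True, True, False, False, False, False]
    scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.6, 0.8]
    kept = apply_threshold(scores, 0.5)
    print(filter_rates(labels, kept))  # (removed, lost)
```

Raising the threshold removes more non-identifiable spectra but also loses more identifiable ones, which is the trade-off between the deep model (about 50% removed, 9% lost) and the shallow models (about 70% removed, 25% lost) reported above.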

Volume: 2017, Pages: 1175-1182, Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8370709/pdf/nihms-1728673.pdf
Citations: 0
Evaluating Automatic Methods to Extract Patients' Supplement Use from Clinical Reports.
Pub Date : 2017-11-01 Epub Date: 2017-12-18 DOI: 10.1109/BIBM.2017.8217839
Yadan Fan, Lu He, Rui Zhang

The widespread use of dietary supplements has drawn extensive attention because of safety and efficacy concerns. Clinical notes document a great deal of detailed information on dietary supplement usage, providing a rich source for clinical research on supplement safety surveillance. Identifying the use status of dietary supplements is one of the initial steps toward the ultimate goal of supplement safety surveillance. In this study, we built rule-based and machine learning-based classifiers to automatically classify the use status of supplements into four categories: Continuing (C), Discontinued (D), Started (S), and Unclassified (U). Compared with the machine learning classifier trained on the same datasets, the rule-based classifier performed better, with F-measures of 0.93, 0.98, 0.95, and 0.83 for the C, D, S, and U statuses, respectively. We further analyzed the errors generated by the rule-based classifier. The classifier could potentially be applied to extract supplement information from clinical notes to support research and clinical practice related to patient safety in supplement usage.
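
A rule-based status classifier of the kind described can be as simple as ordered cue-phrase matching, with per-class F-measure as the evaluation. The sketch below is illustrative only: the cue phrases, their order and the example sentences are hypothetical and are not the rules used in the study.

```python
import re

# Hypothetical cue phrases, checked in this order so that, e.g.,
# "stopped taking" is labelled D rather than C.
RULES = [
    ("D", re.compile(r"\b(stopped|discontinued|quit|no longer)\b", re.IGNORECASE)),
    ("S", re.compile(r"\b(started|starting|began|initiated)\b", re.IGNORECASE)),
    ("C", re.compile(r"\b(continues?|continuing|still|takes|taking)\b", re.IGNORECASE)),
]
LABELS = ("C", "D", "S", "U")


def classify_status(sentence):
    """Return C, D, S or U for a sentence that mentions a supplement."""
    for label, pattern in RULES:
        if pattern.search(sentence):
            return label
    return "U"


def per_class_f1(gold, pred, labels=LABELS):
    """F-measure (harmonic mean of precision and recall) for each label."""
    scores = {}
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores[lab] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores


if __name__ == "__main__":
    notes = ["Patient stopped taking fish oil last month.",
             "Started vitamin D 1000 IU daily.",
             "Continues glucosamine.",
             "Fish oil discussed with patient."]
    print([classify_status(n) for n in notes])
```

Checking the discontinuation cues before the continuation cues matters because phrases such as "stopped taking" contain a continuation word; ordering the rules is one simple way to resolve that overlap.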

Volume: 2017, Pages: 1258-1261
Citations: 4
OC-2-KB: A software pipeline to build an evidence-based obesity and cancer knowledge base.
Pub Date : 2017-11-01 Epub Date: 2017-12-18 DOI: 10.1109/BIBM.2017.8217845
Juan Antonio Lossio-Ventura, William Hogan, François Modave, Yi Guo, Zhe He, Amanda Hicks, Jiang Bian

Obesity has been linked to several types of cancer. Access to adequate health information encourages people to participate in managing their own health, which ultimately improves their health outcomes. Nevertheless, the existing online information about the relationship between obesity and cancer is heterogeneous and poorly organized. A formal knowledge representation can help better organize and deliver quality health information. Currently, there are several efforts in the biomedical domain to convert unstructured data into structured data and store them in Semantic Web knowledge bases (KBs). In this demo paper, we present OC-2-KB (Obesity and Cancer to Knowledge Base), a system tailored to guide the automatic construction of a KB that systematically manages obesity and cancer knowledge from free-text scientific literature (i.e., PubMed abstracts). OC-2-KB has two main modules: one acquires entities, and the other extracts and then classifies the relationships among these entities. We tested the OC-2-KB system on a dataset of 23 manually annotated obesity and cancer PubMed abstracts and created a preliminary KB with 765 triples. We conducted a preliminary evaluation on this sample of triples and report the results.
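
Storing extracted relations in a Semantic Web KB means expressing them as subject-predicate-object triples. As a minimal sketch, the code below serializes (subject, relation, object) tuples in the W3C N-Triples syntax; the `http://example.org/oc2kb/` namespace, the label-to-IRI rule and the example relation are hypothetical, since the abstract does not give OC-2-KB's vocabulary.

```python
from urllib.parse import quote

# Hypothetical namespace; OC-2-KB's actual vocabulary is not given in the abstract.
BASE = "http://example.org/oc2kb/"


def to_iri(label):
    """Turn a free-text label into an IRI written in angle brackets."""
    return "<" + BASE + quote(label.strip().replace(" ", "_")) + ">"


def to_ntriples(relations):
    """Serialize (subject, predicate, object) tuples as sorted,
    de-duplicated N-Triples lines (each ends with ' .')."""
    lines = {f"{to_iri(s)} {to_iri(p)} {to_iri(o)} ." for s, p, o in relations}
    return "\n".join(sorted(lines))


if __name__ == "__main__":
    extracted = [("obesity", "increases risk of", "colorectal cancer"),
                 ("obesity", "increases risk of", "colorectal cancer")]
    print(to_ntriples(extracted))  # duplicates collapse to one triple
```

De-duplicating at serialization time keeps the triple count meaningful when the same relation is extracted from several abstracts.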

Volume: 2017, Pages: 1284-1287, Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5889048/pdf/nihms930742.pdf
Citations: 0
Change-Point Detection for Monitoring Clinical Decision Support Systems with a Multi-Process Dynamic Linear Model.
Pub Date : 2017-11-01 Epub Date: 2017-12-18 DOI: 10.1109/BIBM.2017.8217712
Siqi Liu, Adam Wright, Dean F Sittig, Milos Hauskrecht

A clinical decision support system and its components may malfunction for various reasons. The objective of this work is to develop computational methods that help monitor the system and ensure its proper operation by promptly detecting and analyzing changes in its behavior. We develop a new change-point detection method using the Multi-Process Dynamic Linear Model. Experiments on real and simulated data show that our method outperforms existing change-point detection methods, achieving higher accuracy and shorter detection delay.
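
The core idea of a multi-process dynamic linear model is to run several candidate models in parallel and track their posterior probabilities. The sketch below is a deliberately simplified two-model, local-level version: a "normal" model with small level noise and a "change" model with large level noise, updated with Kalman-filter steps and collapsed back to one Gaussian after every observation. The noise variances, the prior change probability and the threshold are hypothetical, and the paper's model is likely richer.

```python
import math


def _log_normal_pdf(y, mean, var):
    """Log density of N(mean, var) at y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def detect_change_points(ys, obs_var=1.0, w_normal=0.01, w_change=100.0,
                         prior_change=0.01, threshold=0.5, m0=0.0, c0=1e3):
    """Flag time steps whose posterior probability of the 'change' model
    exceeds `threshold` (two-model local-level DLM, collapsed each step)."""
    m, c = m0, c0  # posterior mean and variance of the level
    flagged = []
    for t, y in enumerate(ys):
        comps = []  # (log weight, posterior mean, posterior variance) per model
        for w, prior in ((w_normal, 1.0 - prior_change), (w_change, prior_change)):
            r = c + w            # prior variance of the level at time t
            q = r + obs_var      # one-step forecast variance
            gain = r / q         # Kalman gain
            comps.append((math.log(prior) + _log_normal_pdf(y, m, q),
                          m + gain * (y - m),
                          r * obs_var / q))
        # Normalise model weights in log space to avoid underflow.
        top = max(lw for lw, _, _ in comps)
        weights = [math.exp(lw - top) for lw, _, _ in comps]
        probs = [wt / sum(weights) for wt in weights]
        if probs[1] > threshold:
            flagged.append(t)
        # Collapse the two-component mixture to one Gaussian (moment matching).
        m_new = sum(p * mk for p, (_, mk, _) in zip(probs, comps))
        c = sum(p * (ck + (mk - m_new) ** 2) for p, (_, mk, ck) in zip(probs, comps))
        m = m_new
    return flagged


if __name__ == "__main__":
    series = [0.0] * 20 + [10.0] * 20   # level shift at index 20
    print(detect_change_points(series))
```

After the jump, the change model absorbs most of the shift in one step, so the following observations look ordinary again and are not flagged; this is what keeps detection delay short in the multi-process formulation.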

Volume: 2017, Pages: 569-572
Citations: 3
Mining FDA resources to compute population-specific frequencies of adverse drug reactions.
Pub Date : 2017-11-01 Epub Date: 2017-12-18 DOI: 10.1109/BIBM.2017.8217935
Aleksandar Poleksic, Carson Turner, Rishabh Dalal, Paul Gray, Lei Xie

Adverse drug reactions (ADRs) represent one of the main health and economic problems in the world. With increasing data on ADRs, there is a growing need for software tools capable of organizing and storing information on drug-ADR associations in a form that is easy to use and understand. Here we present a step-by-step computational procedure capable of extracting drug-ADR frequency data from the large collection of patient safety reports stored in the Food and Drug Administration (FDA) database. Our procedure is the first of its type capable of generating population-specific drug-ADR frequencies. The drug-ADR data generated by our method can be made specific to a single patient population group (such as gender or age), a single therapy characteristic (such as drug dosage or duration of therapy), or any combination of these.
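
A population-specific drug-ADR frequency can be read as: among reports that list the drug and fall in the chosen subgroup, the share that mention each reaction. The sketch below assumes reports have already been parsed into dicts; the field names (`sex`, `age`, `drugs`, `reactions`) and the toy reports are hypothetical and do not reflect the actual FDA database schema. Note that such figures are proportions among reports, not incidence rates among all patients taking the drug.

```python
from collections import Counter


def adr_frequencies(reports, drug, subgroup=lambda r: True):
    """For reports that list `drug` and satisfy `subgroup`, return the share
    of those reports that mention each adverse reaction."""
    relevant = [r for r in reports if drug in r["drugs"] and subgroup(r)]
    if not relevant:
        return {}
    # set() so a reaction listed twice in one report is counted once.
    counts = Counter(adr for r in relevant for adr in set(r["reactions"]))
    return {adr: c / len(relevant) for adr, c in counts.items()}


if __name__ == "__main__":
    reports = [
        {"sex": "F", "age": 70, "drugs": ["warfarin"], "reactions": ["bleeding"]},
        {"sex": "F", "age": 65, "drugs": ["warfarin", "aspirin"],
         "reactions": ["bleeding", "nausea"]},
        {"sex": "M", "age": 50, "drugs": ["warfarin"], "reactions": ["nausea"]},
        {"sex": "F", "age": 40, "drugs": ["aspirin"], "reactions": ["nausea"]},
    ]
    older_women = lambda r: r["sex"] == "F" and r["age"] >= 60
    print(adr_frequencies(reports, "warfarin", older_women))
```

Passing the subgroup as a predicate makes it easy to combine criteria (for example sex, age band and dosage) without changing the counting logic.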

Volume: 2017, Pages: 1809-1814
Citations: 1
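The idea of population-specific drug-ADR frequencies from Poleksic et al. can be illustrated with a small stratified-counting sketch. It is not the authors' procedure. The `Report` record and its fields are hypothetical simplifications of a safety report, not the real FDA schema. "Frequency" is assumed to mean the share of a drug's reports, within the selected subgroup, that list a given reaction. Real report data would first need de-duplication and drug-name normalization.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Report:
    """A hypothetical, simplified safety report (not the real FDA schema)."""
    sex: str
    age: int
    dose_mg: float
    drugs: Set[str]
    reactions: Set[str]


def adr_frequencies(
    reports: List[Report],
    drug: str,
    keep: Callable[[Report], bool] = lambda r: True,
) -> Dict[str, float]:
    """For the reports that mention `drug` and pass the `keep` filter, return the
    fraction of those reports that list each reaction (assumed definition)."""
    subgroup = [r for r in reports if drug in r.drugs and keep(r)]
    if not subgroup:
        return {}
    # Each report counts at most once per reaction because `reactions` is a set.
    counts = Counter(reaction for r in subgroup for reaction in r.reactions)
    return {reaction: n / len(subgroup) for reaction, n in counts.items()}


# Made-up toy reports, for illustration only.
reports = [
    Report("F", 70, 10.0, {"drug_a"}, {"rash"}),
    Report("F", 66, 20.0, {"drug_a", "drug_b"}, {"rash", "nausea"}),
    Report("M", 40, 10.0, {"drug_a"}, {"headache"}),
    Report("F", 30, 5.0, {"drug_b"}, {"nausea"}),
]

# Population filter: women aged 65 or older.
older_women = adr_frequencies(reports, "drug_a", keep=lambda r: r.sex == "F" and r.age >= 65)
print(older_women)  # {'rash': 1.0, 'nausea': 0.5}
```

Because the subgroup is just a predicate, population attributes (sex, age) and therapy characteristics (here a hypothetical `dose_mg`) can be combined freely, for example `keep=lambda r: r.sex == "F" and r.dose_mg >= 15`.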
Automatic Methods to Extract New York Heart Association Classification from Clinical Notes.
Pub Date : 2017-11-01 Epub Date: 2017-12-18 DOI: 10.1109/BIBM.2017.8217848
Rui Zhang, Sisi Ma, Liesa Shanahan, Jessica Munroe, Sarah Horn, Stuart Speedie

Cardiac Resynchronization Therapy (CRT) is an established pacing therapy for heart failure patients. The New York Heart Association (NYHA) classification is often used as a measure of a patient's response to CRT. Consistently identifying the NYHA class of heart failure patients in an electronic health record (EHR) over time can provide a better understanding of heart failure progression and support the assessment of CRT response and effectiveness. However, NYHA class is rarely stored as structured EHR data; it is more often documented in unstructured clinical notes. In this study, we therefore investigated natural language processing (NLP) methods for identifying NYHA classification from clinical notes. We collected 6,174 clinical notes that were matched with hospital-specific custom NYHA class diagnosis codes. Machine-learning-based methods performed similarly to a rule-based method. The best-performing machine-learning method, a support vector machine with n-gram features, achieved an F-measure of 93%. Further validation of these findings is required.

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, 2017, pp. 1296-1299.
Citations: 17
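Zhang et al. compare a rule-based extractor with machine-learning classifiers and report results as an F-measure. The sketch below illustrates both halves under stated assumptions. The regular expression is a hypothetical rule for mentions such as "NYHA class III" or "NYHA functional class 2", not the authors' rule set, and it ignores negation, history, and ranges. `f_measure` uses one common extraction convention: a prediction is a true positive only if it matches the gold class, and a missing prediction counts against recall.

```python
import re
from typing import List, Optional

# Hypothetical rule: "NYHA", optionally "functional" and/or "class", then a
# Roman (I-IV) or Arabic (1-4) numeral. Longer numerals are tried first.
NYHA_PATTERN = re.compile(
    r"\bNYHA\s+(?:functional\s+)?(?:class\s+)?(IV|III|II|I|[1-4])\b",
    re.IGNORECASE,
)
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4}


def extract_nyha(note: str) -> Optional[int]:
    """Return the first NYHA class (1-4) mentioned in the note, or None."""
    match = NYHA_PATTERN.search(note)
    if match is None:
        return None
    value = match.group(1).upper()
    return ROMAN[value] if value in ROMAN else int(value)


def f_measure(gold: List[Optional[int]], predicted: List[Optional[int]]) -> float:
    """F-measure where only exact class matches count as true positives."""
    tp = sum(1 for g, p in zip(gold, predicted) if p is not None and p == g)
    n_pred = sum(1 for p in predicted if p is not None)
    n_gold = sum(1 for g in gold if g is not None)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up notes and labels, for illustration only.
notes = [
    "Patient remains NYHA class III despite CRT.",
    "Symptoms consistent with NYHA functional class 2.",
    "nyha iv on admission, improved after diuresis.",
    "Heart failure follow-up; functional status not documented.",
]
gold = [3, 2, 4, 1]
predicted = [extract_nyha(n) for n in notes]
print(predicted)  # [3, 2, 4, None]
print(round(f_measure(gold, predicted), 3))  # 0.857
```

The last note shows why rules alone can miss cases: the class is implied by the gold label but never written in the text, which lowers recall while precision stays at 1.0.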
Towards an Obesity-Cancer Knowledge Base: Biomedical Entity Identification and Relation Detection.
Pub Date : 2016-12-01 Epub Date: 2017-01-19 DOI: 10.1109/BIBM.2016.7822672
Juan Antonio Lossio-Ventura, William Hogan, François Modave, Amanda Hicks, Josh Hanna, Yi Guo, Zhe He, Jiang Bian

Obesity is associated with increased risks of various types of cancer, as well as a wide range of other chronic diseases. At the same time, access to health information encourages patient participation and improves health outcomes. However, existing online information on obesity and its relationship to cancer is heterogeneous, ranging from pre-clinical models and case studies to purely hypothesis-based scientific arguments. A formal knowledge representation (i.e., a semantic knowledge base) would help to better organize and deliver the quality health information related to obesity and cancer that consumers need. Nevertheless, current ontologies describing obesity, cancer, and related entities are not designed to guide automatic knowledge base construction from heterogeneous information sources. In this paper, we therefore present methods for named-entity recognition (NER) to extract biomedical entities from scholarly articles, and for detecting whether two biomedical entities are related, with the long-term goal of building an obesity-cancer knowledge base. We combine linguistic and statistical approaches for the NER task, outperforming state-of-the-art results. Further, using statistical features extracted from the sentences, our relation detection method achieves an accuracy of 99.3% and an F-measure of 0.993.

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, 2016, pp. 1081-1088.
Citations: 14
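Lossio-Ventura et al. describe NER followed by relation detection from sentence-level statistical features, but the abstract does not name those features. As a rough illustration only, the sketch below tags entities by greedy longest-match lookup in a small hypothetical dictionary. It then derives a few simple features for each entity pair in a sentence: token distance, number of entities in between, and presence of a hypothetical trigger word. A classifier would then be trained on such features. The lexicon, trigger words, and features are assumptions, not taken from the paper.

```python
import re
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical term dictionary: lowercased token tuples -> entity type.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("obesity",): "CONDITION",
    ("breast", "cancer"): "DISEASE",
    ("colorectal", "cancer"): "DISEASE",
    ("body", "mass", "index"): "MEASURE",
}
MAX_TERM_LEN = max(len(term) for term in LEXICON)
# Hypothetical relation trigger words.
TRIGGERS = {"increases", "associated", "risk", "linked"}


def tokenize(sentence: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tag_entities(tokens: List[str]) -> List[Tuple[int, int, str]]:
    """Greedy longest-match dictionary lookup; returns (start, end, type) spans."""
    spans, i = [], 0
    while i < len(tokens):
        for length in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            entity_type = LEXICON.get(tuple(tokens[i:i + length]))
            if entity_type:
                spans.append((i, i + length, entity_type))
                i += length
                break
        else:  # no dictionary term starts here
            i += 1
    return spans


def pair_features(sentence: str) -> List[Dict[str, object]]:
    """Simple statistical features for every ordered pair of entities in a sentence."""
    tokens = tokenize(sentence)
    spans = tag_entities(tokens)
    features = []
    for (s1, e1, t1), (s2, e2, t2) in combinations(spans, 2):
        between = tokens[e1:s2]
        features.append({
            "pair": (" ".join(tokens[s1:e1]), " ".join(tokens[s2:e2])),
            "types": (t1, t2),
            "token_distance": s2 - e1,
            "entities_between": sum(1 for s, _, _ in spans if e1 <= s < s2),
            "has_trigger": any(tok in TRIGGERS for tok in between),
        })
    return features


for f in pair_features("Obesity increases the risk of breast cancer and colorectal cancer."):
    print(f["pair"], f["token_distance"], f["entities_between"], f["has_trigger"])
# ('obesity', 'breast cancer') 4 0 True
# ('obesity', 'colorectal cancer') 7 1 True
# ('breast cancer', 'colorectal cancer') 1 0 False
```

A dictionary lookup stands in here for the paper's combined linguistic and statistical NER; in practice the tagger would be far richer than exact matching.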