Workshop on Biomedical Natural Language Processing: Latest Articles

Prediction of Protein Sub-cellular Localization using Information from Texts and Sequences

Workshop on Biomedical Natural Language Processing

Pub Date : 2008-06-19 DOI: 10.3115/1572306.1572324

H. Chun, Chisato Yamasaki, Naomi Saichi, Masayuki Tanaka, T. Hishiki, T. Imanishi, T. Gojobori, Jin-Dong Kim, Junichi Tsujii, T. Takagi

This paper presents a novel prediction approach for protein sub-cellular localization. We have incorporated text and sequence-based approaches.

Citations: 0

Knowledge Sources for Word Sense Disambiguation of Biomedical Text

Workshop on Biomedical Natural Language Processing

Pub Date : 2008-06-19 DOI: 10.3115/1572306.1572321

Mark Stevenson, Yikun Guo, R. Gaizauskas, David Martínez

Like text in other domains, biomedical documents contain a range of terms with more than one possible meaning. These ambiguities form a significant obstacle to the automatic processing of biomedical texts. Previous approaches to resolving this problem have made use of a variety of knowledge sources including linguistic information (from the context in which the ambiguous term is used) and domain-specific resources (such as UMLS). In this paper we compare a range of knowledge sources which have been previously used and introduce a novel one: MeSH terms. The best performance is obtained using linguistic features in combination with MeSH terms. Results from our system outperform published results for previously reported systems on a standard test set (the NLM-WSD corpus).

Citations: 20

A Pilot Annotation to Investigate Discourse Connectivity in Biomedical Text

Workshop on Biomedical Natural Language Processing

Pub Date : 2008-06-19 DOI: 10.3115/1572306.1572325

Hong Yu, Nadya Frid, S. McRoy, R. Prasad, Alan Lee, A. Joshi

The goal of the Penn Discourse Treebank (PDTB) project is to develop a large-scale corpus, annotated with coherence relations marked by discourse connectives. To date, the PDTB annotation has been applied primarily to news articles. In this study, we tested whether the PDTB guidelines can be adapted to a different genre. We annotated discourse connectives and their arguments in one 4,937-token full-text biomedical article. Two linguist annotators showed an agreement of 85% after simple conventions were added. For the remaining 15% of cases, we found that biomedical domain-specific knowledge is needed to capture the linguistic cues that can be used to resolve inter-annotator disagreement. We found that the two annotators were able to reach an agreement after discussion. Thus our experiments suggest that the PDTB annotation can be adapted to new domains by minimally adjusting the guidelines and by adding some further domain-specific linguistic cues.
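
The abstract above reports an 85% agreement between two annotators but does not say how agreement was measured. As a minimal sketch, assuming simple observed (percent) agreement over a shared list of annotation decisions, the figure could be computed as follows. The function name `percent_agreement` and the example labels are hypothetical and are not taken from the paper.

```python
def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label.

    Assumption: both annotators labelled the same items in the same order.
    This is plain observed agreement, not a chance-corrected measure.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same items")
    if not labels_a:
        raise ValueError("at least one item is required")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical connective-sense labels from two annotators.
annotator_1 = ["contrast", "cause", "cause", "temporal"]
annotator_2 = ["contrast", "cause", "condition", "temporal"]
agreement = percent_agreement(annotator_1, annotator_2)
```

Observed agreement is easy to interpret but ignores agreement expected by chance, which is why chance-corrected measures are often reported alongside it.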

Citations: 6

Conditional Random Fields and Support Vector Machines for Disorder Named Entity Recognition in Clinical Texts

Workshop on Biomedical Natural Language Processing

Pub Date : 2008-06-19 DOI: 10.3115/1572306.1572326

Dingcheng Li, G. Savova, K. Schuler

We present a comparative study of two machine learning methods, Conditional Random Fields (CRFs) and Support Vector Machines (SVMs), for clinical named entity recognition. We explore their applicability to the clinical domain. Evaluation against a set of gold standard named entities shows that CRFs outperform SVMs. The best F-score is 0.86 for the CRFs and 0.64 for the SVMs, compared to a baseline of 0.60.
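
The F-scores quoted above summarize precision and recall against gold standard entities. The abstract does not state the exact matching criterion or the weighting used, so the following is only a sketch of the balanced F-score (F1) computed from entity counts. The function name `f1_from_counts` and the counts in the example are hypothetical.

```python
def f1_from_counts(true_pos, false_pos, false_neg):
    """Balanced F-score (F1) from entity-level counts.

    Assumption: an entity counts as a true positive when it matches a
    gold standard entity under whatever matching rule the evaluation uses.
    """
    predicted = true_pos + false_pos   # entities the system proposed
    gold = true_pos + false_neg        # entities in the gold standard
    if predicted == 0 or gold == 0 or true_pos == 0:
        return 0.0
    precision = true_pos / predicted
    recall = true_pos / gold
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 3 correct entities, 1 spurious, 1 missed.
score = f1_from_counts(true_pos=3, false_pos=1, false_neg=1)
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high.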

Citations: 106

CBR-Tagger: a case-based reasoning approach to the gene/protein mention problem

Workshop on Biomedical Natural Language Processing

Pub Date : 2008-06-19 DOI: 10.3115/1572306.1572333

M. Neves, M. Chagoyen, J. Carazo, A. Pascual-Montano

This work proposes a case-based classifier to tackle the gene/protein mention problem in biomedical literature. The so-called gene mention problem consists of recognizing gene and protein entities in scientific texts. For each word in the text, a classification process decides whether or not the term is a gene mention. The decision is based on selecting the best or most similar case from a base of known and unknown cases. The approach was evaluated on several datasets for different organisms, and the results show its suitability for the gene mention problem.

Citations: 10

Adaptive Information Extraction for Complex Biomedical Tasks

Workshop on Biomedical Natural Language Processing

Pub Date : 2008-06-19 DOI: 10.3115/1572306.1572339

D. Feng, Gully A. Burns, E. Hovy

Biomedical information extraction tasks are often more complex and involve uncertainty at each step of the problem-solving process. We present an adaptive information extraction framework and demonstrate how to explore uncertainty using feedback integration.

Citations: 1

Extracting Clinical Relationships from Patient Narratives

Workshop on Biomedical Natural Language Processing

Pub Date : 2008-06-19 DOI: 10.3115/1572306.1572309

A. Roberts, R. Gaizauskas, Mark Hepple

The Clinical E-Science Framework (CLEF) project has built a system to extract clinically significant information from the textual component of medical records, for clinical research, evidence-based healthcare and genotype-meets-phenotype informatics. One part of this system is the identification of relationships between clinically important entities in the text. Typical approaches to relationship extraction in this domain have used full parses, domain-specific grammars, and large knowledge bases encoding domain knowledge. In other areas of biomedical NLP, statistical machine learning approaches are now routinely applied to relationship extraction. We report on the novel application of these statistical techniques to clinical relationships. We describe a supervised machine learning system, trained with a corpus of oncology narratives hand-annotated with clinically important relationships. Various shallow features are extracted from these texts, and used to train statistical classifiers. We compare the suitability of these features for clinical relationship extraction, how extraction varies between inter- and intra-sentential relationships, and examine the amount of training data needed to learn various relationships.

Citations: 66

Raising the Compatibility of Heterogeneous Annotations: A Case Study on

Workshop on Biomedical Natural Language Processing

Pub Date : 2008-06-01 DOI: 10.3115/1572306.1572338

Yue Wang, Kazuhiro Yoshida, Jin-Dong Kim, Rune Saetre, Junichi Tsujii

While several corpora claim to have annotations for protein references, the heterogeneity between the annotations is recognized as an obstacle to developing expensive resources in a synergistic way. Here we present a series of experimental results which show the differences between the protein mention annotations made to two corpora, GENIA and AImed.

Citations: 0

ADEQA: A Question Answer based approach for joint ADE-Suspect Extraction using Sequence-To-Sequence Transformers

Workshop on Biomedical Natural Language Processing

Pub Date : 1900-01-01 DOI: 10.18653/v1/2023.bionlp-1.17

Vinayak Arannil, Tomal Deb, Atanu Roy

Early identification of Adverse Drug Events (ADEs) is critical for taking prompt action when introducing new drugs into the market. ADE information is available through various unstructured data sources such as clinical study reports, patient health records, and social media posts. Extracting ADEs and the related suspect drugs using machine learning is a challenging task due to the complex linguistic relations between drug-ADE pairs in textual data and the unavailability of large labelled corpora. This paper introduces ADEQA, a question-answer (QA) based approach that uses quasi-supervised labelled data and sequence-to-sequence transformers to extract ADEs, suspect drugs, and the relationships between them. Unlike traditional QA models, natural language generation (NLG) based models do not require extensive token-level labelling and thereby reduce the adoption barrier significantly. On a public ADE corpus, we were able to achieve state-of-the-art results with an F1 score of 94% on establishing the relationships between ADEs and the respective suspects.

Citations: 0

Automated Preamble Detection in Dictated Medical Reports

Workshop on Biomedical Natural Language Processing

Pub Date : 1900-01-01 DOI: 10.18653/v1/W17-2336

Wael Salloum, Greg P. Finley, Erik Edwards, Mark Miller, David Suendermann-Oeft

Dictated medical reports very often feature a preamble containing metainformation about the report such as patient and physician names, location and name of the clinic, date of procedure, and so on. In the medical transcription process, the preamble is usually omitted from the final report, as it contains information already available in the electronic medical record. We present a method which is able to automatically identify preambles in medical dictations. The method makes use of state-of-the-art NLP techniques including word embeddings and Bi-LSTMs and achieves preamble detection performance superior to humans.

Citations: 6
