Workshop on Biomedical Natural Language Processing: Latest Papers

Global Locality in Biomedical Relation and Event Extraction
Pub Date : 2019-09-11 DOI: 10.18653/v1/2020.bionlp-1.21
Elaheh Shafieibavani, Antonio Jimeno-Yepes, Xu Zhong, David Martínez
Due to the exponential growth of biomedical literature, event and relation extraction are important tasks in biomedical text mining. Most work focuses only on relation extraction and detects a single entity-pair mention within a short span of text, which is not ideal given the long sentences that appear in biomedical contexts. We propose an approach to both relation and event extraction that simultaneously predicts relationships between all mention pairs in a text. We also perform an empirical study comparing different network setups for this purpose. The best-performing model includes a set of multi-head attentions and convolutions, an adaptation of the transformer architecture that gives self-attention the ability to strengthen dependencies among related elements and models the interaction between features extracted by multiple attention heads. Experimental results demonstrate that our approach outperforms the state of the art on a set of benchmark biomedical corpora, including the BioNLP 2009, 2011 and 2013 and BioCreative 2017 shared tasks.
Citations: 5
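The abstract above describes the best model as multi-head attention combined with convolutions, used to predict relations between all mention pairs at once. As a minimal sketch of the attention part only, the Python below implements multi-head scaled dot-product self-attention and then scores every ordered pair of positions in one pass. It assumes identity query/key/value projections in place of learned weights, omits the convolutional layers and the relation classifier, and the function names are hypothetical; it is not the authors' architecture.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(x: Matrix) -> Matrix:
    """Scaled dot-product self-attention; queries, keys and values are x itself (identity projections)."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # each output row is an attention-weighted average of all value rows
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def multi_head_self_attention(x: Matrix, num_heads: int) -> Matrix:
    """Split the feature dimension into heads, attend within each head, then concatenate."""
    d = len(x[0])
    if d % num_heads != 0:
        raise ValueError("feature size must divide evenly across heads")
    size = d // num_heads
    heads = [
        attention_head([row[h * size:(h + 1) * size] for row in x])
        for h in range(num_heads)
    ]
    # concatenate the per-head outputs for each position
    return [sum((heads[h][i] for h in range(num_heads)), []) for i in range(len(x))]


def all_pair_scores(h: Matrix) -> Matrix:
    """Score every ordered pair (i, j) of positions with a dot product, all at once."""
    return [[sum(a * b for a, b in zip(hi, hj)) for hj in h] for hi in h]


tokens = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
attended = multi_head_self_attention(tokens, num_heads=2)
pair_scores = all_pair_scores(attended)
print(len(pair_scores), len(pair_scores[0]))  # 3 3
```

Because every pair is scored from the same attended representations, all candidate relations in a text come out of a single forward pass rather than one pass per entity pair, which is the contrast the abstract draws with single-pair methods.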
Toward Automated Early Sepsis Alerting: Identifying Infection Patients from Nursing Notes
Pub Date : 2018-09-11 DOI: 10.18653/v1/W17-2332
Emilia Apostolova, Tom Velez
Severe sepsis and septic shock are conditions that affect millions of patients and have a mortality rate close to 50%. Early identification of at-risk patients significantly improves outcomes. Electronic surveillance tools have been developed to monitor structured Electronic Medical Records and automatically recognize early signs of sepsis. However, many sepsis risk factors (e.g. symptoms and signs of infection) are often captured only in free-text clinical notes. In this study, we developed a method for automatically monitoring nursing notes for signs and symptoms of infection. We used a creative approach to automatically generate an annotated dataset. The dataset was used to create a machine learning model that achieved F1-scores ranging from 79% to 96%.
Citations: 5
A Neural Autoencoder Approach for Document Ranking and Query Refinement in Pharmacogenomic Information Retrieval
Pub Date : 2018-07-01 DOI: 10.18653/v1/W18-2310
Jonas Pfeiffer, Samuel Broscheit, Rainer Gemulla, Mathias Göschl
In this study, we investigate learning-to-rank and query refinement approaches for information retrieval in the pharmacogenomic domain. The goal is to improve the information retrieval process of biomedical curators, who manually build knowledge bases for personalized medicine. We study how to exploit the relationships between genes, variants, drugs, diseases and outcomes as features for document ranking and query refinement. For a supervised approach, we are faced with a small amount of annotated data and a large amount of unannotated data. Therefore, we explore ways to use a neural document auto-encoder in a semi-supervised approach. We show that a combination of established algorithms, feature engineering and a neural auto-encoder model yields promising results in this setting.
Citations: 10
Bacteria and Biotope Entity Recognition Using A Dictionary-Enhanced Neural Network Model
Pub Date : 2018-07-01 DOI: 10.18653/v1/W18-2317
Qiuyue Wang, Xiaofeng Meng
Automatic recognition of biomedical entities in text is the crucial initial step in biomedical text mining. In this paper, we investigate employing modern neural network models for recognizing biomedical entities. To compensate for the small amount of training data in the biomedical domain, we propose to integrate dictionaries into the neural model. Our experiments on the BB3 data sets demonstrate that the state-of-the-art neural network model is promising for recognizing biomedical entities even with very little training data. When integrated with dictionaries, its performance can be greatly improved, achieving performance competitive with the best dictionary-based system on entities with specific terminology, and much higher performance on entities with more general terminology.
Citations: 0
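The abstract above proposes integrating dictionaries into a neural entity recognizer. One common way to do this, assumed here rather than taken from the paper, is to turn dictionary matches into per-token features that are concatenated with each token's embedding before the tagger. The sketch below shows only the matching step: a greedy longest-match lookup emitting BIO-style tags. The dictionary entries and the `dictionary_features` name are hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_features(
    tokens: List[str],
    dictionary: Dict[Tuple[str, ...], str],
    max_len: int = 5,
) -> List[str]:
    """Return one BIO-style tag per token from a greedy longest-match dictionary lookup."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        match = None
        # try the longest span first so multi-word entries win over their prefixes
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in dictionary:
                match = (n, dictionary[key])
                break
        if match is None:
            i += 1
            continue
        n, label = match
        tags[i] = "B-" + label
        for k in range(i + 1, i + n):
            tags[k] = "I-" + label
        i += n
    return tags


# hypothetical dictionary entries, keyed by lower-cased token tuples
lexicon = {("bacillus", "subtilis"): "Bacteria", ("soil",): "Habitat"}
sentence = ["Bacillus", "subtilis", "was", "isolated", "from", "soil", "."]
print(dictionary_features(sentence, lexicon))
# ['B-Bacteria', 'I-Bacteria', 'O', 'O', 'O', 'B-Habitat', 'O']
```

In a full model, each tag would typically be one-hot encoded and appended to the word vector, letting the network learn how far to trust the dictionary for each entity type.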
Phrase2VecGLM: Neural generalized language model–based semantic tagging for complex query reformulation in medical IR
Pub Date : 2018-07-01 DOI: 10.18653/v1/W18-2313
Manirupa Das, E. Fosler-Lussier, Simon M. Lin, Soheil Moosavinasab, David Chen, S. Rust, Yungui Huang, R. Ramnath
In this work, we develop a novel, completely unsupervised, neural language model-based document ranking approach to semantic tagging of documents. The document to be tagged is used as a query into the GLM to retrieve candidate phrases from top-ranked related documents, thus associating every document with novel related concepts extracted from the text. For this, we extend the word embedding-based generalized language model of Ganguly et al. (2015) to employ phrasal embeddings, and use the resulting semantic tags for downstream query expansion, both directly and in feedback-loop settings. Our method, evaluated on the TREC 2016 clinical decision support challenge dataset, shows statistically significant improvement not only over various baselines that use standard MeSH terms and UMLS concepts for query expansion, but also over baselines that use human expert-assigned concept tags for the queries, all run on top of a standard Okapi BM25-based document retrieval system.
Citations: 3
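The retrieval experiments in the abstract above run on top of a standard Okapi BM25 system. For readers unfamiliar with that baseline, here is a compact sketch of the usual BM25 formula, assuming the common defaults k1 = 1.2 and b = 0.75 and the non-negative "+1" IDF variant. It also shows how appending expansion terms, the role the paper's semantic tags play, changes which documents match. The toy documents and expansion term are hypothetical, and this is not the paper's GLM-based tagger.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str], docs: List[List[str]], k1: float = 1.2, b: float = 0.75
) -> List[float]:
    """Okapi BM25 score of every tokenized document for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # document frequency: number of documents containing each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            f = tf[term]
            if f == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # term-frequency saturation (k1) with document-length normalization (b)
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores


docs = [
    ["sepsis", "antibiotic", "fever"],
    ["diabetes", "insulin"],
    ["fever", "cough"],
]
query = ["fever"]
print(bm25_scores(query, docs))
# appending a hypothetical expansion term (e.g. a semantic tag) lets another document match
expanded = query + ["insulin"]
print(bm25_scores(expanded, docs))
```

With the plain query, the second document scores zero; after expansion it receives a positive score, which is the effect query expansion aims for.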
On Learning Better Embeddings from Chinese Clinical Records: Study on Combining In-Domain and Out-Domain Data
Pub Date : 2018-07-01 DOI: 10.18653/v1/W18-2323
Yaqiang Wang, Yunhui Chen, Hongping Shu, Yongguang Jiang
High-quality word embeddings are of great significance for advancing applications of biomedical natural language processing. In recent years, interest in how to learn good embeddings and evaluate embedding quality based on English medical text has become increasingly evident; however, only a limited number of studies have been performed on Chinese medical text, particularly Chinese clinical records. Herein, we propose a novel approach to improving the quality of learned embeddings by using out-domain data as a supplement when Chinese clinical records are limited. Moreover, embedding quality was evaluated with a method based on the Medical Conceptual Similarity Property. The experimental results revealed that selecting good training samples was necessary, and that collecting the right amount of out-domain data and trading off embedding quality against training time were essential factors for obtaining better embeddings.
Citations: 0
Sub-word information in pre-trained biomedical word representations: evaluation and hyper-parameter optimization
Pub Date : 2018-07-01 DOI: 10.18653/v1/W18-2307
Dieter Galea, I. Laponogov, K. Veselkov
Word2vec embeddings are limited to computing vectors for in-vocabulary terms and do not take sub-word information into account. Character-based representations, such as fastText, mitigate these limitations. We optimize and compare these representations for the biomedical domain. fastText was found to consistently outperform word2vec in named entity recognition tasks for entities such as chemicals and genes. This is likely due to the information gained from computed out-of-vocabulary term vectors, as well as the word compositionality of such entities. In contrast, performance varied on intrinsic datasets. Optimal hyper-parameters depended on the intrinsic dataset, likely due to differences in the distributions of term types. This indicates that embeddings should be chosen based on the task at hand. We therefore provide a number of optimized hyper-parameter sets and pre-trained word2vec and fastText models, available at https://github.com/dterg/bionlp-embed.
Citations: 5
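The abstract above attributes fastText's advantage partly to vectors computed for out-of-vocabulary terms from sub-word units. The sketch below illustrates that mechanism under stated assumptions: words are wrapped in `<` and `>` boundary markers and split into character n-grams of length 3 to 6 (fastText's default range), and a word vector is the average of its n-gram vectors. Real n-gram vectors are learned during training; here `ngram_vector` is a hypothetical deterministic placeholder, so the numbers carry no meaning beyond the shape of the computation.

```python
import random
from typing import List


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Character n-grams of the word wrapped in '<' and '>' boundary markers."""
    token = "<" + word + ">"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    return grams


def ngram_vector(gram: str, dim: int) -> List[float]:
    """Hypothetical stand-in for a trained n-gram embedding: a pseudo-random vector seeded by the n-gram."""
    rng = random.Random(gram)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def oov_vector(word: str, dim: int = 8) -> List[float]:
    """Average the n-gram vectors, so even an unseen word gets a representation."""
    grams = char_ngrams(word)
    if not grams:
        raise ValueError("word too short to produce any n-grams")
    vecs = [ngram_vector(g, dim) for g in grams]
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]


print(char_ngrams("gene", 3, 4))
# ['<ge', 'gen', 'ene', 'ne>', '<gen', 'gene', 'ene>']
shared = set(char_ngrams("kinase")) & set(char_ngrams("kinases"))
print("kinase" in shared)  # True
```

Because related chemical and gene names share many n-grams (for example, "kinase" and "kinases"), their composed vectors overlap even when one form never appeared in training.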
Coding Structures and Actions with the COSTA Scheme in Medical Conversations
Pub Date : 2018-07-01 DOI: 10.18653/v1/W18-2309
Nan Wang, Yan Song, Fei Xia
This paper describes the COSTA scheme for coding structures and actions in conversation. Informed by Conversation Analysis, the scheme introduces an innovative method for marking multi-layer structural organization of conversation and a structure-informed taxonomy of actions. In addition, we create a corpus of naturally occurring medical conversations, containing 318 video-recorded and manually transcribed pediatric consultations. Based on the annotated corpus, we investigate 1) treatment decision-making process in medical conversations, and 2) effects of physician-caregiver communication behaviors on antibiotic over-prescribing. Although the COSTA annotation scheme is developed based on data from the task-specific domain of pediatric consultations, it can be easily extended to apply to more general domains and other languages.
Citations: 8
Representation of complex terms in a vector space structured by an ontology for a normalization task
Pub Date : 2017-08-04 DOI: 10.18653/v1/W17-2312
Arnaud Ferré, Pierre Zweigenbaum, C. Nédellec
In this paper, we propose a semi-supervised method for labeling terms in texts with concepts from a domain ontology. The method generates continuous vector representations of complex terms in a semantic space structured by the ontology. It relies on a distributional semantics approach, which generates initial vectors for each of the extracted terms. These vectors are then embedded in the vector space constructed from the structure of the ontology. This embedding is carried out by training a linear model. Finally, we apply a distance calculation to determine the proximity between term vectors and concept vectors, and thus assign ontology labels to terms. We have evaluated the quality of these representations on a normalization task, using the concepts of an ontology as semantic labels. Normalization of terms is an important step in extracting part of the information contained in texts, but the generated vector space might find other applications. The performance of this method is comparable to the state of the art for this normalization task, opening up encouraging prospects.
Citations: 11
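The method in the abstract above embeds term vectors into an ontology-structured concept space with a linear model and then labels each term with the nearest concept. A minimal sketch of those two steps follows. It assumes the term vectors and concept vectors are already given (the paper builds them from distributional semantics and from the ontology structure, respectively), fits the linear map with plain per-example gradient descent on squared error, and uses cosine similarity as the distance, which is an assumption; the toy vectors and concept names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def matvec(W: List[Vector], x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def train_linear_map(
    xs: List[Vector], ys: List[Vector], lr: float = 0.1, epochs: int = 500
) -> List[Vector]:
    """Fit W so that W @ x approximates y, by per-example gradient descent on squared error."""
    W = [[0.0] * len(xs[0]) for _ in range(len(ys[0]))]
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            pred = matvec(W, x)
            for r, row in enumerate(W):
                err = pred[r] - y[r]
                for c in range(len(row)):
                    row[c] -= lr * err * x[c]
    return W


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(p * q for p, q in zip(a, b)) / (na * nb)


def assign_concept(term_vec: Vector, W: List[Vector], concepts: Dict[str, Vector]) -> str:
    """Project a term into the concept space and return the closest concept by cosine."""
    projected = matvec(W, term_vec)
    return max(concepts, key=lambda name: cosine(projected, concepts[name]))


# hypothetical toy data: 3-d term vectors mapped onto 2-d concept vectors
concepts = {"soil": [1.0, 0.0], "water": [0.0, 1.0]}
term_vecs = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]
targets = [concepts["soil"], concepts["water"]]
W = train_linear_map(term_vecs, targets)
print(assign_concept([0.8, 0.1, 0.2], W, concepts))  # soil
```

A linear map keeps the projection cheap and easy to train from a small set of annotated term-concept pairs; the nearest-concept step then labels any new term, including ones absent from the training pairs.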
Initializing neural networks for hierarchical multi-label text classification
Pub Date : 2017-08-04 DOI: 10.18653/v1/W17-2339
Simon Baker, A. Korhonen
Many tasks in the biomedical domain require the assignment of one or more predefined labels to input text, where the labels are part of a hierarchical structure (such as a taxonomy). The conventional approach is to use a one-vs.-rest (OVR) classification setup, where a binary classifier is trained for each label in the taxonomy or ontology and all instances not belonging to the class are considered negative examples. The main drawbacks of this approach are that dependencies between classes are not leveraged in the training and classification process, and that training parallel classifiers adds computational cost. In this paper, we apply a new method for hierarchical multi-label text classification that initializes the final hidden layer of a neural network model so that it leverages label co-occurrence relations such as hypernymy. This approach naturally lends itself to hierarchical classification. We evaluated this approach on two hierarchical multi-label text classification tasks in the biomedical domain, using both sentence- and document-level classification. Our evaluation shows promising results for this approach.
Citations: 56
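The abstract above contrasts one-vs.-rest training with initializing the network's final layer so that it encodes label co-occurrence, including hypernymy. The sketch below shows one plausible way to build such an initialization, assumed rather than taken from the paper: each document's label set is expanded with its ancestors from the hierarchy, and each distinct resulting pattern gets a dedicated row of the weight matrix connecting the final hidden layer to the outputs, with a fixed value on that pattern's labels and zeros elsewhere; the remaining rows keep a random Glorot-style initialization. The hierarchy, labels and function names are hypothetical.

```python
import math
import random
from typing import Dict, FrozenSet, List, Set


def expand_with_ancestors(labels: Set[str], parent: Dict[str, str]) -> FrozenSet[str]:
    """Add every ancestor of each label (parent map assumed acyclic), so hypernymy appears as co-occurrence."""
    out = set(labels)
    for label in labels:
        node = label
        while node in parent:
            node = parent[node]
            out.add(node)
    return frozenset(out)


def init_output_weights(
    label_sets: List[FrozenSet[str]],
    label_index: Dict[str, int],
    hidden_size: int,
    seed: int = 0,
) -> List[List[float]]:
    """hidden_size x num_labels matrix: one row per distinct label pattern, other rows random."""
    num_labels = len(label_index)
    bound = math.sqrt(6.0 / (hidden_size + num_labels))  # Glorot-style uniform bound
    rng = random.Random(seed)
    W = [[rng.uniform(-bound, bound) for _ in range(num_labels)] for _ in range(hidden_size)]
    # sort patterns (larger first) only so the result is deterministic
    patterns = sorted(set(label_sets), key=lambda p: (-len(p), sorted(p)))
    for row, pattern in zip(W, patterns):  # patterns beyond hidden_size are dropped
        for j in range(num_labels):
            row[j] = 0.0
        for label in pattern:
            row[label_index[label]] = bound
    return W


parent = {"lung_cancer": "cancer", "breast_cancer": "cancer"}  # hypothetical hierarchy
doc_labels = [{"lung_cancer"}, {"breast_cancer", "smoking"}, {"lung_cancer"}]
label_sets = [expand_with_ancestors(s, parent) for s in doc_labels]
label_index = {name: i for i, name in enumerate(sorted(set().union(*label_sets)))}
W = init_output_weights(label_sets, label_index, hidden_size=5)
print(sorted(label_index))  # ['breast_cancer', 'cancer', 'lung_cancer', 'smoking']
```

Because a hidden unit then starts out wired to a whole group of related labels, the network begins training with those labels coupled instead of treating each one as an independent binary problem, which is the drawback of one-vs.-rest noted in the abstract.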